The software for interactive evaluation of mass spectrometric imaging heterogeneity [version 1; peer review: awaiting peer review]

Mass spectrometry imaging is a promising complement to histology for evaluating the presence of different tissue types in a sample. To make this method faster, more accurate, and more precise, we previously presented cosine similarity measure maps (CSMM). The method shows the spatial distribution of the cosine similarity measure between a chosen MSI pixel and the rest of the image. When the samples under test are heterogeneous and not guaranteed to have clear clusters with distinct borders, it is of interest to analyze the heterogeneity, the area borders, and their sensitivity to the choice of the reference CSMM pixel. Here we present software for interactively building CSMMs with different parameters, analyzing them visually, and saving them in publication-ready quality without additional programming. Source code, example datasets, binaries, and other information are available at https://github.com/EvgenyZhvansky/Interactive_CSMM.


Introduction
Mass spectrometry imaging (MSI) is a technique for building a map of the spatial distribution of molecular features across the tissue of interest without pretreatment. [1][2][3][4] This technique is a promising complement to the gold standard of tissue analysis, histology, which is a time-consuming, labor-intensive, and sometimes subjective method.
Each pixel of a raw MSI map is a mass spectrum of the corresponding location in the sample. This multidimensional map is almost impossible to interpret with the naked eye and has to be converted to a simpler representation for better visualization and usability. [5][6][7][8][9][10][11][12] In our previous work, [13] we introduced a fast, precise, and accurate imaging tool based on the interactive building of cosine similarity measure maps (CSMM) between a reference pixel and the rest of the image. This technique is well suited for visual estimation of the presence, location, and level of heterogeneity of homogeneous regions in the image. It also allows extraction of the reference mass spectra of the regions and evaluation of the influence of the reference pixel on the distribution of similarity values on the map.
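The computation behind a CSMM can be sketched in a few lines. The Python fragment below is a minimal illustration and not the code of Interactive CSMM: it assumes that the image has already been binned into an array `cube` of shape (rows, cols, n_bins), and the function name `cosine_similarity_map` and the choice to assign similarity 0 to empty pixels are assumptions made for this example.

```python
import numpy as np

def cosine_similarity_map(cube, ref_row, ref_col):
    """Cosine similarity between one reference pixel and every pixel.

    cube: array of shape (rows, cols, n_bins) with binned intensities.
    Returns an array of shape (rows, cols). Pixels with an all-zero
    spectrum get similarity 0 (an assumption of this sketch).
    """
    cube = np.asarray(cube, dtype=float)
    ref = cube[ref_row, ref_col]
    # Dot product of every pixel spectrum with the reference spectrum.
    dots = cube @ ref
    # Product of the two vector lengths for every pixel.
    norms = np.linalg.norm(cube, axis=2) * np.linalg.norm(ref)
    out = np.zeros(cube.shape[:2])
    np.divide(dots, norms, out=out, where=norms > 0)
    return out
```

Calling the function again with another `(ref_row, ref_col)` corresponds to selecting a different reference pixel; for non-negative intensities the values lie between 0 and 1.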
Here we present a user-friendly interface for building and analysis of CSMMs of MSI data.

Implementation
Here we introduce Interactive CSMM, a MATLAB app that provides an intuitive graphical interface for interactive evaluation of mass spectrometric imaging heterogeneity. An example of the interface is shown in Figure 1. Interactive CSMM was created in MATLAB R2019b. We also provide a Python script for converting the standard raw imzML file format to the mat-file required by Interactive CSMM. The script currently depends on the following libraries: numpy, psutil, pyimzML, and scipy.
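For orientation, a conversion of this kind can be sketched with pyimzML and scipy as follows. This is not the script distributed with Interactive CSMM: the function names and the mat-file variable names (`xy`, `mz`, `intensity`) are hypothetical, and the layout that the app actually expects should be taken from the repository.

```python
import numpy as np

def collect_spectra(parser):
    """Gather pixel coordinates and raw spectra from a pyimzML-style parser.

    `parser` needs a `coordinates` list of (x, y, z) tuples and a
    `getspectrum(i)` method returning (mzs, intensities), as provided by
    pyimzML's ImzMLParser.
    """
    n = len(parser.coordinates)
    mz = np.empty(n, dtype=object)         # object arrays are saved as MATLAB cells
    intensity = np.empty(n, dtype=object)
    for i in range(n):
        mzs, ints = parser.getspectrum(i)
        mz[i] = np.asarray(mzs, dtype=float)
        intensity[i] = np.asarray(ints, dtype=float)
    xy = np.array([(x, y) for x, y, _ in parser.coordinates], dtype=float)
    # Hypothetical variable names, not the schema of Interactive CSMM.
    return {"xy": xy, "mz": mz, "intensity": intensity}

def imzml_to_mat(imzml_path, mat_path):
    # Imported here so that collect_spectra can be tried without these packages.
    from pyimzml.ImzMLParser import ImzMLParser
    from scipy.io import savemat

    savemat(mat_path, collect_spectra(ImzMLParser(imzml_path)))
```

The sketch stores unbinned spectra so that binning parameters can still be chosen later; whether the distributed script does the same is not stated here.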

Operation
Interactive CSMM can be launched locally on any computer with MATLAB installed (R2019b or higher; earlier versions might also work). Installation and launching instructions are also available. All interfaces and plots of Interactive CSMM are highly interactive, allowing users to visualize data in real time while interactively selecting the reference mass spectrum, and to store the results of the analysis.

Use cases
The program 1) allows the user to interactively select the reference pixel and specify the mass range and other parameters for data binning; 2) calculates the CSMM of all imaging data with respect to the selected pixel; 3) lets the user label homogeneous areas and saves the assigned area number, the coordinates of the reference pixel, and the name of the CSMM image file built on this pixel to a text file; and 4) saves publication-ready images of the CSMM at a specified resolution.
To improve the interpretability of the image and to suppress pixelation, outliers, and measurement artifacts, the program provides a smoothed version of the CSMM, a visual map of the boundaries of homogeneous zones, [13] and the mass spectrum of the reference pixel.
Ions that are distributed evenly over the tissue can blur the picture. We therefore let users specify the m/z ranges that reflect the tissue heterogeneity and compare the positions and shapes of the homogeneous regions obtained in different m/z ranges.
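The effect of restricting the m/z range can be illustrated with a toy example. The Python sketch below uses invented numbers, not measured data, and the helper names `restrict_to_mz_ranges` and `cosine` are hypothetical: two pixels share one intense ion and differ only in weak, pixel-specific ions.

```python
import numpy as np

def restrict_to_mz_ranges(cube, mz_centers, ranges):
    """Keep only the m/z bins whose centers fall inside any (low, high) range."""
    mz_centers = np.asarray(mz_centers, dtype=float)
    keep = np.zeros(mz_centers.shape, dtype=bool)
    for low, high in ranges:
        keep |= (mz_centers >= low) & (mz_centers <= high)
    return np.asarray(cube)[..., keep], mz_centers[keep]

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Toy image with two pixels: the first bin holds an ion present everywhere,
# the next two bins hold ions specific to one pixel each.
mz_centers = np.array([150.0, 700.0, 750.0, 800.0])
cube = np.array([[[100.0, 5.0, 0.0, 0.0],
                  [100.0, 0.0, 5.0, 0.0]]])

full = cosine(cube[0, 0], cube[0, 1])        # dominated by the shared ion
sub, kept = restrict_to_mz_ranges(cube, mz_centers, [(650.0, 850.0)])
selective = cosine(sub[0, 0], sub[0, 1])     # shared ion excluded
```

Over the full range the two pixels look almost identical (`full` is close to 1), whereas in the restricted range they are completely dissimilar (`selective` is 0), which is the contrast gain that the range selection aims for.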
Another effect that complicates the interpretation of MS images is the presence of transition zones caused by gradual changes in ion intensities. In our method, because the CSMM is built over a mass range, the transition zones become clearly visible: they include the peaks of both bordering zones. That effect is more difficult to achieve with standard imaging approaches, which consider the distribution of individual ions (Supplementary Materials of the method-describing article [13]).
An additional benefit of our method is that no data preprocessing is required other than binning. The binning parameters can be changed interactively. No alignment is required either; it can be replaced with a larger bin width. It can also be shown that the method works well with non-normalized data and that there is no need for baseline correction.
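Two of these points can be made concrete with a toy calculation. The Python sketch below uses invented peak positions and intensities, and its helper names are hypothetical. It relies on two facts: the cosine measure does not change when a spectrum is multiplied by a positive constant, and a small m/z shift disappears once both peaks fall into the same, wider bin. It does not address baseline correction.

```python
import numpy as np

def binned(mzs, intensities, mz_min, mz_max, bin_width):
    """Sum intensities into equal-width m/z bins between mz_min and mz_max."""
    n_bins = int(round((mz_max - mz_min) / bin_width))
    edges = np.linspace(mz_min, mz_max, n_bins + 1)
    hist, _ = np.histogram(mzs, bins=edges, weights=intensities)
    return hist

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Two toy spectra with the same two peaks; the second one is shifted by
# 0.03 m/z and is 50 times more intense overall (as if not normalized).
mz_a = np.array([500.21, 760.51])
mz_b = mz_a + 0.03
int_a = np.array([10.0, 4.0])
int_b = 50.0 * int_a

fine = cosine(binned(mz_a, int_a, 400, 900, 0.01),
              binned(mz_b, int_b, 400, 900, 0.01))
coarse = cosine(binned(mz_a, int_a, 400, 900, 1.0),
                binned(mz_b, int_b, 400, 900, 1.0))
```

With 0.01-wide bins the shifted peaks land in different bins and the similarity drops to 0; with 1-wide bins they coincide and the similarity is 1 despite the 50-fold intensity difference. The price of wider bins is lower mass resolution, so the bin width remains a trade-off to be tuned for each dataset.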
CSMM allows the user to identify the zones of least and greatest similarity to the reference pixel. It does not automatically divide the measured map into zones, but areas with similar spectra can be found by varying the reference pixel. By changing the other parameters (above all, the mass range), it is possible to optimize the CSMM color map and improve the visualization of the heterogeneity of the measured sample. Smooth changes in heterogeneity appear as smooth color changes on the CSMM. We tested this method on data from different sources [14] (measured with different ion sources: MALDI imaging and DESI imaging; and with different mass analyzers: TOF MS, Orbitrap, and ICR MS) (Supplementary Materials of the method-describing article [13]). Our article presents CSMMs for colorectal adenocarcinoma data.
Preprocessing steps and descriptions of operations are presented in the manual.

Conclusion
We presented software that allows users to quickly evaluate the presence and structure of heterogeneous areas in a sample and to manually build a dataset of characteristic spectra for the homogeneous zones. By varying the reference spectrum and the mass range used in the construction of the CSMM, it is possible to understand which mass ranges best reflect the heterogeneity and to increase the contrast of the boundaries between the zones. A more detailed discussion is presented in the method-describing article. [13]