Research Article

All ions must serve: The role of various regimes of data acquisition in joint classifier for intraoperative mass spectrometry-based glial tumour identification  

[version 1; peer review: 1 approved]
PUBLISHED 20 Jul 2023

Abstract

Background: Ambient ionisation mass spectrometry, in combination with machine learning techniques, provides a promising tool for rapid intraoperative tumour tissue identification. However, a shortage of non-tumour control samples leads to classifier overfitting, especially in neurosurgical applications. Ensemble learning approaches based on the analysis of multimodal mass spectrometry data can overcome the overfitting problem at the cost of extended data acquisition time. In this work, the contribution of each data acquisition regime is evaluated, together with the requirements for metrics to guide further optimisation of the mass spectrometry set-up.
Methods: Two independent data sets of multimodal molecular profiles, covering a total of 81 glial tumour and non-tumour pathological tissue samples, were analysed in a cross-validation set-up. The XGBoost algorithm was used to build classifiers, and their performance was evaluated using different testing and validation sets. The individual classifiers for each mass spectrometry regime were aggregated into joint classifiers. The impact of each regime was evaluated by excluding specific regimes from the aggregation.
Results: The aggregated classifiers with excluded regimes show lower accuracy for most, but not all, excluded regimes. False positive rates were found to increase in most cases, indicating a strong effect of the ensemble learning approach in overcoming the “small sample size” problem.
Conclusions: The impact of each group of regimes (differing in ion polarity, resolution or mass range of spectra) was found to be non-linear. This might be attributed to biochemical reasons as well as to the physical limitations of mass analysers. The metric required to evaluate the contribution of each regime to classification efficiency should be a numerical estimate of how the classifier depends on any given regime, and it cannot be obtained solely by excluding a group of regimes.

Keywords

Mass spectrometry, ambient ionisation, glial tumours, ion polarity, ion suppression, lipids

Introduction

The expanding capabilities of ambient ionisation mass spectrometry make metabolic molecular profiling a promising approach for various life science and medical applications.1,2 The simplicity and rapidity of sample collection, processing, and analysis are the key factors explaining the growing interest in ambient ionisation methods for diagnostic tasks inside and outside hospitals.3–7 On the other hand, the lack of sample purification or separation makes the interpretation of the resulting mass spectra a challenging task, which has aroused interest in the application of machine learning techniques for data analysis.7–9 Indeed, molecular profiles obtained with ambient ionisation mass spectrometry are multidimensional data which, however, represent only a part of the molecular composition of an analysed biological specimen. Differences in ionisation potential lead to competition for charge carriers during the ionisation process, while the limited dynamic range of the detector prevents the simultaneous detection of all ions present over a wide range of concentrations. Thus, the variety of detected molecules (the chemical space) becomes restricted, so only partial information on the underlying biological processes can be considered during the analysis.

Oncological surgery, especially neurosurgery, is one of the most demanding fields where new intraoperative tissue classification techniques are urgently needed.10–13 The resection volume is the most important characteristic for the patient outcome, since complete resection significantly reduces the likelihood of tumour recurrence. At the same time, excessive resection is not appropriate in neurosurgery, especially in cases where functional centres of the central nervous system are affected by the malignancy. Multiple studies demonstrate the potential of ambient ionisation molecular profiling as an additional decision-making technique for neurosurgery.3,14,15 The classification of the resected tissue in these studies is based either on the lipid alterations16 accompanying the malignant transformation of cells or on the presence of specific water-soluble metabolites such as 2-hydroxyglutarate;16–18 however, the detection of different classes of molecules requires specific ionisation regimes, that is, specific ion polarities and mass ranges. Further translation of such studies into clinical practice not only requires the transfer of methods from high-resolution mass spectrometers, which are common in scientific institutes, to the low-resolution instruments that prevail in clinical practice,19 but also confronts the problem of high natural variability in the molecular composition of malignant and, especially, healthy tissues. Research on cell or animal models is not suitable for metabolic profiling, as cell cultures do not represent the physiological environment of cells, while animal models do not provide sufficient genetic intra- and intertumoural variability.20

The development of classification models for intraoperative glial tumour tissue identification suffers heavily from the limited availability of control samples, which cannot be easily obtained for medical and ethical reasons. In combination with the aforementioned limitations, this results in a “small sample size” problem, which substantially reduces the accuracy of the classification models. As was demonstrated earlier,21 a joint classifier that combines multiple weak learners fitted on mass spectrometric data from various regimes boosts the classification accuracy significantly, but the parallel acquisition of multimodal data may slow down the analysis procedure substantially. Ensemble learning methods provide an opportunity to reduce the impact of data set imbalance on the characteristics of the classification model.22 Indeed, the concurrent consideration of ions detected in different regimes complements the chemical space with molecules of various classes, which is necessary to account for the natural molecular variability of intact and malignant tissues. In this work, we analyse the contribution of each data acquisition regime in an attempt to evaluate its role in the proposed joint classifier and to determine the requirements for metrics to guide further optimisation of the mass spectrometry set-up.

Methods

Ethical considerations

The clinical samples were analysed under an approved N.N. Burdenko National Scientific and Practical Center for Neurosurgery (NSPCN) Institutional Review Board protocol (order Nr. 131 from 17.07.2018) in accordance with the Helsinki Declaration as revised in 2013. The study was conducted in accordance with the recommendations of the ethical committee of the N.N. Burdenko NSPCN order Nr.40 from 12.04.2016 revised by order Nr. 131 from 17.07.2018. A signed written informed consent form explicitly noting that all removed tissues can be used for further research was obtained from all patients.

Data sets

Two independent data sets of human glial tumour and non-tumour pathological tissues were analysed in this work. The first data set consisted of 55 biopsy samples from 41 glioblastoma patients and eight samples from eight non-tumour patients. The second, validation, data set consisted of 26 biopsy samples with a known tumour cell percentage (ranging from 0% to 100%) from 11 glioblastoma patients.3,21 All samples from the same patient were included in only one of the aforementioned data sets. The data sets were acquired with the inline cartridge extraction mass spectrometry approach10,23 using a hybrid high- and low-resolution LTQ Orbitrap XL ETD mass spectrometer (Thermo Fisher Scientific, San Jose, CA, USA), as described in the preceding work that introduced the proposed ensemble learning approach.21 Briefly, the brain tumour tissue sample was placed inside a disposable cartridge arranged in front of the vacuum interface of the mass spectrometer. High voltage and a solvent flow consisting of 3:3:3:1 (vol.) methanol:isopropanol:acetonitrile:water supplemented with 0.1% (vol.) acetic acid were applied to the cartridge to perform extraction and ionisation of the molecules of interest. HPLC-grade solvents and acid were obtained from Merck (Merck KGaA, Darmstadt, Germany). Each molecular profile consisted of spectra acquired in eight different regimes of measurement: all combinations of (1) positive or negative ion mode; (2) high resolution (Orbitrap mass analyser, resolution 30,000 FWHM at m/z = 400) or low resolution (linear ion trap mass analyser); (3) wide (100–2000) or narrow (500–1000) m/z range.21,24

Data analysis

All simulations were performed in a standard machine-learning cross-validation set-up. The spectra were normalised and aligned, and all peaks present in fewer than ten scans per diagnosis were removed. The resulting spectra were transformed into a feature matrix with individual scans as rows and peak intensities as columns.21 For the first data set, the rows of the resulting matrix were divided into training and testing subsets in a 60/40 ratio, ensuring that scans from the same tissue fragment were always assigned to the same subset. For each analysed combination of regimes, 10% of all corresponding rows from the training set matrix were chosen at random and used to optimise the XGBoost algorithm metaparameters. The optimal set of metaparameters was then used to build a classifier on the training subset, and the classifier performance was evaluated on the testing subset. To assign a class (tumour or non-tumour) to a sample, the predictions obtained for each scan corresponding to the sample were aggregated using either voting (Vote) or mean probability (MeanP) calculation.21 In voting, the class predicted by the majority of classifiers for the scan became the final prediction. In probability averaging, the probability of belonging to each class was calculated using XGBoost for each scan, and these probabilities were averaged. These aggregated results characterise clinical samples, with all spectra from a sample contributing to the final prediction.21
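The two steps of this procedure that are easiest to get wrong, the fragment-wise split and the per-sample aggregation, can be illustrated with a minimal Python sketch. It is not the authors' KNIME workflow: the function names `split_by_fragment` and `aggregate_scans`, the 0.5 decision threshold and the tie-breaking rule are assumptions made for illustration, the split is applied at the level of fragments rather than rows, and the aggregation is shown over the scans of a sample only.

```python
import random
from collections import Counter, defaultdict
from statistics import mean


def split_by_fragment(scan_fragments, train_fraction=0.6, seed=0):
    """Label each scan 'train' or 'test' so that no tissue fragment is split.

    scan_fragments: one fragment identifier per scan (hypothetical input format).
    """
    fragments = sorted(set(scan_fragments))
    random.Random(seed).shuffle(fragments)
    n_train = round(train_fraction * len(fragments))
    train = set(fragments[:n_train])
    return ["train" if f in train else "test" for f in scan_fragments]


def aggregate_scans(scan_predictions, method="MeanP", threshold=0.5):
    """Aggregate per-scan tumour probabilities into one label per sample.

    scan_predictions: iterable of (sample_id, p_tumour) pairs, one per scan.
    method: "Vote" (majority of per-scan labels) or "MeanP" (mean probability).
    """
    by_sample = defaultdict(list)
    for sample_id, p_tumour in scan_predictions:
        by_sample[sample_id].append(p_tumour)

    labels = {}
    for sample_id, probs in by_sample.items():
        if method == "Vote":
            votes = Counter("tumour" if p >= threshold else "non-tumour" for p in probs)
            # Ties go to "tumour"; this tie-break is an assumption of the sketch.
            is_tumour = votes["tumour"] >= votes["non-tumour"]
        elif method == "MeanP":
            is_tumour = mean(probs) >= threshold
        else:
            raise ValueError(f"unknown method: {method}")
        labels[sample_id] = "tumour" if is_tumour else "non-tumour"
    return labels


# Illustrative probabilities only, not data from the study.
scans = [("s1", 0.9), ("s1", 0.45), ("s1", 0.45), ("s2", 0.1), ("s2", 0.2)]
print(aggregate_scans(scans, method="Vote"))   # {'s1': 'non-tumour', 's2': 'non-tumour'}
print(aggregate_scans(scans, method="MeanP"))  # {'s1': 'tumour', 's2': 'non-tumour'}
```

Sample `s1` shows why the two aggregation types can disagree: one confident scan outweighs two borderline scans when probabilities are averaged, but it is outvoted when labels are counted.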

The mass spectrometry data were preprocessed in the R environment version 4.0.4 with the R package MALDIquant,25 and the model analysis was performed with the KNIME Analytics Platform ver. 4.5.2. The classification algorithm was implemented as a KNIME workflow26 and is available at the KNIME Community Hub via the link https://kni.me/w/dWtqs1_6S2XVP6EG.

Results

The initial data set was used to train classifiers to evaluate the contribution of various regimes. The different subsets were selected from all data for each of the following:

  • all data obtained in positive or negative ion mode (regardless of the resolution and mass range)

  • all data obtained in high or low resolution (regardless of the polarity and mass range)

  • all data obtained in narrow or wide mass range (regardless of the polarity and resolution)
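The subsets listed above can be sketched in a few lines of Python. The labels and the helper `select_regimes` are hypothetical names introduced here for illustration; the sketch only shows how the eight regimes arise from three binary settings and how fixing one setting leaves four regimes.

```python
from itertools import product

# Hypothetical labels for the three binary acquisition settings.
POLARITIES = ("positive", "negative")
RESOLUTIONS = ("high", "low")
MASS_RANGES = ("wide", "narrow")

# All eight regimes: every combination of polarity, resolution and mass range.
REGIMES = list(product(POLARITIES, RESOLUTIONS, MASS_RANGES))


def select_regimes(polarity=None, resolution=None, mass_range=None):
    """Return the regimes matching the given settings; None means 'any'."""
    wanted = (polarity, resolution, mass_range)
    return [
        regime
        for regime in REGIMES
        if all(w is None or w == value for w, value in zip(wanted, regime))
    ]


positive_only = select_regimes(polarity="positive")
narrow_only = select_regimes(mass_range="narrow")
print(len(REGIMES), len(positive_only), len(narrow_only))  # 8 4 4
```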

The results obtained on the testing and validation data sets are shown in Table 1. For all the classifiers, the accuracy was found to be lower than for the joint classifier built on all regimes.21 The accuracy obtained for the validation set was, in most cases, lower than for the corresponding training sets. However, the specificity and sensitivity of the classifiers respond differently to each exclusion. The exclusion of positive polarity from the classification process significantly increases the number of false positive results, which means that many non-tumour samples were incorrectly identified as tumour samples. In the case of the negative ion mode exclusion, the same effect was observed only for vote-aggregated classifiers; the probability-based classifier trained solely on positive ion mode data did not show a significant change in accuracy but showed a poor false positive rate from the outset. A comparison of the impact of different resolutions on the classification shows a tendency towards false negative results for the probability-based classifier trained on high-resolution data, in contrast to a tendency towards false positive results for the vote-based classifier. In the case of low-resolution data, both aggregation types show some tendency towards false positive tumour identification. The exclusion of any mass range from the data leads to a substantial loss of accuracy due to an increased rate of false positive identification. This effect is especially strong when only narrow mass range data are analysed.

Table 1. Classifiers validated on the subsets obtained by dividing all data by regime.

Accuracy – the ratio of correctly identified samples to all samples; sensitivity – the ratio of correctly identified tumour samples to all tumour samples; specificity – the ratio of correctly identified non-tumour pathology samples to all non-tumour pathology samples.

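| Data set | Regime | Aggregation | Accuracy | F-Measure | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Regimes with different polarities | | | | | | |
| Training | Positive | MeanP | 0.960 | 0.977 | 0.994 | 0.781 |
| Validating | Positive | MeanP | 0.962 | 0.978 | 1.000 | 0.750 |
| Training | Positive | Vote | 0.985 | 0.991 | 0.982 | 1.000 |
| Validating | Positive | Vote | 0.923 | 0.957 | 1.000 | 0.500 |
| Training | Negative | MeanP | 0.967 | 0.980 | 0.983 | 0.892 |
| Validating | Negative | MeanP | 0.923 | 0.957 | 1.000 | 0.500 |
| Training | Negative | Vote | 0.990 | 0.994 | 0.994 | 0.973 |
| Validating | Negative | Vote | 0.962 | 0.978 | 1.000 | 0.750 |
| Regimes with different resolution | | | | | | |
| Training | High res. | MeanP | 0.971 | 0.984 | 0.994 | 0.810 |
| Validating | High res. | MeanP | 0.900 | 0.978 | 0.988 | 1.000 |
| Training | High res. | Vote | 0.994 | 0.997 | 0.994 | 1.000 |
| Validating | High res. | Vote | 0.900 | 0.941 | 1.000 | 0.500 |
| Training | Low res. | MeanP | 0.971 | 0.983 | 0.988 | 0.892 |
| Validating | Low res. | MeanP | 0.962 | 0.978 | 1.000 | 0.750 |
| Training | Low res. | Vote | 0.981 | 0.988 | 0.988 | 0.946 |
| Validating | Low res. | Vote | 0.962 | 0.978 | 1.000 | 0.750 |
| Regimes with different mass range | | | | | | |
| Training | Narrow | MeanP | 0.971 | 0.983 | 0.994 | 0.861 |
| Validating | Narrow | MeanP | 0.885 | 0.936 | 1.000 | 0.250 |
| Training | Narrow | Vote | 0.981 | 0.989 | 0.994 | 0.917 |
| Validating | Narrow | Vote | 0.923 | 0.957 | 1.000 | 0.500 |
| Training | Wide | MeanP | 0.970 | 0.982 | 0.982 | 0.909 |
| Validating | Wide | MeanP | 0.962 | 0.978 | 1.000 | 0.750 |
| Training | Wide | Vote | 0.980 | 0.988 | 0.988 | 0.939 |
| Validating | Wide | Vote | 0.962 | 0.978 | 1.000 | 0.750 |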
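The metrics reported in Table 1 can be computed from the four confusion-matrix counts, as in the sketch below. Two points are assumptions rather than statements from the article: the F-measure is taken to be the usual F1 score (harmonic mean of precision and sensitivity), and the example counts (22 tumour and 4 non-tumour samples, one of the latter misclassified) are illustrative values chosen for a 26-sample set, since the class split of the validation set is not stated in the text.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and F-measure from counts.

    tp/fn: tumour samples identified correctly/incorrectly.
    tn/fp: non-tumour samples identified correctly/incorrectly.
    Assumes each denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    # F-measure assumed to be F1, the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "f_measure": f_measure,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Illustrative counts (assumed, not reported in the article).
metrics = classification_metrics(tp=22, fn=0, tn=3, fp=1)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.962, 'f_measure': 0.978, 'sensitivity': 1.0, 'specificity': 0.75}
```

With so few non-tumour samples, a single additional false positive lowers the specificity by a quarter, which is why specificity values in the validation rows move in large steps.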

Discussion

The impact of various data acquisition regimes on the classification model is not unexpected, but the reasons for the decline in accuracy are directly related to the chemical and physical limitations of each experimental regime. The most expected difference is associated with the polarity of the data.27 Indeed, positive ion spectra are characterised by the presence of various types of phospholipids28 such as phosphatidylcholines, phosphatidylethanolamines, and phosphatidylserines. Other lipids, such as phosphatidic acids and phosphatidylinositols, could be ionised solely in negative ion mode, as could many of the metabolites, including the specific oncometabolite 2-hydroxyglutarate16–18 and the neurometabolite N-acetylaspartate.16,29 Lipid metabolism is considerably altered during malignancy, resulting in significant changes in the lipid content of the cells. Thus, a classifier based on one polarity only will be characterised by decreased accuracy. For all single-polarity classifiers, decreased specificity (i.e. an increased rate of false positive identifications) and increased sensitivity (i.e. a decreased rate of false negative identifications) are observed, which means that the aggregation of different polarities into a single classifier substantially helps to overcome the overfitting caused by the small sample size problem.

High-resolution mass spectra could be transformed to the same dimension as the low-resolution data,19 but some details might be lost during the transformation. However, on the particular data set of glial tumour tissues analysed in this study, classifiers based on the high-resolution data show a significantly higher loss of accuracy during validation on the samples obtained from patients of the second cohort. The increased efficiency of tumour tissue classification with the high-resolution-based models originates from the richer feature matrix, as a high-resolution mass analyser provides peak-rich spectra, but the same fact negatively affects the classification results for the validation data set, as high-resolution-based classifiers are more sensitive to the natural molecular variability of the tissues.
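How detail can be lost when high-resolution spectra are brought to a lower dimension is shown by the following sketch, which sums peak intensities into fixed-width m/z bins. This is only one possible transformation and is assumed here for illustration; the procedure actually used is described in the cited work,19 and the function name `bin_spectrum`, the unit bin width and the example peaks are all hypothetical.

```python
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0, mz_min=100.0, mz_max=2000.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks: iterable of (mz, intensity) pairs from one spectrum.
    Returns a dict mapping bin index to summed intensity; peaks outside
    [mz_min, mz_max) are ignored.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            index = int((mz - mz_min) // bin_width)
            binned[index] += intensity
    return dict(binned)


# Two hypothetical peaks resolved at high resolution fall into the same bin.
high_res_peaks = [(760.58, 10.0), (760.62, 5.0), (885.55, 3.0)]
print(bin_spectrum(high_res_peaks))  # {660: 15.0, 785: 3.0}
```

The first two peaks, which a high-resolution analyser would report as separate features, collapse into one feature after binning, so a classifier trained on the binned data can no longer distinguish them.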

The limited dynamic range of the mass analysers could also affect the classification efficiency. The wide mass range allows the accumulation of more ions in each mass spectrum, but some minor ions could be lost due to the limited capacity of ion traps.30 The narrow mass range (from m/z 500 to 1000) makes it possible to detect as many lipids as possible, which could be useful for determining metabolic alterations in various lipid classes. However, it was found that the exclusion of the wide mass range from the training data set significantly reduces the classification accuracy. The model becomes substantially overfitted, as the rate of false positive identifications increases significantly. The observed differences between the lipid composition of tumour and non-tumour tissues28 affect relatively abundant lipids, thus changing the intensities of relatively high peaks in the mass spectra, which can be detected in wide mass range data. The contribution of metabolites, which are mostly small molecules,16 to classification accuracy appears to be important, as the classifiers trained only on narrow mass range data lose accuracy more noticeably than the wide mass range classifiers.

The obtained results demonstrate the non-linear contribution of each data acquisition regime to the effectiveness of an aggregated classifier, regardless of the aggregation method implemented. As the underlying reasons for the observed results are connected with the biochemical background of the malignancy and the physical limitations of the method, it is not possible to evaluate the impact of any of the regimes directly and independently. Since the aggregation of multiple regimes improves the accuracy of the classification, a large amount of data must be collected in every experiment, which increases the time required for data acquisition for each sample. As this runs counter to the main idea of ambient ionisation mass spectrometry, the rapidity of the analysis, it is crucial to develop a method to assess the impact of each regime. The required metric of such an impact should be a numerical estimate of how the classification accuracy and other characteristics change when any given regime is included in or excluded from consideration, which would allow one to determine the optimal set of regimes required for the accurate and high-performance implementation of ambient ionisation molecular profiling in clinical practice. It is proposed that the Shapley value31,32 could serve as an appropriate metric; however, the current size of the data sets is presumed insufficient for such calculations at this time, so the further collection of both tumour and control tissues is still required.
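To make the proposed metric concrete, the sketch below computes exact Shapley values for a small set of "players" given a value function over coalitions. It is a generic illustration under stated assumptions, not a calculation performed in this study: the players stand for hypothetical regimes `A`, `B` and `C`, and the coalition values are made-up numbers playing the role of classifier accuracy, not results from the article.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a coalition value function.

    players: list of hashable player labels (here, hypothetical regimes).
    value: function mapping a frozenset of players to a number.
    """
    n = len(players)
    result = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                subset = frozenset(coalition)
                # Marginal contribution of the player to this coalition.
                total += weight * (value(subset | {player}) - value(subset))
        result[player] = total
    return result


# Made-up coalition values for illustration only (NOT results from the article).
TOY_ACCURACY = {
    frozenset(): 0.50,
    frozenset("A"): 0.80,
    frozenset("B"): 0.80,
    frozenset("C"): 0.70,
    frozenset("AB"): 0.85,
    frozenset("AC"): 0.90,
    frozenset("BC"): 0.90,
    frozenset("ABC"): 0.95,
}

contributions = shapley_values(["A", "B", "C"], TOY_ACCURACY.__getitem__)
print(contributions)
```

The contributions sum to the difference between the value of the full set and that of the empty set, so each regime receives a share of the total gain. Note that the exact calculation evaluates the value function on every subset of regimes, which means a classifier must be trained and validated for each of the 2^8 = 256 combinations of eight regimes.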

how to cite this article
Pekov SI, Zavorotnyuk DS, Sorokin AA et al. All ions must serve: The role of various regimes of data acquisition in joint classifier for intraoperative mass spectrometry-based glial tumour identification   [version 1; peer review: 1 approved]. F1000Research 2023, 12:858 (https://doi.org/10.12688/f1000research.130001.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
Reviewer Report 04 Sep 2023
Per Malmberg, Chalmers University of Technology, Gothenburg, Sweden 
Approved
This is an interesting approach to glial tumor identification using ambient MS in all combinations of positive and negative ion mode, high- and low mass resolution and at different mass ranges. Much of this work has been published before and ...
HOW TO CITE THIS REPORT
Malmberg P. Reviewer Report For: All ions must serve: The role of various regimes of data acquisition in joint classifier for intraoperative mass spectrometry-based glial tumour identification   [version 1; peer review: 1 approved]. F1000Research 2023, 12:858 (https://doi.org/10.5256/f1000research.142728.r198058)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.