Role of Artificial intelligence model in prediction of low back pain using T2 weighted MRI of Lumbar spine

Ali Muhaimil; Saikiran Pendem; Niranjana Sampathilla; Priya P S; Kaushik Nayak; Krishnaraj Chadaga; Anushree Goswami; Obhuli Chandran M; Abhijith S

doi:10.12688/f1000research.154680.2


Research Article

Revised


[version 2; peer review: 2 approved, 1 approved with reservations, 1 not approved]

Ali Muhaimil¹, Saikiran Pendem¹, Niranjana Sampathilla², Priya P S³, Kaushik Nayak¹, Krishnaraj Chadaga⁴, Anushree Goswami², Obhuli Chandran M¹, Abhijith S¹

PUBLISHED 10 Oct 2024

Author details

¹ Department of Medical Imaging Technology, Manipal College of Health Professions, Manipal Academy of Higher Education, Karnataka, Manipal, 576104, India
² Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, 576104, India
³ Department of Radio Diagnosis and Imaging, Kasturba Medical College, Manipal Academy of Higher Education, Karnataka, Manipal, 576104, India
⁴ Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, 576104, India

Ali Muhaimil
Roles: Data Curation, Formal Analysis, Methodology, Writing – Original Draft Preparation

Saikiran Pendem
Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Niranjana Sampathilla
Roles: Conceptualization, Formal Analysis, Methodology, Software, Supervision, Validation, Visualization, Writing – Review & Editing

Priya P S
Roles: Conceptualization, Supervision, Validation, Writing – Review & Editing

Kaushik Nayak
Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Visualization, Writing – Review & Editing

Krishnaraj Chadaga
Roles: Formal Analysis, Software, Supervision, Validation

Anushree Goswami
Roles: Formal Analysis, Validation

Obhuli Chandran M
Roles: Methodology, Writing – Review & Editing

Abhijith S
Roles: Methodology, Writing – Review & Editing


This article is included in the Manipal Academy of Higher Education gateway.

This article is included in the Artificial Intelligence and Machine Learning gateway.

Abstract

Background

Low back pain (LBP) is the most common musculoskeletal disorder globally and the primary cause of disability. Magnetic resonance imaging (MRI) studies are inconclusive and less sensitive for identifying and classifying patients with LBP. Hence, this study aimed to investigate the role of artificial intelligence (AI) models in the prediction of LBP using T2 weighted MRI images of the lumbar spine.

Methods

This was a prospective case-control study. A total of 200 patients (100 cases and 100 controls) referred for MRI of the lumbar spine and whole spine screening were included. The scans were performed using 3.0 Tesla MRI (United Imaging Healthcare). T2 weighted images of the lumbar spine were segmented to extract radiomic features. Machine learning (ML) models, such as random forest, decision tree, logistic regression, K-nearest neighbors, and AdaBoost, and deep learning (DL) methods, such as ResNet and GoogleNet, were used, and performance measures were calculated.

Results

Our study showed that random forest and AdaBoost were the most reliable ML models for predicting LBP. Random forest showed high performance, with area under the curve (AUC) values from 0.83 to 0.88 across all lumbar vertebrae, AUCs of 0.88 at the L2-L3, L3-L4, and L4-L5 intervertebral discs (IVDs), and the highest AUC (0.92) at the L5-S1 IVD. AdaBoost demonstrated high performance at the L2-L5 vertebrae with AUC values of 0.82 to 0.90, with the highest AUC (0.97) at the L5-S1 IVD. Among the DL models, GoogleNet outperformed the other models at 30 epochs with an accuracy of 0.85, followed by ResNet 18 (30 epochs) with an accuracy of 0.84.

Conclusion

The study demonstrated that ML and DL models can effectively predict LBP from T2 weighted MRI images of the lumbar spine. ML and DL models could also enhance the diagnostic accuracy of LBP, potentially leading to better patient management and outcomes.

Keywords

Deep learning, Machine learning, low back pain, intervertebral discs, lumbar vertebrae

Corresponding authors: Saikiran Pendem, Niranjana Sampathilla, Kaushik Nayak

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2024 Muhaimil A et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Muhaimil A, Pendem S, Sampathilla N et al. Role of Artificial intelligence model in prediction of low back pain using T2 weighted MRI of Lumbar spine [version 2; peer review: 2 approved, 1 approved with reservations, 1 not approved]. F1000Research 2024, 13:1035 (https://doi.org/10.12688/f1000research.154680.2) First published: 10 Sep 2024, 13:1035 (https://doi.org/10.12688/f1000research.154680.1) Latest published: 10 Oct 2024, 13:1035 (https://doi.org/10.12688/f1000research.154680.2)

Amendments from Version 1

As per the reviewer’s suggestion, the advantages of using a wide range of machine learning algorithms and deep learning algorithms such as ResNet and GoogleNet were included in the methodology section. Highlights of the class variability were provided in the methodology section. The clinical significance of the results obtained from the classification algorithms was provided in the discussion section.

See the authors' detailed response to the review by Tarun Gangil

Introduction

Low back pain (LBP) is the most prevalent musculoskeletal condition worldwide and the leading cause of disability. In 2019, it ranked 9th in disability-adjusted life years (DALYs), accounting for 2.5% of overall DALYs. LBP was the primary cause of years lived with disability (YLDs), representing 7.41% of total YLDs. In 2020, there were more than half a billion prevalent cases of LBP globally, and projections indicate that this number will exceed 800 million by 2050. Although age-standardized rates have slightly decreased over the past three decades, the number of LBP cases continues to increase owing to population growth and aging, particularly in Asia and Africa.¹,² LBP can be caused by various factors, including lifestyle, psychological, and social factors. To reduce the incidence of LBP, it is essential to address modifiable risk factors, such as smoking and obesity, which are associated with a high risk of developing the condition.³,⁴ LBP can result from injuries or degenerative changes in the lumbar region, including the facet joints, intervertebral discs (IVDs), ligaments, and muscles. It is also associated with annular tears, disc height reduction, facet degeneration, and end-plate abnormalities such as Schmorl’s nodes, fractures, erosion, and calcifications.⁵–⁸

MRI of the spine is a noninvasive technique that is regarded as the gold standard for detecting and diagnosing spinal diseases. T2 weighted MRI enhances tissue contrast and offers greater sensitivity than traditional CT imaging for diagnosing conditions such as IVD herniation, nerve root entrapment, and spinal canal stenosis. MRI can identify IVD degeneration and vertebral endplate changes, which are associated with clinically significant LBP. However, imaging studies have revealed that 87% of asymptomatic individuals also exhibit lumbar IVD abnormalities on MRI.⁹–¹³

Radiomics is a vital medical technique used in clinical practice for evaluation, diagnosis, selection of a course of treatment, and monitoring. Radiomics, a rapidly advancing artificial intelligence (AI) method in medical imaging, can objectively, reproducibly, and efficiently extract numerous quantitative features from medical images. These features are used to develop radiomic models or signatures that aid in interpreting various clinical phenotypes, such as patient genotyping, treatment efficacy, and clinical outcomes.¹⁴–¹⁷

AI encompasses systems that can generate accurate inferences from large datasets using advanced computational algorithms. Similar to humans, machines require learning for intelligent behavior. Therefore, AI includes various learning algorithms, such as machine learning (ML) and increasingly popular deep learning (DL) algorithms. Although AI originated in the 1950s, its development has accelerated since 2000, owing to advancements in computational power. Currently, AI technology provides indispensable tools for intelligent data analysis, particularly for solving medical diagnostic problems. The relationship between radiomics and AI is symbiotic. The high-dimensional nature of radiomics demands powerful analytic tools, and AI, with its advanced capabilities, is well-suited for this task. Conversely, AI applications with medical images rely on radiomics because the metrics used to train and build AI models are derived from radiomic approaches, specifically through feature extraction and feature engineering techniques.¹⁸–²⁶

Few studies have assessed the utility of radiomics-based ML models and DL techniques for predicting LBP. Hence, this study aimed to investigate the role of AI models in the prediction of LBP using T2 weighted MRI images of the lumbar spine.

Methods

This was a prospective, case-control study. Approval was obtained from the institutional ethical committee (IEC2:179/2023) of Kasturba Medical College and Hospital, Manipal, India, on 20th July 2023, followed by Clinical Trial Registry (CTRI) registration: CTRI/2023/08/056954, 25/08/2023, https://ctri.nic.in/Clinicaltrials/login.php. Written informed consent (IC) was obtained from all the participants.

Eligibility criteria

Patients referred for MRI of the lumbar spine and whole-spine screening were included. The patients were screened for LBP prevalence using the “Delphi definitions of low back pain prevalence (DOLBaPP)”²⁷ questionnaire. Patients were considered symptomatic if they experienced LBP for 12 months. Patients were considered asymptomatic if they had no current back pain and no memory of severe or disabling back pain. A total of 100 cases and 100 controls (mean age: cases, 48.48±16.1 years; controls, 51.46±18.4 years) were included. Demographic details of the patients are shown in Table 1. Patients with tumors, severe osteoporosis, or previous spine surgeries were excluded.

Table 1. Demographic details of the study population.

Characteristic	Cases (Symptomatic)	Controls (Asymptomatic)
Subjects (n)	100	100
Age in years (Mean±SD)	48.48±16.1	51.46±18.4
Females (n)	42	40
Males (n)	58	60

MRI Image acquisition: All MRI scans were performed with 3.0 Tesla MRI (United Imaging uMR 780). The MRI image acquisition parameters are listed in Table 2.

Table 2. Acquisition parameters of the T2 weighted MRI of the lumbar spine.

Parameter	Sagittal T2
Sequence	Fast Recovery Fast Spin echo (FRFSE)
TR (msec)	2494
TE (msec)	100
Matrix size	224 × 199
Slice thickness (mm)	3.5
Flip angle (Degrees)	90

Segmentation and radiomic feature extraction

The DICOM T2 weighted MRI images of the lumbar vertebrae and disc spaces of each patient were loaded into 3D Slicer software (Version 4.10.2). Segmentation of the lumbar vertebrae and intervertebral discs (IVDs) was performed manually (Figure 1). Radiomic features from the lumbar vertebrae and IVDs were extracted for both the cases and controls (Supplementary file 1).

Figure 1. Segmentation of the lumbar vertebrae and intervertebral disc on a T2 weighted image.

Machine learning model

ML classifiers such as random forest, decision tree, logistic regression, k-nearest neighbors (KNN), and AdaBoost were used. We utilized a wide range of ML classifiers because different classifiers may perform better with different data features, which allows thorough benchmarking and selection of the optimal model for a specific problem. Each ML method has its own advantages: random forest excels in robustness and accuracy, decision tree offers interpretability, logistic regression is effective for linear relationships, KNN is well suited to smaller datasets, and AdaBoost improves performance by combining weak learners. The ML classifiers were run in a Conda virtual environment with Python (version 3.9.7).²⁸ Several libraries such as NumPy,²⁹ scikit-learn,³⁰ pandas,³¹ seaborn,³² matplotlib,³³ and others were installed to support the analysis. The models were trained on an HP ProBook 440 with 8 GB of RAM and an Intel® Core™ i5 central processing unit, running a 64-bit Windows operating system.

Data normalization: This is an essential step because it assigns equal weight to each variable, preventing any single variable from disproportionately influencing the model performance owing to its large numerical values. In our study, the min-max normalization (rescaling) technique was employed for the entire dataset.
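The rescaling step can be illustrated with a minimal, dependency-free sketch. The feature names and values below are hypothetical and are not taken from the study data; the function name `min_max_scale` is likewise an illustrative choice rather than the authors' code.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the [0, 1] range (min-max normalization)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical radiomic feature values on very different numeric scales
energy = [1.2e6, 3.4e6, 2.3e6, 5.6e6]
entropy = [4.1, 5.3, 4.7, 6.5]

energy_scaled = min_max_scale(energy)
entropy_scaled = min_max_scale(entropy)
```

After rescaling, both features span the same range, so a feature with large raw values such as energy no longer dominates distance-based classifiers such as KNN.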

Feature selection: The mutual information method was used to select the top 20 radiomic features at each lumbar vertebra and IVD for both cases and controls.
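As an illustration of ranking features by mutual information with the class label, the following sketch uses a simple equal-width histogram estimator. This is an assumption made for clarity: the estimator used in the study is not stated, and library implementations such as scikit-learn's `mutual_info_classif` use a different (nearest-neighbor based) estimator. The feature table and labels are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels, bins=4):
    """Estimate I(feature; label) in nats after equal-width binning."""
    lo, hi = min(feature), max(feature)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature
    binned = [min(int((x - lo) / width), bins - 1) for x in feature]
    n = len(labels)
    joint = Counter(zip(binned, labels))
    bin_counts = Counter(binned)
    label_counts = Counter(labels)
    mi = 0.0
    for (b, y), count in joint.items():
        # p(b, y) * log(p(b, y) / (p(b) * p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (bin_counts[b] * label_counts[y]))
    return mi


def top_k_features(table, labels, k):
    """Return the k feature names with the highest estimated mutual information."""
    scores = {name: mutual_information(values, labels) for name, values in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: 1 = symptomatic, 0 = asymptomatic
labels = [1, 1, 1, 1, 0, 0, 0, 0]
table = {
    "informative": [0.90, 0.80, 0.95, 0.85, 0.10, 0.20, 0.15, 0.05],
    "partial": [0.90, 0.85, 0.80, 0.15, 0.10, 0.12, 0.20, 0.88],
    "noise": [0.10, 0.90, 0.10, 0.90, 0.10, 0.90, 0.10, 0.90],
}
selected = top_k_features(table, labels, k=2)
```

In the study, the analogous step would keep the 20 highest-scoring radiomic features per vertebra or IVD; here `k=2` is used only to keep the example small.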

Model training and validation

The data were split into training and testing sets in an 80:20 ratio. The data were subjected to five-fold cross-validation, in which the model was trained and tested on different subsets to assess its efficiency. The input data were split into five equal parts; in each iteration, four parts were used for training and the remaining one for testing, rotating the held-out part across iterations. Hyperparameters were tuned using a grid search technique, which automates the search for the best values.
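The fold rotation in five-fold cross-validation can be sketched as follows. The sketch assumes, for illustration only, that cross-validation is applied to the 160 training samples (80% of 200); the text does not state this explicitly, and the function name and seed are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Assumption: 160 samples remain after an 80:20 split of 200 patients
splits = list(k_fold_indices(n_samples=160, k=5))
```

Each sample appears in exactly one held-out fold, so every sample is used for testing once and for training four times. A grid search would repeat this loop for each candidate hyperparameter setting and keep the setting with the best average score.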

Performance metrics of the ML models

The performance metrics of the ML models for the test dataset were assessed using accuracy, precision, F1 score, area under the curve (AUC), Hamming loss, Jaccard score, log loss, and Matthews correlation coefficient (MCC).

Validation of the testing model from the confusion matrix was assessed using

Accuracy = ((TP + TN) / (TP + TN + FP + FN))

Precision = (TP / (TP + FP)) = PPV

Recall = (TP / (TP + FN))

F1 = ((2 × Precision × Recall) / (Precision + Recall))

Where, TP - True positive, TN - True negative, FP - False positive, FN - False negative, PPV - Positive predictive value.
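The formulas above can be checked with a short sketch that computes the metrics from confusion-matrix counts, together with the MCC and a rank-based AUC. The counts and scores below are hypothetical and are not results from the study; the function names are illustrative.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Compute metrics from confusion-matrix counts (denominators assumed non-zero)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise wins; ties count as half a win
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts for a 40-patient test set
metrics = confusion_metrics(tp=17, tn=16, fp=4, fn=3)

# Hypothetical predicted probabilities for four patients
auc = roc_auc(labels=[1, 1, 0, 0], scores=[0.9, 0.4, 0.5, 0.1])
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often reported alongside the F1 score for binary problems.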

Deep learning model: MRI images of the lumbar spine were collected in the Joint Photographic Experts Group (JPEG) format. The input images were cropped and resized to 184 × 282 pixels to mainly include the lumbar vertebrae and disc space. Intensity normalization was performed on all images such that the pixel values across multiple images were normalized to the same statistical distribution, facilitating improved analysis of the MRI images. DL, a subset of AI, was then applied. Transfer learning is a DL technique in which pre-trained networks are adapted to a custom task. One of the major advantages of this method is that it avoids training the network from scratch by using weights that were trained on the 1000-class ImageNet dataset. Among the available pre-trained models, GoogleNet and ResNet (18 and 50) were used (Figures 2–4). These are convolutional neural networks, in which convolution layers play a major role; these feature extraction layers perform convolution with different kernels. GoogleNet and ResNet were chosen for their powerful ability to learn complex patterns in data, especially in image analysis, medical diagnostics, and classification problems. Both are exceptional at automatically learning deep features, particularly from images, where they can capture complex and minute details. GoogleNet’s inception modules process the input using parallel convolution layers with varying kernel sizes, enhancing efficiency by capturing features at different scales with fewer parameters. ResNet addresses the vanishing gradient issue, making it possible to train very deep networks efficiently. This permits learning more complex representations and improves performance on tasks such as image classification and object recognition.

Figure 2. Architectural configuration delineating the structure of ResNet50.

Figure 3. Architectural configuration delineating the structure of ResNet18.

Figure 4. Architectural configuration delineating the structure of GoogleNet.

GoogLeNet belongs to the Inception family of models and is a 22-layer deep network that is computationally efficient. ResNet, on the other hand, has an advantage over other networks because it uses skip connections, which mitigate the vanishing gradient problem.
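Why skip connections help can be seen in a toy scalar calculation. During backpropagation, the gradient reaching the first layer is scaled by the product of the local derivatives of all later layers; with a skip connection, each block computes x + f(x), so its local derivative becomes 1 + f'(x). The sketch below is a simplified scalar illustration under an assumed local derivative of 0.1 per layer, not a model of the actual networks.

```python
def backprop_gradient_scale(local_derivatives, residual=False):
    """Product of per-layer derivatives seen by the earliest layer."""
    scale = 1.0
    for d in local_derivatives:
        # A residual block y = x + f(x) has derivative 1 + f'(x)
        scale *= (1.0 + d) if residual else d
    return scale


# Assumed local derivative of 0.1 in each of 18 stacked layers
derivatives = [0.1] * 18
plain_scale = backprop_gradient_scale(derivatives)
skip_scale = backprop_gradient_scale(derivatives, residual=True)
```

In the plain stack the gradient scale collapses toward zero, whereas with skip connections the identity path keeps it from vanishing.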

The dataset was divided into training and test sets in a 90:10 ratio. The training set was further divided into training and validation sets, and validation was performed concurrently with training. The deep learning (DL) models were implemented in MATLAB 2023b owing to its visualization capabilities and ease of use.

To obtain optimum results, the hyperparameters were adjusted. An epoch is one pass of the entire training dataset through the network. In this case, the models were trained for 30, 50, and 100 epochs. The initial learning rate, i.e., the step size at which the parameters are updated during training, was set to 0.001. The optimizer used was stochastic gradient descent with momentum (sgdm).
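The update rule of stochastic gradient descent with momentum can be sketched on a toy one-parameter loss. The learning rate of 0.001 is taken from the text; the momentum value of 0.9 is an assumption (the value used in the study is not reported), and the quadratic loss is purely illustrative.

```python
def sgdm_step(w, velocity, grad, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update for a single scalar parameter."""
    # The velocity accumulates a decaying sum of past gradient steps
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


def grad_loss(w):
    # Gradient of the toy loss (w - 3)^2, which is minimized at w = 3
    return 2.0 * (w - 3.0)


w, v = 0.0, 0.0
first_w, first_v = sgdm_step(w, v, grad_loss(w))

# Repeated updates move the parameter toward the minimum
for _ in range(2000):
    w, v = sgdm_step(w, v, grad_loss(w))
```

The momentum term reuses part of the previous step, which smooths noisy gradient estimates and speeds up progress along consistent descent directions.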

DL model performance was assessed using sensitivity (recall), specificity, precision, negative predictive value (NPV), and F1 score.

For both the ML and DL methods, LBP prediction was formulated as a binary classification problem (symptomatic versus asymptomatic).

Results

Our study included 100 symptomatic cases and 100 asymptomatic controls.

The mean age and sex of the symptomatic and asymptomatic cases are shown in Table 1.

In our study, we analyzed ML models based on radiomic features and DL methods to predict LBP in symptomatic and asymptomatic cases.

Feature reduction for ML model development: The top 20 radiomic features for each lumbar vertebra and IVD were identified using a mutual information algorithm and are presented in Tables 3 and 4.

Table 3. Top 20 radiomic features selected at each vertebral level for the ML models.

S.No.	Vertebrae	Features selected for ML models
1	L1	Dependence Entropy, Run Length Non-Uniformity, Energy, Zone Entropy, Total Energy, Size Zone Non-Uniformity Normalized, Large Area Emphasis, Maximum 2D Diameter Row, Small Area Emphasis, Zone Variance, Robust Mean Absolute Deviation, Idn, Least Axis Length, Difference Variance, Strength, Imc2, Long Run High Gray Level Emphasis, Joint Entropy, Entropy, Large Dependence High Gray Level Emphasis
2	L2	Surface Area, Dependence Entropy, Total Energy, Long Run High Gray Level Emphasis, Maximum 2D Diameter Row, Entropy, Idn, Imc2, Energy, Kurtosis, Least Axis Length, Small Dependence Low Gray Level Emphasis, Mean, Cluster Tendency, Run Percentage, Run Length Non Uniformity, Size Zone Non Uniformity Normalized, MCC, Long Run Low Gray Level Emphasis, Large Dependence Emphasis
3	L3	Least Axis Length, Idmn, Long Run High Gray Level Emphasis, Mean, Run Entropy, Run Length Non Uniformity Normalized, Strength, Kurtosis, Inverse Variance, Maximum 3D Diameter, Run Variance, Sum Entropy, Autocorrelation, Maximum, Size Zone Non Uniformity Normalized, Short Run Emphasis, Skewness, Small Dependence High Gray Level Emphasis, Large Dependence High Gray Level Emphasis, Correlation
4	L4	Minor Axis Length, Large Dependence Emphasis, Gray Level Variance, Gray Level Non-Uniformity Normalized, Coarseness, Sum Squares, Difference Variance, Maximum 2D Diameter Row, Large Area High Gray Level Emphasis, Id, Elongation, Zone Entropy, Contrast, Maximum 2D Diameter Slice, Large Area Low Gray Level Emphasis, Mean, Large Dependence Low Gray Level Emphasis, Least Axis Length, Root Mean Squared, Low Gray Level Emphasis
5	L5	Contrast, Maximum, Small Area Low Gray Level Emphasis, Mean, Complexity, Least Axis Length, Range, Large Area Low Gray Level Emphasis, Idmn, Gray Level Non-Uniformity, Interquartile Range, Run Variance, 90 Percentile, Cluster Shade, Difference Entropy, Gray Level Variance, Maximum Probability, Gray Level Non Uniformity, Large Dependence Low Gray Level Emphasis, Run Entropy

Table 4. Top 20 radiomic features selected at each IVD for the ML models.

S.No.	Intervertebral disc	Features selected for ML models
1	L1-L2	Sum Squares, Cluster Tendency, Zone Variance, Root Mean Squared, Short Run High Gray Level Emphasis, Joint Average, Sum Average, Large Area Emphasis, Gray Level Non-Uniformity Normalized, Entropy, Cluster Shade, Short Run Emphasis, High Gray Level Run Emphasis, Maximum 2D Diameter Row, Run Percentage, Dependence Entropy, Contrast, Median, Run Entropy, Interquartile Range
2	L2-L3	Maximum 2D Diameter Row, Short Run Emphasis, Maximum, High Gray Level Zone Emphasis, Maximum Probability, Difference Variance, Dependence Entropy, Id, Gray Level Non Uniformity Normalized, Contrast, Gray Level Non Uniformity Normalized, Gray Level Variance, High Gray Level Emphasis, Skewness, Range, Joint Entropy, Cluster Prominence, Run Variance, Low Gray Level Run Emphasis, Complexity
3	L3-L4	High Gray Level Zone Emphasis, Maximum 3D Diameter, Maximum, Range, Robust Mean Absolute Deviation, Autocorrelation, Small Area High Gray Level Emphasis, Low Gray Level Zone Emphasis, Mean, Run Variance, Zone Entropy, Interquartile Range, Energy, Sum Entropy, Joint Average, Sum Average, Gray Level Variance, Long Run High Gray Level Emphasis, Cluster Prominence, Gray Level Normalized
4	L4-L5	Low Gray Level Emphasis, Dependence Entropy, Entropy, Minor Axis Length, Correlation, Small Area High Gray Level Emphasis, Maximum Probability, Difference Variance, Dependence Non Uniformity Normalized, Contrast, MCC, Sum Average, Joint Average, Maximum 2D Diameter Column, Dependence Non Uniformity, Maximum 2D Diameter Row, Run Variance, Run Length Non Uniformity Normalized, Dependence Variance, Energy
5	L5-S1	Flatness, Zone Percentage, Least Axis Length, Entropy, Joint Entropy, Gray Level Non Uniformity Normalized, Joint Energy, Gray Level Non Uniformity, Size Zone Non Uniformity, Coarseness, Dependence Non Uniformity, Surface Area, Low Gray Level Zone Emphasis, Short Run High Gray Level Emphasis, Maximum, Cluster Shade, Short Run Emphasis, Dependence Non Uniformity Normalized, Range, Run Length Non Uniformity.

Machine learning (ML) classifiers

In this study, ML methods such as random forest, decision tree, logistic regression, KNN, and AdaBoost were studied. The performance of the five ML classifiers at the lumbar vertebrae and intervertebral discs using five-fold cross-validation is shown in Tables 5 and 6.⁴³

Lumbar vertebrae

Random forest showed high performance, with AUC values from 0.83 to 0.88 across all lumbar vertebrae. Decision tree models exhibited moderate performance with AUC values between 0.65 and 0.76, suggesting lower predictive accuracy compared to other models. Logistic regression performed well, particularly at L5 with an AUC of 0.82, and maintained good performance across the other vertebral levels with AUC values from 0.73 to 0.79. KNN also showed strong performance, especially at the L2-L4 vertebrae with AUC values of 0.79 to 0.83, and slightly lower AUC values at L1 (0.70) and L5 (0.68). AdaBoost demonstrated high performance at the L2–L5 vertebrae with AUC values of 0.82 to 0.90, although its performance at L1 was moderate, with an AUC of 0.67. The ML models showed slightly improved performance at the lower vertebral levels (L4 and L5) compared to the upper vertebral levels (L1-L3) (Figures 5 and 6).

Figure 5. ROC curve and confusion matrix for random forest (a,b) and adaboost (c,d) at L4.

Figure 6. ROC curve and confusion matrix for random forest (a,b) and adaboost (c,d) at L5.

Lumbar intervertebral disc

Random forest showed strong performance at the L2-L3, L3-L4, and L4-L5 IVDs with AUCs of 0.88 and the highest at the L5-S1 IVD (AUC 0.92). Decision tree models showed moderate performance, with the highest AUC at the L5-S1 IVD (0.85) and lower values at the other discs, particularly at the L4-L5 IVD (AUC 0.65). Logistic regression showed the highest AUC at the L3-L4 IVD (0.90) and maintained good performance at the other discs, with AUCs ranging from 0.79 to 0.87. KNN showed the highest AUC at the L4-L5 IVD (0.88) and moderate performance at the other IVDs, with AUCs between 0.73 and 0.78. AdaBoost showed the highest AUC (0.97) at the L5-S1 IVD and exhibited strong results at the L2-L3 (0.86) and L3-L4 (0.83) IVDs. The random forest and AdaBoost models showed high performance, particularly at the L5-S1 IVD (Figure 7).

Figure 7. ROC curve and confusion matrix for random forest (a,b) and adaboost (c,d) at L5-S1 IVD.

Deep learning methods

The performance measures of the DL methods for LBP prediction are presented in Table 7.

Table 7. Performance measures of the DL methods on the test dataset for the prediction of LBP.

DL model	Sensitivity	Specificity	Precision	NPV	FPR	Accuracy	F1 score	MCC
ResNet50, 30 epochs	0.77	0.81	0.82	0.75	0.19	0.79	0.79	0.57
GoogleNet, 30 epochs	0.85	0.86	0.86	0.85	0.14	0.85	0.86	0.71
ResNet18, 30 epochs	0.80	0.89	0.91	0.77	0.10	0.84	0.85	0.69
ResNet50, 50 epochs	0.78	0.83	0.85	0.75	0.16	0.80	0.81	0.61
GoogleNet, 50 epochs	0.83	0.85	0.85	0.83	0.15	0.84	0.84	0.68
ResNet18, 50 epochs	0.79	0.91	0.92	0.75	0.09	0.84	0.85	0.69
ResNet50, 100 epochs	0.78	0.83	0.85	0.75	0.16	0.80	0.81	0.61

GoogleNet at 30 epochs outperformed the other DL models in terms of accuracy (0.85) and F1 score (0.86) for predicting LBP. ResNet 18 at 30 epochs had the second highest performance, with high accuracy (0.84) and F1 score (0.85). ResNet 50 showed consistent results at 50 and 100 epochs (accuracy 0.80 and F1 score 0.81 at both), but slightly lower performance than GoogleNet (accuracy 0.85 and F1 score 0.86) and ResNet 18 (accuracy 0.84 and F1 score 0.85) at 30 epochs.

Discussion

In our study, we used ML and DL algorithms to predict LBP by using T2 weighted images of the lumbar spine. MRI studies have shown that a significant percentage of asymptomatic patients have abnormalities related to lumbar intervertebral discs. Imaging studies often fail to provide definitive answers regarding the source of pain. Although imaging techniques are valuable tools for diagnosis, they are often clinically inconclusive in identifying the precise etiology of low back pain. The high prevalence of asymptomatic abnormalities, risk of overdiagnosis, and lack of correlation between imaging findings and pain highlight the need for a cautious and judicious approach to the use of imaging in LBP management. Clinicians should rely on thorough clinical assessment and consider imaging findings as part of a broader diagnostic strategy rather than the sole determinant of patient care.³⁴–³⁶

Our study noted that ML models showed improved performance at the lower vertebral levels (L4 and L5) compared to the upper vertebral levels (L1-L3), and that the random forest and AdaBoost models exhibited particularly high performance at the L5-S1 IVD. Abdollah et al.³⁷ reported that texture features extracted from T2 maps revealed significant textural differences in the L5-S1 lumbar IVD, upper and lower endplate regions, and the L4-5 lower endplate regions between individuals with and without LBP symptoms, which may not be apparent to the naked eye. The IVD and endplate zones of patients with LBP were more anisotropic, suggesting different patterns of degeneration due to varying patterns of collagen network destruction. Increased anisotropy may indicate fluid redistribution and changes in hydrostatic pressure, causing an uneven load distribution in pain-sensitive areas. Differences in Gray Level Co-occurrence Matrix features such as contrast, energy, and homogeneity provide additional evidence for the hypothesis of unique degeneration patterns in LBP. The random forest algorithm and Gini importance index indicated energy as a unique feature for classification. Ketola et al.³⁸ also reported differences in T2 weighted images, using logistic regression to classify textural features based on a pain questionnaire in a sample of 518 subjects. The best classification accuracy (83%) and AUC (0.91) were achieved at the lowest two IVDs, with a specificity score of 83% and a sensitivity score of 82%. These results suggest that texture features in the lower lumbar discs (L4-L5 and L5-S1) are more predictive of LBP, supported by the findings of increased anisotropy and genetic correlations. Another study by Aggarwal et al.³⁹ reported that decreased L2 and L4 disc heights significantly predicted LBP. They also reported that thickening of the ligamentum flavum, particularly at the lower lumbar levels, contributes to spinal stenosis and LBP.

The DL models used in our study were useful for predicting LBP using MRI. GoogleNet with 30 epochs showed the highest performance, with an accuracy of 0.85 and an F1 score of 0.86 for predicting LBP. Won et al.⁴⁰ employed a CNN to automatically grade spinal stenosis on MRI images of 542 patients, obtaining accuracy measures of 83.0% and 77.9% in comparison to the ground truth assessed by two separate doctors. Jamaludin et al.⁴¹,⁴² developed a CNN that segments the vertebrae and intervertebral discs with an accuracy of 95.6%. This model also identifies disc narrowing, marrow changes, endplate defects, spondylolisthesis, and central canal stenosis, and performs Pfirrmann grading with accuracy percentages ranging from 70.1% to 95.4%. Additionally, it can directly highlight abnormalities of the IVD and vertebrae using heatmaps, referred to as evidence hotspots.⁴¹,⁴²

According to our study, ML and DL models could provide more efficient, reliable, noninvasive diagnostic insights by accurately identifying abnormalities in the lumbar vertebrae and intervertebral discs (IVDs), even in cases where conventional MRI image assessments were inconclusive. By improving the ability to predict LBP, ML and DL algorithms could guide better clinical decision-making and reduce unnecessary surgical interventions.

Our study had a few limitations. First, although the sample size was sufficient for an initial analysis, a larger sample could provide more robust results and improve the reliability of the machine learning (ML) and deep learning (DL) models. Second, manual segmentation of the lumbar vertebrae and IVDs is time-consuming and subject to inter-operator variability; automated segmentation methods could enhance reproducibility and efficiency. Third, the study did not include risk factors or radiological findings, or their role in assessing LBP using machine learning methods.

Conclusion

Our study found that ML classifiers such as random forest and AdaBoost exhibited the highest performance, particularly in the lower lumbar vertebrae and IVDs, while decision tree and logistic regression models showed moderate performance in the prediction of LBP. For DL methods, GoogleNet achieved the best results at 30 epochs, followed closely by ResNet 18, which demonstrated high precision and specificity. Our findings highlight the potential of advanced ML and DL techniques for accurately predicting LBP, with random forest, AdaBoost, and GoogleNet showing the most promising results.

Ethical approval

This was a prospective case-control study. Institutional ethical committee approval (IEC2:179/2023) was obtained from Kasturba Medical College and Hospital, Manipal, India, on 20th July 2023, followed by Clinical Trials Registry - India (CTRI) registration: CTRI/2023/08/056954, 25/08/2023, https://ctri.nic.in/Clinicaltrials/login.php. Written informed consent was obtained from all participants.

Data availability

Underlying data

Figshare: F1000 ML and DL Data, https://doi.org/10.6084/m9.figshare.26394847.v2.⁴³

This project contains the following underlying data:

• Radiomic features of lumbar spine cases (demographic characteristics of cases, radiomic features; spreadsheet)
• Radiomic features of lumbar spine controls (demographic characteristics of controls, radiomic features; spreadsheet)
• Anonymous Images cases (MRI JPEG images)
• Anonymous Images controls (MRI JPEG images)

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Extended data

Figshare: F1000 ML and DL Data, https://doi.org/10.6084/m9.figshare.26394847.v2 ⁴³

This project contains the following extended data:

• Table 5 and 6
• Supplementary file 1

References

1. GBD 2021 Low Back Pain Collaborators: Global, regional, and national burden of low back pain, 1990-2020, its attributable risk factors, and projections to 2050: a systematic analysis of the Global Burden of Disease Study 2021. Lancet Rheumatol. 2023; 5(6): e316–e329.
2. Gu Y, Wang Z, Shi H, et al.: Global, Regional, and National Change Patterns in the Incidence of Low Back Pain From 1990 to 2019 and Its Predicted Level in the Next Decade. Int. J. Public Health. 2024 Feb; 69(69): 1606299. Publisher Full Text
3. Hartvigsen J, Hancock MJ, Kongsted A, et al.: What low back pain is and why we need to pay attention. Lancet. 2018; 391(10137): 2356–2367. PubMed Abstract | Publisher Full Text
4. Chou R, Shekelle P: Will this patient develop persistent disabling low back pain? JAMA. 2010; 303(13): 1295–1302. Publisher Full Text
5. Videman T, Battie MC, Gibbons LE, et al.: Associations between back pain history and lumbar MRI findings. Spine. 2003 Mar; 28(6): 582–588. Publisher Full Text
6. Videman T, Nurminen M: The occurrence of anular tears and their relation to lifetime back pain history: a cadaveric study using barium sulfate discography. Spine (Phila Pa 1976). 2004; 29(23): 2668–2676. PubMed Abstract | Publisher Full Text
7. Boswell MV, Singh V, Staats PS, et al.: Accuracy of precision diagnostic blocks in the diagnosis of chronic spinal pain of facet or zygapophysial joint origin. Pain Physician. 2003; 6(4): 449–456. PubMed Abstract | Publisher Full Text
8. Wang Y, Videman T, Battie MC: ISSLS prize winner: Lumbar vertebral endplate lesions: associations with disc degeneration and back pain history. Spine (Phila Pa 1976). 2012; 37(17): 1490–1496. PubMed Abstract | Publisher Full Text
9. Farshad-Amacker NA, Farshad M, Winklehner A, et al.: MR imaging of degenerative disc disease. Eur. J. Radiol. 2015; 84(9): 1768–1776. Publisher Full Text
10. Gundry CR, Fritts HM: Magnetic resonance imaging of the musculoskeletal system. Part 8. The spine, section 2. Clin. Orthop. Relat. Res. 1997; 343(343): 260–271. Publisher Full Text
11. Reddy MM, Gangavelli R, Priyanka P, et al.: Influence of Lumbar Spinal Canal Dimensions on Neurological Claudication Symptomatology- A Case Control Study. Biomed. Pharmacol. J. 2021; 14(2): 1019–1024. Publisher Full Text
12. Jarvik JJ, Hollingworth W, Heagerty P, et al.: The Longitudinal Assessment of Imaging and Disability of the Back (LAIDBack) Study: baseline data. Spine (Phila Pa 1976). 2001; 26(10): 1158–1166. PubMed Abstract | Publisher Full Text
13. Boden SD, Davis DO, Dina TS, et al.: Abnormal magnetic-resonance scans of the lumbar spine in asymptomatic subjects. A prospective investigation. J. Bone Joint Surg. Am. 1990; 72(3): 403–408. PubMed Abstract
14. Aerts HJ, Velazquez ER, Leijenaar RT, et al.: Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat. Commun. 2014; 5: 4644. Publisher Full Text
15. Hood L, Friend SH: Predictive, personalized, preventive, participatory (P4) cancer medicine. Nat. Rev. Clin. Oncol. 2011; 8(3): 184–187. PubMed Abstract | Publisher Full Text
16. Peng H, Dong D, Fang MJ, et al.: Prognostic Value of Deep Learning PET/CT-Based Radiomics: Potential Role for Future Individual Induction Chemotherapy in Advanced Nasopharyngeal Carcinoma. Clin. Cancer Res. 2019; 25(14): 4271–4279. PubMed Abstract | Publisher Full Text
17. Gillies RJ, Kinahan PE, Hricak H: Radiomics: Images Are More than Pictures, They Are Data. Radiology. 2016; 278(2): 563–577. PubMed Abstract | Publisher Full Text | Free Full Text
18. Thrall JH, Li X, Li Q, et al.: Artificial Intelligence and Machine Learning in Radiology: Opportunities, Challenges, Pitfalls, and Criteria for Success. J. Am. Coll. Radiol. 2018; 15(3 Pt B): 504–508. PubMed Abstract | Publisher Full Text
19. Chartrand G, Cheng PM, Vorontsov E, et al.: Deep Learning: A Primer for Radiologists. Radiographics. 2017; 37(7): 2113–2131. PubMed Abstract | Publisher Full Text
20. Erickson BJ, Korfiatis P, Akkus Z, et al.: Machine Learning for Medical Imaging. Radiographics. 2017; 37(2): 505–515. PubMed Abstract | Publisher Full Text | Free Full Text
21. Arimura H, Soufi M, Kamezawa H, et al.: Radiomics with artificial intelligence for precision medicine in radiation therapy. J. Radiat. Res. 2019; 60(1): 150–157. PubMed Abstract | Publisher Full Text | Free Full Text
22. Kononenko I: Machine learning for medical diagnosis: history, state of the art and perspective. Artif. Intell. Med. 2001; 23(1): 89–109. PubMed Abstract | Publisher Full Text
23. Auffermann WF, Gozansky EK, Tridandapani S: Artificial Intelligence in Cardiothoracic Radiology. AJR Am. J. Roentgenol. 2019; 212(5): 997–1001. Publisher Full Text
24. Harmon SA, Tuncer S, Sanford T, et al.: Artificial intelligence at the intersection of pathology and radiology in prostate cancer. Diagn. Interv. Radiol. 2019; 25: 183–188. PubMed Abstract | Publisher Full Text | Free Full Text
25. Le EPV, Wang Y, Huang Y, et al.: Artificial intelligence in breast imaging. Clin. Radiol. 2019; 74: 357–366. Publisher Full Text
26. Bi WL, Hosny A, Schabath MB, et al.: Artificial intelligence in cancer imaging: Clinical challenges and applications. CA Cancer J. Clin. 2019; 69: 127–157. PubMed Abstract | Publisher Full Text
27. Dionne CE, Dunn KM, Croft PR, et al.: A consensus approach toward the standardization of back pain definitions for use in prevalence studies. Spine. 2008; 33(1): 95–103. PubMed Abstract | Publisher Full Text
28. Python Software Foundation: Python Language Reference, version 3.9.7. 2021. Reference Source
29. Harris CR, Millman KJ, van der Walt SJ, et al.: Array programming with NumPy. Nature. 2020; 585: 357–362. PubMed Abstract | Publisher Full Text | Free Full Text
30. Pedregosa F, Varoquaux G, Gramfort A, et al.: Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011; 12: 2825–2830.
31. McKinney W: Data Structures for Statistical Computing in Python. van der Walt S, Millman J, editors. Proceedings of the 9th Python in Science Conference. 2010; pp. 56–61.
32. Waskom ML: Seaborn: Statistical data visualization. Journal of Open Source Software. 2021; 6(60): 3021. Publisher Full Text
33. Hunter JD: Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering. 2007; 9(3): 90–95. Publisher Full Text
34. Brinjikji W, Luetmer PH, Comstock B, et al.: Systematic literature review of imaging features of spinal degeneration in asymptomatic populations. AJNR Am. J. Neuroradiol. 2015; 36(4): 811–816. PubMed Abstract | Publisher Full Text | Free Full Text
35. Maher C, Underwood M, Buchbinder R: Non-specific low back pain. Lancet. 2017; 389(10070): 736–747. Publisher Full Text
36. Chou R, Fu R, Carrino JA, et al.: Imaging strategies for low-back pain: systematic review and meta-analysis. Lancet. 2009; 373(9662): 463–472. PubMed Abstract | Publisher Full Text
37. Abdollah V, Parent EC, Dolatabadi S, et al.: Texture analysis in the classification of T₂ -weighted magnetic resonance images in persons with and without low back pain. J. Orthop. Res. 2021; 39(10): 2187–2196. PubMed Abstract | Publisher Full Text
38. Ketola JHJ, Inkinen SI, Karppinen J, et al.: T₂ -weighted magnetic resonance imaging texture as predictor of low back pain: A texture analysis-based classification pipeline to symptomatic and asymptomatic cases. J. Orthop. Res. 2021; 39(11): 2428–2438. PubMed Abstract | Publisher Full Text
39. Aggarwal N: Prediction of low back pain using artificial intelligence modeling. J. Med. Artif. Intell. 2021; 4: 2. Publisher Full Text
40. Won D, Lee HJ, Lee SJ, et al.: Spinal Stenosis Grading in Magnetic Resonance Imaging Using Deep Convolutional Neural Networks. Spine (Phila Pa 1976). 2020; 45(12): 804–812. PubMed Abstract | Publisher Full Text
41. Jamaludin A, Kadir T, Zisserman A: SpineNet: Automated classification and evidence visualization in spinal MRIs. Med. Image Anal. 2017; 41: 63–73. PubMed Abstract | Publisher Full Text
42. Jamaludin A, Lootus M, Kadir T, et al.: ISSLS PRIZE in bioengineering science 2017: Automation of reading of radiological features from magnetic resonance images (MRIs) of the lumbar spine without human intervention is comparable with an expert radiologist. Eur. Spine J. 2017; 26: 1374–1383. PubMed Abstract | Publisher Full Text
43. Pendem S: F1000 ML and DL Data. Dataset. figshare. 2024. Publisher Full Text

Version 2

VERSION 2 PUBLISHED 10 Sep 2024

Author details

Ali Muhaimil
Roles: Data Curation, Formal Analysis, Methodology, Writing – Original Draft Preparation

Niranjana Sampathilla
Roles: Conceptualization, Formal Analysis, Methodology, Software, Supervision, Validation, Visualization, Writing – Review & Editing

Priya P S
Roles: Conceptualization, Supervision, Validation, Writing – Review & Editing

Kaushik Nayak
Roles: Conceptualization, Data Curation, Formal Analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Visualization, Writing – Review & Editing

Krishnaraj Chadaga
Roles: Formal Analysis, Software, Supervision, Validation

Anushree Goswami
Roles: Formal Analysis, Validation

Obhuli Chandran M
Roles: Methodology, Writing – Review & Editing

Abhijith S
Roles: Methodology, Writing – Review & Editing

Competing interests

No competing interests were disclosed.

Grant information

The author(s) declared that no grants were involved in supporting this work.

Article Versions (2)

version 2

Revised

Published: 10 Oct 2024, 13:1035

https://doi.org/10.12688/f1000research.154680.2

version 1

Published: 10 Sep 2024, 13:1035

https://doi.org/10.12688/f1000research.154680.1

© 2024 Muhaimil A et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

how to cite this article

Muhaimil A, Pendem S, Sampathilla N et al. Role of Artificial intelligence model in prediction of low back pain using T2 weighted MRI of Lumbar spine [version 2; peer review: 2 approved, 1 approved with reservations, 1 not approved]. F1000Research 2024, 13:1035 (https://doi.org/10.12688/f1000research.154680.2)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.

Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.

Version 2

PUBLISHED 10 Oct 2024

Revised

Reviewer Report 18 Nov 2024

Eugene Ozhinsky, University of California San Francisco, San Francisco, California, USA

Not Approved

https://doi.org/10.5256/f1000research.173049.r335609

The manuscript describes a study using machine learning and deep learning techniques to predict whether patients had lower-back pain. The study addresses the important issue of diagnosing lower-back pain. Here is a list of questions the authors could use to improve the manuscript:

Introduction:

Why is it useful to predict LBP from MR images? How can it help the patients?
Please cite previous work using ML and DL to study LBP. How is your study different?

Methods:

I don't think this study is a prospective case-control study. The data was analyzed after it was collected and the patients were not followed up on.

How were the patients recruited? What conditions did the controls have?

Which software was used to extract radiomic features?

Please clarify if there was a separate test set not used for finetuning.
The manuscript states that "The input data were split into five equal parts: four groups for training and five for testing...". It is not clear how 4 and 5 groups were generated from 5 equal parts.
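
For reference, in a conventional five-fold cross-validation the data are divided into five parts, and each part serves once as the test set while the remaining four are used for training. The sketch below illustrates that scheme on sample indices only. It is an assumption about what the quoted sentence may have intended, not the authors' code, and the function name `k_fold_splits` is hypothetical.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the manuscript.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for test_fold in range(k):
        test_idx = sorted(folds[test_fold])
        # The other k - 1 folds together form the training set.
        train_idx = sorted(
            idx
            for fold_no, fold in enumerate(folds)
            if fold_no != test_fold
            for idx in fold
        )
        yield train_idx, test_idx


if __name__ == "__main__":
    for fold_no, (train_idx, test_idx) in enumerate(k_fold_splits(20, k=5), start=1):
        print(f"fold {fold_no}: {len(train_idx)} training, {len(test_idx)} test samples")
```

Under this scheme every sample is tested exactly once, and a separate held-out test set, if one is used, has to be set aside before the folds are formed.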

How many MRI slices were used from each subject? How were the slices chosen? How were the features from these slices combined? How did DL models work with the slices if there were more than one slice?

Discussion:

How did your study improve upon previous approaches in literature?

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

No
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

Yes
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Radiology, Magnetic Resonance Imaging, Machine Learning, Musculoskeletal Imaging, Focused Ultrasound

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report 30 Oct 2024

Shashi Kumar Shetty, K S Hegde Medical Academy, Mangalore,, Karnataka, India

Approved

https://doi.org/10.5256/f1000research.173049.r335604

Major comments:

The study effectively explores the role of machine learning (ML) and deep learning (DL) models in predicting low back pain using T2 weighted MRI images. This novel method has substantial potential for improving non-invasive diagnostic accuracy in LBP.
The use of mutual information for radiomic feature selection and reduction at each lumbar vertebral body and disc space is clearly highlighted in the study and would serve as a reference for future studies.
The choice of machine learning (ML) models such as random forest and AdaBoost allows the study to benefit from models known for their robustness and interpretability, which is especially important in medical diagnostics.
The inclusion of convolutional neural networks (CNNs) such as GoogleNet and ResNet (with transfer learning) reflects an advanced approach, leveraging the strength of these models to capture features in medical images.
The use of balanced datasets in each class (symptomatic and asymptomatic cases) in the binary classification task is clearly understandable from the manuscript.
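
To make the mutual-information feature selection mentioned above concrete, here is a minimal sketch that ranks features by their mutual information with a binary label. It rests on assumptions not confirmed by the manuscript: features are discretised into equal-width bins before scoring, and the feature names and values are invented for illustration. Library estimators such as scikit-learn's `mutual_info_classif` use a different (nearest-neighbour based) estimate for continuous features, so their scores would differ.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins=4):
    """Discretise continuous values into n_bins equal-width bins (an assumption)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(binned_feature, labels):
    """Mutual information (in nats) between a discrete feature and class labels."""
    n = len(labels)
    joint = Counter(zip(binned_feature, labels))
    p_x = Counter(binned_feature)
    p_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi


def rank_features(features, labels, n_bins=4):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {
        name: mutual_information(equal_width_bins(values, n_bins), labels)
        for name, values in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented toy data: 0 = asymptomatic, 1 = symptomatic.
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    features = {
        "informative": [0.10, 0.20, 0.15, 0.05, 0.90, 0.80, 0.95, 0.85],
        "noise": [0.1, 0.9, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9],
    }
    for name, score in rank_features(features, labels):
        print(f"{name}: {score:.3f}")
```

In a selection step, only the top-ranked features would be passed on to the classifiers; how many to keep is a separate design choice.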

Minor comments:

For readers' quick understanding, it may be mentioned in the discussion that raw MRI images of the lumbar spine were used for deep learning and radiomic features were used for machine learning.

The article is well-structured, informative, and presents a promising approach to making use of AI for non-invasive diagnosis, and improved patient outcomes in musculoskeletal health.

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

Yes
Are sufficient details of methods and analysis provided to allow replication by others?

Yes
If applicable, is the statistical analysis and its interpretation appropriate?

Yes
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Radiology, artificial intelligence, radiation protection

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report 30 Oct 2024

Priyanka Chandrasekhar, Sri Ramachandra Institute of Higher Education and Research (Deemed to be University), Chennai, Tamil Nadu, India; Department of Allied Health Sciences, The Apollo University, Chittoor, Andhra Pradesh, India

Approved

https://doi.org/10.5256/f1000research.173049.r335830

Major comments:

The study represents a progressive approach to diagnosing low back pain (LBP), utilizing radiomics-based machine learning (ML) and T2 weighted MRI image-based deep learning (DL) models. In musculoskeletal imaging, precise identification of the pain source is crucial but frequently difficult with conventional MRI; in such cases, artificial intelligence plays a major role in non-invasive diagnosis.
The use of the Delphi definitions of low back pain (DOLBaPP) questionnaire is excellent. It provides a standardized way to assess LBP prevalence, helping ensure consistency in identifying symptomatic and asymptomatic patients. The 12-month criterion for identifying cases is well chosen, as it allows for identification of more chronic presentations of LBP. Similarly, the clear definition of controls avoids ambiguity, ensuring clear differentiation between cases and controls.
Balanced datasets with equal representation of symptomatic and asymptomatic groups were a significant strength, as they reduce the class imbalance issues that often skew the performance of artificial intelligence models in medical studies.
The article explained the rationale for the selection of various ML and DL models, and the strengths of each algorithm in the context of LBP prediction.

Minor Comments:

Typographical errors like F1 score formula can be corrected.

The study contributes to the growing field of AI in medical imaging and health care, emphasizing the importance of integrating AI tools into clinical workflows for better management of LBP.

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Magnetic Resonance Imaging, Artificial intelligence in health care

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report 12 Oct 2024

Tarun Gangil, The institute of cancer research, London, UK

Approved with Reservations

https://doi.org/10.5256/f1000research.173049.r330660

Question 1: Authors should highlight , why they have chosen a wide range of machine learning algorithms, clustering algorithm (KNN) and deep learning algorithms such as ResNet and GoogleNet ?
No concerns anymore.

Question 2: Do the analysis were performed only on the radiomics dataset ? If that is the case then why do they have used segmentation algorithms such as ResNet and GoogleNet ?
If I understand the author's response correctly, the MRI images were used directly, and the analysis performed was a classification task, not a segmentation task. Also, could it be mentioned explicitly, if not otherwise stated in the manuscript, that if it is a binary classification task, then how many samples for each class have the authors used for the analysis?

Question 2: Do the analysis were performed only on the radiomics dataset ? If that is the case then why do they have used segmentation algorithms such as ResNet and GoogleNet ?
No concerns anymore

Question 3: If MRI images are used in analysis, without radiomics dataset, then what is the average sensitivity of the ground truth segmentation mask across all samples?
No concerns anymore

Question 4: Do authors have used Images as an input in the analysis, apart from the radiomics dataset, directly in the AI models.
No concerns anymore

Question 5: Also, highlight about class variability, given it is a binary classification problem or Multiclass?
Please mention the number of samples in each class. I also believe the formula for the F1 score needs correction. If convenient, use the equation editor in Word to write any equations.

Question 6: Also, post analysis , it is needed for the authors to highlight the clinical significance of the results obtained from the classification algorithms.
No concerns.

Question 7: Can authors highlight the features which are having high importance towards the classification? I can suggest the use of SHAP analysis will be appropriate in this case, and further, the variables highlighted as important by the best-performing ML model can be compared with the clinical literature to conclude the findings of this research.

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Artificial Intelligence in Oncology, Image Processing , Deep Learning and Machine Learning

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Version 1

PUBLISHED 10 Sep 2024

Reviewer Report 01 Oct 2024

Tarun Gangil, The institute of cancer research, London, UK

Approved with Reservations

https://doi.org/10.5256/f1000research.169735.r325460

Authors should highlight , why they have chosen a wide range of machine learning algorithms, clustering algorithm (KNN) and deep learning algorithms such as ResNet and GoogleNet ?

Do the analysis were performed only on the radiomics dataset ? If that is the case then why do they have used segmentation algorithms such as ResNet and GoogleNet ?

If MRI images are used in analysis, without radiomics dataset, then what is the average sensitivity of the ground truth segmentation mask across all samples?

Do authors have used Images as an input in the analysis , apart from the radiomics dataset, directly in the AI models.

Also, highlight about class variability, given it is a binary classification problem or Multiclass ?

Also, post analysis , it is needed for the authors to highlight the clinical significance of the results obtained from the classification algorithms.

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

Partly
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

No
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

No

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Artificial Intelligence in Oncology, Image Processing , Deep Learning and Machine Learning

Author Response 10 Oct 2024

Saikiran Pendem, Department of Medical Imaging Technology, Manipal College of Health Professions, Manipal Academy of Higher Education, Karnataka, 576104, India

Question 1: Authors should highlight , why they have chosen a wide range of machine learning algorithms, clustering algorithm (KNN) and deep learning algorithms such as ResNet and GoogleNet ?

Ans: The advantages of using a wide range of machine learning algorithms, a clustering algorithm (KNN), and deep learning algorithms such as ResNet and GoogleNet are provided in the methodology.

We used a wide range of ML classifiers because different classifiers may perform better with different data features, which allows thorough benchmarking and selection of an optimal model for a specific problem. Each ML method has its own advantages: random forest excels in robustness and accuracy, decision tree offers interpretability, logistic regression is effective for linear relationships, KNN is good for smaller datasets, and AdaBoost improves performance by combining weak learners.
GoogleNet and ResNet were chosen for their powerful ability to learn complex patterns in data, especially in image analysis, medical diagnostics, and classification problems. Both are exceptional at automatically learning deep features, particularly from images, where they can capture complex and minute details. GoogleNet's inception modules process input using parallel convolution layers with varying kernel sizes, enhancing efficiency by capturing features at different scales with fewer parameters. ResNet addresses the vanishing gradient issue, making it possible to train very deep networks efficiently. This permits learning more complicated representations and improves performance on tasks like image classification and object recognition.

Question 2: Do the analysis were performed only on the radiomics dataset ? If that is the case then why do they have used segmentation algorithms such as ResNet and GoogleNet ?

Ans: The radiomics features were extracted for the purpose of using the ML classifiers. As for the Deep Learning models, there were no radiomics features. These models are black boxes which extract the features from the images without human intervention and classify them.

Question 3: If MRI images are used in analysis, without radiomics dataset, then what is the average sensitivity of the ground truth segmentation mask across all samples?

Ans: In case of Deep Learning method, the MRI images are used for the analysis (classification) without radiomic features. Hence, no ground truth masks. The classification model gives sensitivity between the range of 75-85%.

Question 4: Do authors have used Images as an input in the analysis, apart from the radiomics dataset, directly in the AI models.

Ans: Yes, for the deep learning models, the images were used directly as input for the classification task.

Question 5: Also, highlight about class variability, given it is a binary classification problem or Multiclass?

Ans: It is a binary classification problem which helps in predicting the LBP. Class variability impacts evaluation metrics such as sensitivity, specificity, precision, and F1-score. The more variability there is within and between classes, the more robust the model needs to be.
The same is included in the methodology section.
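
For reference, the evaluation metrics named in this answer can all be derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions, with F1 as the harmonic mean of precision and sensitivity (recall). The function name `binary_metrics` and the example counts are hypothetical and are not taken from the study's results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Ratios with a zero denominator are reported as 0.0.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)  # also called recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    # F1 = 2 * precision * recall / (precision + recall)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    for name, value in binary_metrics(tp=8, fp=2, tn=6, fn=4).items():
        print(f"{name}: {value:.3f}")
```

Because F1 depends only on true positives, false positives and false negatives, it can differ noticeably from accuracy, which is why reporting the per-class sample counts alongside these metrics is helpful.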

Question 6: Also, post analysis , it is needed for the authors to highlight the clinical significance of the results obtained from the classification algorithms.

Ans: The clinical significance of the results obtained from the classification algorithms was included in the discussion section.
According to our study, ML and DL models could provide more efficient, reliable, noninvasive diagnostic insights by accurately identifying abnormalities in the lumbar vertebrae and intervertebral discs (IVDs), even in cases where conventional MRI image assessments were inconclusive. By improving the ability to predict LBP, ML and DL algorithms could guide better clinical decision-making and reduce unnecessary surgical interventions.
Question 1: Authors should highlight , why they have chosen a wide range of machine learning algorithms, clustering algorithm (KNN) and deep learning algorithms such as ResNet and GoogleNet ?

Ans: The advantages for using wide range of machine learning algorithms, cluster algorithm (KNN) and deep learning algorithms such as ResNet and GoogleNet were provided in the methodology.

We have utilized a wide range of ML classifiers since different classifiers may perform better with data features and this allows in through benchmarking and the selection of an optimal model for a specific problem. Each ML method has its own advantages, random forest excels in robust and accuracy, decision tree offers interpretability, logistic regression is effective for linear relationships, KNN is good for smaller dataset, adaboost improves performance by merging weak learners.
GoogleNet and ResNet were chosen due to their powerful ability to learn complex patterns in data, especially in image analysis, medical diagnostics and classification problems. Both are exceptional at automatically learning deep features particularly involving images, where they can capture complex and minute details. GoogleNet inception modules process input using parallel convolution layers with varying kernel sizes, ehancing efficiency by capturing features at different scales with fewer parameters. ResNet solves the vanishing gradient issue, making it possible to train very deep networks efficiently. This permit learning more complicated representations improves performance and tasks like image classification and object recognition.

Question 2: Do the analysis were performed only on the radiomics dataset ? If that is the case then why do they have used segmentation algorithms such as ResNet and GoogleNet ?

Ans: The radiomics features were extracted for the purpose of using the ML classifiers. As for the Deep Learning models, there were no radiomics features. These models are black boxes which extract the features from the images without human intervention and classify them.

Question 3: If MRI images are used in analysis, without radiomics dataset, then what is the average sensitivity of the ground truth segmentation mask across all samples?

Ans: In case of Deep Learning method, the MRI images are used for the analysis (classification) without radiomic features. Hence, no ground truth masks. The classification model gives sensitivity between the range of 75-85%.

Question 4: Do authors have used Images as an input in the analysis, apart from the radiomics dataset, directly in the AI models.

Ans: Yes, for the deep learning models, direct images for the classification task.

Question 5: Also, highlight about class variability, given it is a binary classification problem or Multiclass?

Ans: It is a binary classification problem which helps in predicting the LBP. Class variability impacts evaluation metrics such as sensitivity, specificity, precision, and F1-score. The more variability there is within and between classes, the more robust the model needs to be.
The same is included in the methodology section.

Question 6: Also, after the analysis, the authors need to highlight the clinical significance of the results obtained from the classification algorithms.

Ans: The clinical significance of the results obtained from the classification algorithms was included in the discussion section.
According to our study, ML and DL models could provide more efficient, reliable, noninvasive diagnostic insights by accurately identifying abnormalities in the lumbar vertebrae and intervertebral discs (IVDs), even in cases where conventional MRI image assessments were inconclusive. By improving the ability to predict LBP, ML and DL algorithms could guide better clinical decision making and reduce unnecessary surgical interventions.
Competing Interests: No competing interests were disclosed.

Reviewer Report 24 Sep 2024

Approved

https://doi.org/10.5256/f1000research.169735.r325453

Major comments:
The research article presents a case-control study assessing machine learning and deep learning models for the prediction of low back pain using T2-weighted MRI images of the lumbar spine.
The use of mutual information for radiomic feature selection is a good approach; however, the reason for selecting it over a frequently used feature selection method such as the least absolute shrinkage and selection operator (LASSO) is not adequately justified.
Though the study employed five-fold cross-validation, it would benefit from discussing additional methods such as bootstrapping or using different validation splits.
The study briefly mentions hyperparameter tuning, but it does not discuss in detail whether any steps were taken to reduce overfitting.
Future research directions on how AI model predictions should be clinically interpreted, and the implications of incorporating these AI models into the diagnostic workflow, could be provided.
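
To make the mutual information comment concrete, the following minimal sketch computes the mutual information between a discretized feature and a binary label from empirical frequencies. It is an illustration under stated assumptions, not the manuscript's implementation: the feature values and labels are hypothetical, and real radiomic features are continuous, so a library estimator would normally be used in practice.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, labels))  # joint counts of (x, y)
    count_x = Counter(feature)             # marginal counts of x
    count_y = Counter(labels)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


# Hypothetical toy data: 1 = symptomatic, 0 = asymptomatic.
labels = [0, 0, 1, 1]
informative_feature = [0, 0, 1, 1]    # perfectly tracks the label
uninformative_feature = [0, 1, 0, 1]  # independent of the label

mi_informative = mutual_information(informative_feature, labels)
mi_uninformative = mutual_information(uninformative_feature, labels)
print(mi_informative, mi_uninformative)
```

Features would then be ranked by this score and the top-ranked ones retained. Unlike LASSO, which selects features through an L1 penalty on a linear model, mutual information can also reflect non-linear dependence between a feature and the label.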
Minor comments:
Inconsistent capitalization should be corrected: the words “random forest” and “adaboost” are capitalized inconsistently.
The article effectively demonstrates the use of machine and deep learning models in accurately predicting low back pain and provides promising contributions to knowledge about the use of AI in healthcare.
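
The difference between the five-fold cross-validation used in the study and the bootstrapping suggested above can be sketched as follows. This is a hedged, pure-Python illustration: the sample count, seeds, and function names are assumptions made for the example, and the splits are not stratified by class.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def bootstrap_indices(n_samples, n_rounds, seed=0):
    """Yield (train, test) index lists; train is drawn with replacement
    and test holds the out-of-bag samples that were never drawn."""
    rng = random.Random(seed)
    everything = set(range(n_samples))
    for _ in range(n_rounds):
        train = [rng.randrange(n_samples) for _ in range(n_samples)]
        test = sorted(everything - set(train))
        yield train, test


# Hypothetical dataset size for illustration only.
N_SAMPLES = 20
cv_splits = list(k_fold_indices(N_SAMPLES, k=5))
boot_splits = list(bootstrap_indices(N_SAMPLES, n_rounds=3))
for train, test in cv_splits:
    print("CV fold: train", len(train), "test", len(test))
for train, test in boot_splits:
    print("Bootstrap: train", len(train), "out-of-bag", len(test))
```

Cross-validation partitions the data so that each sample is held out once, whereas bootstrapping resamples with replacement and evaluates on the samples left out of each draw, which gives a spread of performance estimates across rounds.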

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

Yes
Are sufficient details of methods and analysis provided to allow replication by others?

Yes
If applicable, is the statistical analysis and its interpretation appropriate?

I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Magnetic Resonance Imaging, Artificial intelligence in health care

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Version 2

VERSION 2 PUBLISHED 10 Sep 2024

Open Peer Review

Reviewer Status

Reviewer Reports

	Invited Reviewers
	1	2	3	4
Version 2 (revision) 10 Oct 24	read	read	read	read
Version 1 10 Sep 24	read	read

Priyanka Chandrasekhar, Sri Ramachandra Institute of Higher Education and Research (Deemed to be University), Chennai, India; The Apollo University, Chittoor, India
Tarun Gangil, The Institute of Cancer Research, London, UK
Shashi Kumar Shetty, K S Hegde Medical Academy, Mangalore, India
Eugene Ozhinsky, University of California San Francisco, San Francisco, USA

Reviewer Report

18 Nov 2024 | for Version 2

Eugene Ozhinsky, University of California San Francisco, San Francisco, California, USA

Not Approved

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

No
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

Yes
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Radiology, Magnetic Resonance Imaging, Machine Learning, Musculoskeletal Imaging, Focused Ultrasound

Reviewer Report

30 Oct 2024 | for Version 2

Shashi Kumar Shetty, K S Hegde Medical Academy, Mangalore, Karnataka, India

Approved

Major comments:

The study effectively explores the role of machine learning (ML) and deep learning (DL) models in predicting low back pain (LBP) using T2-weighted MRI images. This novel method has substantial potential for improving non-invasive diagnostic accuracy in LBP.
The use of mutual information for radiomic feature selection and reduction at each lumbar vertebral body and disc space is clearly highlighted in the study and would serve as a reference for future studies.
The choice of ML models such as random forest and AdaBoost allows the study to benefit from models known for their robustness and interpretability, which is especially important in medical diagnostics.
The inclusion of convolutional neural networks (CNNs) such as GoogleNet and ResNet (transfer learning) reflects an advanced approach, leveraging the strength of these models in analysing medical images.
The use of balanced datasets in each class (symptomatic and asymptomatic cases) in the binary classification task is clearly understandable from the manuscript.

Minor comments:

For readers' quick understanding, the discussion could mention that raw MRI images of the lumbar spine were used for deep learning and that radiomics features were used for machine learning.

The article is well-structured, informative, and presents a promising approach to making use of AI for non-invasive diagnosis, and improved patient outcomes in musculoskeletal health.

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

Yes
Are sufficient details of methods and analysis provided to allow replication by others?

Yes
If applicable, is the statistical analysis and its interpretation appropriate?

Yes
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Radiology, artificial intelligence, radiation protection

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

30 Oct 2024 | for Version 2

Approved

Major comments:

The study represents a progressive approach to diagnosing low back pain (LBP), utilizing radiomics-based machine learning (ML) and T2-weighted MRI image-based deep learning (DL) models. In musculoskeletal imaging, precise identification of the pain source is crucial but frequently difficult with conventional MRI. In such cases, artificial intelligence plays a major role in non-invasive diagnosis.
The use of the Delphi definitions of low back pain (DOLBaPP) questionnaire is excellent. It provides a standardized way to assess LBP prevalence, helping ensure consistency in identifying symptomatic and asymptomatic patients. The 12-month criterion for identifying cases is well chosen, as it allows for identification of more chronic presentations of LBP. Similarly, the clear definition of controls avoids ambiguity, ensuring clear differentiation between cases and controls.
Balanced datasets with equal representation of the symptomatic and asymptomatic groups were a significant strength, as this reduces the class imbalance issues that often skew the performance of artificial intelligence models in medical studies.
The article explained the rationale for the selection of the various ML and DL models and the strengths of each algorithm in the context of LBP prediction.

Minor Comments:

Typographical errors, such as the one in the F1 score formula, can be corrected.

The study contributes to the growing field of AI in medical imaging and health care, emphasizing the importance of integrating AI tools into clinical workflows for better management of LBP.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Magnetic Resonance Imaging, Artificial intelligence in health care

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

12 Oct 2024 | for Version 2

Tarun Gangil, The Institute of Cancer Research, London, UK

Approved With Reservations

Question 1: The authors should explain why they chose a wide range of machine learning algorithms, a clustering algorithm (KNN), and deep learning algorithms such as ResNet and GoogleNet.
No concerns anymore.

Question 2: Was the analysis performed only on the radiomics dataset? If so, why were segmentation algorithms such as ResNet and GoogleNet used?
If I understand the authors' response correctly, the MRI images were used directly, and the analysis performed was a classification task, not a segmentation task. Also, if it is not otherwise stated in the manuscript, could the authors state explicitly how many samples of each class were used for this binary classification task?

Question 2: Was the analysis performed only on the radiomics dataset? If so, why were segmentation algorithms such as ResNet and GoogleNet used?
No concerns anymore.

Question 3: If MRI images are used in the analysis without the radiomics dataset, what is the average sensitivity of the ground truth segmentation mask across all samples?
No concerns anymore.

Question 4: Have the authors used images directly as input to the AI models, apart from the radiomics dataset?
No concerns anymore.

Question 5: Also, highlight class variability, and state whether this is a binary or a multiclass classification problem.
Please state the number of samples in each class. I also believe the F1 score formula needs correction. If convenient, use the equation editor in Word to write any equations.
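
For reference, the standard definition of the F1 score is given below. The manuscript's own formula is not reproduced on this page, so this is the textbook form and not a quotation of the authors' corrected equation; TP, FP and FN denote true positives, false positives and false negatives.

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}

F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
```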

Question 6: Also, after the analysis, the authors need to highlight the clinical significance of the results obtained from the classification algorithms.
No concerns.

Question 7: Can the authors highlight the features that have high importance for the classification? I suggest that SHAP analysis would be appropriate in this case. Further, the variables highlighted as important by the best-performing ML model can be compared with the clinical literature to conclude the findings of this research.
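
The idea behind the SHAP analysis suggested here can be illustrated with exact Shapley values on a toy model. This sketch does not use the SHAP library and does not involve the study's models or features: `toy_model`, the instance, and the baseline are hypothetical, and exact enumeration over feature orderings is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all
    orderings in which features switch from baseline to instance values."""
    n_features = len(instance)
    contributions = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        current = list(baseline)
        previous_output = predict(current)
        for i in order:
            current[i] = instance[i]  # reveal feature i
            new_output = predict(current)
            contributions[i] += new_output - previous_output
            previous_output = new_output
    return [c / len(orderings) for c in contributions]


def toy_model(x):
    """Hypothetical scoring function with one interaction term."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
values = shapley_values(toy_model, instance, baseline)
print(values)
```

The attributions sum to the difference between the model output for the instance and for the baseline, and the interaction term is shared equally between the two features involved. Ranking features by the magnitude of such attributions is what allows comparison with the clinical literature.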

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Artificial Intelligence in Oncology, Image Processing, Deep Learning and Machine Learning

Reviewer Report

01 Oct 2024 | for Version 1

Tarun Gangil, The Institute of Cancer Research, London, UK

Approved With Reservations

Is the work clearly and accurately presented and does it cite the current literature?

Partly
Is the study design appropriate and is the work technically sound?

Partly
Are sufficient details of methods and analysis provided to allow replication by others?

Partly
If applicable, is the statistical analysis and its interpretation appropriate?

No
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

No

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Artificial Intelligence in Oncology, Image Processing, Deep Learning and Machine Learning

Responses (1)

Author Response

10 Oct 2024

Saikiran Pendem, Department of Medical Imaging Technology, Manipal College of Health Professions, Manipal Academy of Higher Education, Karnataka, 576104, India

Question 1: The authors should explain why they chose a wide range of machine learning algorithms, a clustering algorithm (KNN), and deep learning algorithms such as ResNet and GoogleNet.

Ans: The advantages of using a wide range of machine learning algorithms, a clustering algorithm (KNN), and deep learning algorithms such as ResNet and GoogleNet were provided in the methodology.

We used a wide range of ML classifiers because different classifiers may perform better with different data features, which allows thorough benchmarking and the selection of an optimal model for a specific problem. Each ML method has its own advantages: random forest excels in robustness and accuracy, decision tree offers interpretability, logistic regression is effective for linear relationships, KNN is good for smaller datasets, and AdaBoost improves performance by combining weak learners.
GoogleNet and ResNet were chosen for their ability to learn complex patterns in data, especially in image analysis, medical diagnostics and classification problems. Both automatically learn deep features from images and can capture complex and minute details. GoogleNet's inception modules process the input with parallel convolution layers of varying kernel sizes, enhancing efficiency by capturing features at different scales with fewer parameters. ResNet addresses the vanishing gradient issue, making it possible to train very deep networks efficiently. This permits learning more complicated representations and improves performance on tasks like image classification and object recognition.

Question 2: Was the analysis performed only on the radiomics dataset? If so, why were segmentation algorithms such as ResNet and GoogleNet used?

Ans: The radiomics features were extracted for use with the ML classifiers. The deep learning models did not use radiomics features. These models are black boxes that extract features from the images without human intervention and classify them.

Question 3: If MRI images are used in the analysis without the radiomics dataset, what is the average sensitivity of the ground truth segmentation mask across all samples?

Ans: For the deep learning method, the MRI images are used for the analysis (classification) without radiomic features, so there are no ground truth masks. The classification model gives a sensitivity in the range of 75-85%.

Question 4: Have the authors used images directly as input to the AI models, apart from the radiomics dataset?

Ans: Yes, the deep learning models take the images directly as input for the classification task.

Question 5: Also, highlight class variability, and state whether this is a binary or a multiclass classification problem.

Ans: It is a binary classification problem that helps in predicting LBP. Class variability impacts evaluation metrics such as sensitivity, specificity, precision, and F1-score. The more variability there is within and between classes, the more robust the model needs to be.
The same is included in the methodology section.

Question 6: Also, after the analysis, the authors need to highlight the clinical significance of the results obtained from the classification algorithms.

Ans: The clinical significance of the results obtained from the classification algorithms was included in the discussion section.
According to our study, ML and DL models could provide more efficient, reliable, noninvasive diagnostic insights by accurately identifying abnormalities in the lumbar vertebrae and intervertebral discs (IVDs), even in cases where conventional MRI image assessments were inconclusive. By improving the ability to predict LBP, ML and DL algorithms could guide better clinical decision making and reduce unnecessary surgical interventions.

Competing Interests

No competing interests were disclosed.

Reviewer Report

24 Sep 2024 | for Version 1

Approved

Is the work clearly and accurately presented and does it cite the current literature?

Yes
Is the study design appropriate and is the work technically sound?

Yes
Are sufficient details of methods and analysis provided to allow replication by others?

Yes
If applicable, is the statistical analysis and its interpretation appropriate?

I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions drawn adequately supported by the results?

Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Magnetic Resonance Imaging, Artificial intelligence in health care

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Alongside their report, reviewers assign a status to the article:

Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested.

Approved with reservations - a number of small changes, and sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions.


[39] 39. Aggarwal N: Prediction of low back pain using artificial intelligence modeling. J. Med. Artif. Intell. 2021; 4: 2. Publisher Full Text

[40] 40. Won D, Lee HJ, Lee SJ, et al.: Spinal Stenosis Grading in Magnetic Resonance Imaging Using Deep Convolutional Neural Networks. Spine (Phila Pa 1976). 2020; 45(12): 804–812. PubMed Abstract | Publisher Full Text

[41] 41. Jamaludin A, Kadir T, Zisserman A: SpineNet: Automated classification and evidence visualization in spinal MRIs. Med. Image Anal. 2017; 41: 63–73. PubMed Abstract | Publisher Full Text

[42] 42. Jamaludin A, Lootus M, Kadir T, et al.: ISSLS PRIZE in bioengineering science 2017: Automation of reading of radiological features from magnetic resonance images (MRIs) of the lumbar spine without human intervention is comparable with an expert radiologist. Eur. Spine J. 2017; 26: 1374–1383. PubMed Abstract | Publisher Full Text

[43] 43. Pendem S: F1000 ML and DL Data. Dataset. figshare. 2024. Publisher Full Text
