Research Article

Optimizing machine learning performance for medical imaging analyses in low-resource environments: The prospects of CNN-based Feature Extractors

[version 1; peer review: 1 approved with reservations]
PUBLISHED 17 Jan 2025

Abstract

Background

Machine learning (ML) algorithms have generally enhanced the speed and accuracy of image-based diagnosis and treatment strategy planning compared to the traditional approach of interpreting medical images by experienced radiologists. Convolutional neural networks (CNNs) have been particularly useful in this regard. However, training CNNs comes with significant time and computational costs, necessitating the development of efficient solutions for deploying CNNs in low-resource environments. This study investigates the use of pre-trained CNNs as feature extractors in medical imaging analyses and highlights the key considerations to be taken into account when implementing these extractors.

Methods

Eight medical imaging datasets covering several diseases (e.g. breast cancer, brain tumor and malaria) were used. Five ML algorithms (k-nearest neighbours, logistic regression, naïve Bayes, random forests and light gradient boosting machine) were implemented with three pre-trained CNN models (VGG-16, EfficientNet-B0, and ResNet-50). The pre-trained models were deployed as feature extractors whose outputs were fed into the classifiers for image classification tasks. The performance of these classifiers was assessed using a ten-fold cross-validation scheme with metrics including accuracy, F1 score, specificity, sensitivity, AUC-ROC, Matthews’ correlation coefficient (MCC) and precision, as well as time and space complexity.

Results

From our experiments, we found a general improvement in ML models’ performance after feature extraction (FE). Of the three FE models tested, EfficientNet-B0 performed best in terms of predictive performance, i.e. accuracy, specificity, sensitivity, AUC-ROC, MCC, F1 score, and precision. However, VGG-16 had the best performance in terms of time and memory efficiency. Our results identify two conditions necessary for the optimal performance of the FE models: (i) balanced datasets - sets where classes or categories are represented in approximately equal proportions, and (ii) large datasets - an adequate number of samples for training and testing. Interestingly, the performance of the ML models did not correlate with the number of class labels, i.e. the type of classification task, whether binary or multi-class, had no influence on the models’ performance. Of the five algorithms investigated, logistic regression benefitted the most from the adoption of the feature extractors.

Conclusion

Our results confirm that the use of CNNs as feature extractors offers an effective balance between high performance and computational efficiency, making them well-suited for use in low-resource environments.

Keywords

Machine learning, feature extraction, image classification, medical imaging, precision medicine, convolutional neural networks, deep learning, low-resource environments.

Background

Medical imaging plays a significant role in modern medicine. Its advent has revolutionized diagnosis, treatment planning, and monitoring of various diseases.1 Techniques such as magnetic resonance imaging (MRI), X-ray, ultrasound, and computed tomography (CT) scans, among others, provide non-invasive methods to visualize the interior structures of the body, enabling timely discovery of conditions like cancer, cardiovascular diseases, and neurological disorders.2 Accurate diagnosis and effective treatment strategies are traditionally achieved by employing the services of experienced radiologists to reliably interpret these images. However, the continuous increase in the volume of medical imaging data produced poses significant challenges, necessitating the development of automated and efficient image analysis methods.3

Machine learning (ML) and, consequently, deep learning (DL) have emerged as formidable means of handling the complexities resulting from the presence of vast amounts of medical imaging data.4 Among other use cases, image classification has been a key application area in which ML algorithms aim to automatically classify images into pre-defined classes. This is particularly valuable in medical diagnostics, where rapid and accurate classification of images can significantly improve the clinical decision-making process and treatment outcomes.5

Convolutional Neural Networks (CNNs) have demonstrated remarkable performance in image classification tasks, outperforming traditional ML methods and even human experts in certain scenarios.6,7 The hierarchical structure of CNNs allows them to effectively capture spatial hierarchies and intricate features within images, making CNNs particularly well-suited for medical imaging.8 Notable successes of CNNs in medical imaging include the detection of lung cancer,9 breast cancer,10 and diabetic retinopathy,11 the classification of skin lesions,12 and the segmentation of brain tumours,13 among others. Despite these advantages, the adoption of CNNs in low-resource environments faces several hindrances, such as high computational costs, energy constraints, limited annotated datasets, network connectivity issues, hardware constraints (storage and memory limitations), and limited technical expertise. In addition to their high time complexity, training CNNs requires substantial computational resources, including large amounts of memory and powerful graphics processing units (GPUs), which may not be available in resource-constrained settings.14 These challenges necessitate the development of more accessible and efficient solutions for deploying CNNs in low-resource environments.

In many image classification experiments, CNNs are used not only as classifiers but also as feature extractors.15 Feature extraction (FE) is a fundamental aspect of image classification. Traditionally, this process relied on manual techniques and domain-specific knowledge to design features that could be fed into ML algorithms. However, CNNs have revolutionized feature extraction by automating this process. Through successive layers of convolution and pooling, CNNs learn to extract hierarchical and informative features directly from raw image data, eliminating the need for handcrafted feature engineering.16 By leveraging the pre-trained layers of a CNN, researchers can extract features from images and use them as inputs to other ML algorithms or for further analysis.15

This study investigates the use of pre-trained CNNs as feature extractors in medical imaging analyses and emphasizes the important factors to consider when utilizing these extractors. Questions answered by this study include:

1. What is the overall impact of using the feature extractors? What is the best feature extractor?

2. What is the impact of data imbalance on the performance of the models?

3. What is the impact of the number of classes on the performance of the models?

4. What is the impact of the number of images on the performance of the models?

5. Which model benefits most from the use of feature extractors?

The remainder of this paper is structured as follows: Section 2 covers the methods. In Section 3, we present the results for our experiments and provide critical discussions in Section 4. Finally, we end with a conclusion and the lessons learned.

Methods

Datasets

Eight image datasets were used in this study. Table 1 presents a description of, and a reference to, each of these datasets.

Table 1. Datasets used in this study.

Class      | Designation | Disease               | nImages | nClasses | Distribution | Ref.
Binary     | DS1         | Malaria               | 27,558  | 2        | Balanced     | 17
Binary     | DS2         | COVID-19              | 800     | 2        | Balanced     | 18
Binary     | DS3         | Breast Cancer         | 7,909   | 2        | Imbalanced   | 19
Binary     | DS4         | Skin Cancer           | 9,605   | 2        | Imbalanced   | 20
Multiclass | DS5         | Brain Tumor           | 3,264   | 4        | Imbalanced   | 21
Multiclass | DS6         | Lung and Colon Cancer | 25,000  | 5        | Balanced     | 22
Multiclass | DS7         | Skin Lesions          | 10,000  | 7        | Imbalanced   | 23
Multiclass | DS8         | Colorectal Cancer     | 5,000   | 8        | Balanced     | 24

Feature extractors

CNNs have proven to be very beneficial in image preprocessing and classification. The deep (convolutional and pooling) layers in CNN architectures enable them to obtain highly descriptive features that lead to better representations of the images in a dataset.25,26 In this study, three pre-trained CNNs were used: VGG-16,27 EfficientNet-B0,28 and ResNet-50.29 These feature extractors were implemented with default settings in Python using the TensorFlow framework.30
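
To illustrate this step, a minimal sketch of pre-trained CNNs used as frozen feature extractors is given below, assuming TensorFlow's Keras applications API with ImageNet weights and the 224 × 224 × 3 input used in this study; the flattening of the final feature maps into vectors is our assumption about how features were passed to the classifiers, since the paper states only that default settings were used.

```python
# Minimal sketch: pre-trained CNNs as frozen feature extractors (TensorFlow/Keras).
# The flattening step is an assumption; the study reports only default settings.
import tensorflow as tf

EXTRACTORS = {
    "vgg16": (tf.keras.applications.VGG16,
              tf.keras.applications.vgg16.preprocess_input),
    "efficientnetb0": (tf.keras.applications.EfficientNetB0,
                       tf.keras.applications.efficientnet.preprocess_input),
    "resnet50": (tf.keras.applications.ResNet50,
                 tf.keras.applications.resnet50.preprocess_input),
}

def extract_features(name, images):
    """images: float array of shape (n, 224, 224, 3) with values in [0, 255].

    Returns one flattened feature vector per image, e.g. 512*7*7 = 25,088
    values for VGG-16 or 1280*7*7 = 62,720 for EfficientNet-B0.
    """
    model_cls, preprocess = EXTRACTORS[name]
    base = model_cls(weights="imagenet", include_top=False,
                     input_shape=(224, 224, 3))
    base.trainable = False                    # frozen: inference only, no training
    feats = base.predict(preprocess(images), verbose=0)
    return feats.reshape(feats.shape[0], -1)  # (n, h * w * channels)
```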

Machine learning algorithms

Five ML algorithms were utilized in this study: logistic regression (LR),31 k-nearest neighbours (KNN),32 naive Bayes (NB),33 random forests (RF),34 and light gradient boosting machine (LGBM).35 LR, NB, KNN and RF were executed in Python via the scikit-learn library,36 while LGBM was implemented using the stand-alone Python library developed by its authors.37
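
A minimal sketch of how the five classifiers might be instantiated is shown below. The choice of GaussianNB as the naive Bayes variant is our assumption, and the fixed random states follow the reproducibility note in the next subsection.

```python
# Minimal sketch of the five classifiers with default parameters.
# GaussianNB is an assumed choice of naive Bayes variant.
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

models = {
    "LR": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(random_state=0),  # seeded: affected by randomization
    "LGBM": LGBMClassifier(random_state=0),        # seeded: affected by randomization
}
```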

Performance metrics

Using ten-fold cross-validation, the models’ performances were evaluated and benchmarked using the following performance metrics: time and memory requirements, accuracy, sensitivity, specificity, AUC-ROC, MCC, precision, and F1 score. To eliminate bias from excessive hyperparameter tuning, all model parameters were left at their default values. As RF and LGBM are affected by randomization, we used the “random state” parameter to ensure reproducibility. All random states were set to zero and all other parameters were kept at their defaults.
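
A rough sketch of this evaluation protocol follows. The scorer choices, the stratified shuffling, and the binary-task form of the metrics are our assumptions (multiclass datasets would need averaged variants such as f1_macro), and memory tracking is omitted.

```python
# Minimal sketch of the ten-fold cross-validation protocol.
# Scorers are shown in binary form; multiclass runs need averaged variants.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate

scoring = {
    "accuracy": "accuracy",
    "sensitivity": "recall",                                # recall of the positive class
    "specificity": make_scorer(recall_score, pos_label=0),  # recall of the negative class
    "auc_roc": "roc_auc",
    "mcc": "matthews_corrcoef",
    "precision": "precision",
    "f1": "f1",
}

def evaluate(model, X, y):
    """X: feature vectors (n_samples, n_features); y: integer class labels.

    cross_validate also records fit/score times, covering the time metric.
    """
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return cross_validate(model, X, y, cv=cv, scoring=scoring)

# Example usage: scores = evaluate(LogisticRegression(), X_features, y_labels)
```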

Experimental setting

All images were resized into arrays of 224 by 224 pixels with 3 colour channels to prevent disparities due to image size. For each of the (n = 8) datasets, the five ML algorithms were applied. To ascertain the impact of the three FE techniques, the baseline performance of each ML algorithm was obtained, followed by the performance of the models after FE with each of the three techniques. As a result, 160 (8 × 5 × 4) experiments were conducted. These experiments were run on a server featuring 20 Intel(R) Xeon(R) Silver 4210R CPU cores, 526 GB of RAM, and the Ubuntu 22.04.4 LTS operating system.
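
A sketch of the resizing step is shown below, assuming images are loaded from files with TensorFlow; the file-based decoding is hypothetical, since the paper does not describe its loading code.

```python
# Minimal sketch of the preprocessing step: resize to 224 x 224 x 3.
# File-based loading is an assumption; file paths are hypothetical.
import tensorflow as tf

def load_resized(paths):
    """Decode each image file and resize it to a (224, 224, 3) float tensor."""
    images = []
    for path in paths:
        raw = tf.io.read_file(path)
        img = tf.io.decode_image(raw, channels=3, expand_animations=False)
        images.append(tf.image.resize(img, (224, 224)))
    return tf.stack(images)  # shape: (n_images, 224, 224, 3)
```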

Results

Feature extraction improves model performance

Figure 1 shows the averaged performance of the ML models across all datasets. It reveals that the adoption of FE generally results in an improvement in ML models’ performance. Our results show that EfficientNet-B0 had the best predictive performance, followed closely by ResNet-50. However, VGG-16 had better efficiency in terms of memory and time utilization (Figure 2). FE generally led to a reduction in memory and time complexities when compared to the baseline models.

Figure 1. Average performance of ML models trained with different CNN-based feature extractors.

Figure 2. Resource usage of ML models trained with different CNN-based feature extractors.

Balanced datasets enhance the effectiveness of feature extraction

Four datasets (DS1, DS2, DS6, and DS8) have balanced distributions across the different classes of images, while the other four (DS3, DS4, DS5 and DS7) have imbalanced distributions. As presented in Figure 3, FE improves ML performance, with the effect being more significant on balanced datasets. There is also a reduction in the memory and time required, irrespective of whether the datasets were balanced (Figure 4).

Figure 3. Performance of ML models on balanced and imbalanced datasets after feature extraction.

Figure 4. Resource usage of ML models on balanced and imbalanced datasets after feature extraction.

Feature extraction performance not correlated with number of classes

Four datasets (DS1, DS2, DS3, and DS4) had two classes of images, while DS5, DS6, DS7, and DS8 had four, five, seven and eight classes respectively. As revealed in Figure 5, there was no correlation between the performance of the ML models after FE and the number of classes. The best results were obtained on DS6 (5 classes), while the worst results were obtained on DS7 (7 classes). Notably, DS6 had a balanced class distribution, while DS7 had the steepest imbalance among all the datasets used: its Melanocytic Nevi (NV) class had 6,705 images, while the other six classes had 3,295 images combined. Figure 6 shows that the least memory and time were required when the ML models were trained on DS5 (4 classes).

Figure 5. Performance of ML models on datasets with different numbers of classes after feature extraction.

Figure 6. Resource usage of ML models on datasets with different numbers of classes after feature extraction.

Feature extraction excels with more training samples but demands higher computational resources

Three datasets (DS2, DS5, and DS8) have at most 5,000 images, three (DS3, DS4, and DS7) have between 5,000 and 10,000 images, while DS1 and DS6 have more than 10,000 images. Figure 7 reveals that the best performances were obtained with the larger sample sizes. However, the increase in sample size also results in increased memory and time consumption (Figure 8).

Figure 7. Performance of ML models on datasets with different sample sizes after feature extraction.

Figure 8. Resource usage of ML models on datasets with different sample sizes after feature extraction.

Logistic regression benefits significantly from feature extraction

Figure 9 reveals that LR benefitted the most from the adoption of FE before model training, while NB benefitted the least. There was a general increase in model performance after FE regardless of the choice of ML model. In terms of memory and time utilization (Figure 10), there was a general decrease in the time spent after FE; however, memory utilization increased for NB, KNN, and RF.

Figure 9. Performance of ML models after feature extraction.

Figure 10. Resource usage of ML models after feature extraction.

Feature extraction significantly enhances model performance and resource efficiency for shallow machine learning models

The baseline performance and resource usage of LR, of LR after FE with VGG-16, and of end-to-end image classification with VGG-16 on the DS2 dataset are presented in Figures 11 and 12, respectively. The VGG-16 classifier was trained for 20 epochs with a batch size of 64 images. While the predictive performance of LR after FE is slightly lower than that of the end-to-end VGG-16 model, it produces comparable results, is more resource-efficient, and is significantly higher than that of the baseline LR. The VGG-16 + LR model required considerably less memory and time than both the baseline LR and the VGG-16 end-to-end models.
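
For reference, the end-to-end baseline could look roughly like the sketch below; the classification head (flatten plus softmax dense layer) and the optimizer/loss are our assumptions, while the 20 epochs and batch size of 64 come from the text.

```python
# Minimal sketch of a VGG-16 end-to-end classifier. The head and optimizer are
# assumptions; the training settings (20 epochs, batch size 64) are from the study.
import tensorflow as tf

def build_vgg16_classifier(num_classes):
    base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example usage (DS2 is a binary task, so num_classes=2):
# model = build_vgg16_classifier(2)
# model.fit(train_images, train_labels, epochs=20, batch_size=64)
```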

e344da82-9ef2-4760-8a62-c9bda369e974_figure11.gif

Figure 11. Performance of baseline LR, LR after VGG-16 FE and VGG-16 end-to-end image classification.

e344da82-9ef2-4760-8a62-c9bda369e974_figure12.gif

Figure 12. Resource usage of baseline LR, LR after VGG-16 based FE and VGG-16 end-to-end image classification.

Discussion

Here, we have investigated the use of VGG-16, EfficientNet-B0 and ResNet-50 as feature extractors to improve the performance of ML models. The models performed best after EfficientNet-based FE. This is not surprising, since EfficientNet has been known in the literature to yield high performance.28,38 However, since VGG-16 produces smaller feature maps (512 × 7 × 7 = 25,088 features) than EfficientNet-B0 (1280 × 7 × 7 = 62,720 features), the computational cost for ML models after VGG-16 FE is lower. Previous studies have reported that ML models perform better with balanced datasets than imbalanced ones.39 Our findings confirm this, as the increase in ML model performance after FE was more pronounced on balanced datasets.

We observed that the impact of FE increases with the number of learning samples, which agrees with existing studies.40,41 Multiclass classification is often believed to be more challenging than binary classification, with many studies aiming to find optimal methods for binarizing multiclass problems.42 However, our findings reveal for the first time that the number of classes does not hinder model improvement if there is adequate representation of learning samples in each class.

The impact of FE was most significant when LR was trained on CNN-derived features. As stated in the work of Levy and O’Malley,43 LR should not be automatically dismissed when conducting ML experiments. Our findings provide novel evidence that simple/shallow ML models have the capacity to deliver results comparable to ensemble or DL models, but they require extensive FE. For example, we observed that after FE, LR had significantly improved predictive performance while maintaining its resource efficiency as a computationally lightweight model. Our comparison also shows that LR performed well after VGG-16 FE, with significant reductions in memory and time usage compared to the baseline LR (memory = 79.68%, time = 81.28%) and the VGG-16 end-to-end model (memory = 95.39%, time = 99.79%). This significant reduction in computational cost would greatly enhance the feasibility of adopting ML for medical imaging analysis in low-resource environments.44

This study, however, has a few limitations. Only five ML algorithms were used; in the future, it would be beneficial to conduct studies involving more ML algorithms. This would enable researchers to compare the performance of CNN-based FE across several groups of ML algorithms (e.g. kernel-based models vs tree-based models, shallow models vs ensemble models). Similarly, only three pre-trained CNNs were investigated; there are more pre-trained models, such as the Inception and ConvNeXt series.45 Lastly, all datasets used in this study have fewer than 30,000 images. From our findings, one could extrapolate that the improvement in the models’ performance would only increase with more images.

Conclusion

In conclusion, our study revealed that using pre-trained CNNs as feature extractors holds much promise for improving model performance, especially for shallow ML models. The approach’s ability to reduce computational demands and accelerate the ML training process makes it especially valuable in resource-limited settings. We observed that small sample sizes and class imbalance, rather than a higher number of classes, pose the greatest challenge to ML model improvement after feature extraction.
