Opinion Article

Perspectives and guidance for developing artificial intelligence-based applications for healthcare using medical images

[version 1; peer review: 2 not approved]
PUBLISHED 23 Aug 2024
This article is included in the AI in Medicine and Healthcare collection.

Abstract

Artificial intelligence (AI) has significant potential to transform healthcare and improve patient care. However, successful development and integration of AI models require careful consideration of study designs and sample size calculations for model development and validation, publishing standards, prototype development for translation, and collaboration with stakeholders. As the field is relatively new and rapidly evolving, there is a lack of guidance and agreement on best practices for most of these steps. We engaged stakeholders, including clinicians, researchers from academia and industry, and data scientists, to discuss various aspects of the translational pipeline, and identified the challenges researchers in the field face and potential solutions to them. In this viewpoint, we present a summary of our discussions as a brief guide to the process of developing AI-based applications for healthcare using medical images. We organized the process into six major themes (i.e., the gaps AI can fill in healthcare; development of AI models for healthcare: practical and important things to consider; good practices for validation of AI models for healthcare: study designs and sample size calculation; points to consider when publishing AI models; translation towards products; and challenges and potential solutions from a technical perspective) and present the most important points as rules of thumb. We conclude that successful integration of AI in healthcare requires a collaborative approach, rigorous validation, adherence to the best practices described and cited here, and attention to technical aspects.

Keywords

Medical imaging, Artificial intelligence, AI in healthcare, Guide, Introduction, Rules of thumb

Introduction

Artificial intelligence (AI) in healthcare is an interdisciplinary field that requires expertise across several disciplines, including clinical medicine, engineering, computer science, and statistics.1,2 As this field evolves, workshops and discussions serve an important purpose in educating early career researchers and all those new to AI. We, an interdisciplinary and international research collaboration (The University of Oxford, UK, and the Translational Health Science and Technology Institute (THSTI), India), organized three workshops on AI in healthcare attended by clinicians, physician-scientists, biologists, computer vision scientists, and engineering and medical students. We summarize here the key messages from the thought-provoking discussions held across the six a priori identified themes of the three workshops (Figure 1). We hope that this article serves both as an introductory guide and as a rules-of-thumb checklist for researchers (the workshops are publicly available on our YouTube channel).


Figure 1. Different steps in the translational pipeline discussed in this viewpoint.

The first workshop focused on introducing this interdisciplinary field to the participants, with discussions on specific use cases of AI in clinical practice. This was followed by the identification of problem areas (research questions) in maternal and child health where AI may be used to find sustainable solutions. In the second workshop, we discussed in detail the process of development, validation, and reporting strategies for AI models. In the third and final workshop, we discussed the challenges of integrating AI-enabled solutions into public health, along with potential solutions and strategies for addressing bias in data, generalizability, and data sharing. We also explored the difficult task of collecting and managing multidimensional clinical data and how to address the associated challenges.

Summary of the discussions

Theme 1 - The gaps AI can fill in healthcare

Different classes of AI models support different types of healthcare tasks. Understanding the use case is key, as it influences the study designs for development and validation and ensures that appropriate evidence is generated to enable clinical application.

  • Assistive models help physicians automate non-trivial but repetitive tasks. One important example is models that automatically identify the anatomy of interest in an ultrasound scan and measure relevant parameters such as fetal biometry.3,4 These models may reduce the time healthcare practitioners take to complete their tasks and enable consistent quality.

  • Diagnostic models are intended to automatically diagnose a disease. Examples include models that screen large numbers of chest X-rays for tuberculosis5 or retinal images for diabetic retinopathy.6 These facilitate preliminary risk stratification (screening) before a medical expert confirms the diagnosis.

  • Prognostic models are designed to predict a patient’s prognosis, e.g., cancer recurrence7 or 5-year survival. It is prudent to note that, unlike the previous two examples, these models predict future events, so a physician cannot identify and correct a model’s mistake at the time of prediction.

Theme 2 - Development of AI models for healthcare: practical and important things to consider

At the outset of the research process there is an element of context: the research question. We strongly recommend considering questions such as “What are we endeavoring to predict?”, “How important is the problem at the level of application?”, and “How would clinicians use the tool in their workflow?”. We believe all stakeholders, including regulators, clinicians, patients, and the investigators who will conduct the study, should be engaged from the beginning, right from the point of study design.

From the outset, we recommend planning for validation and choosing appropriate metrics for evaluating the model. Relevant examples include discussions about ground truth, study design for validation, and which metrics (for example, sensitivity or specificity) to optimize for a given clinical context. There are specific challenges in finding benchmarks against which these AI models can be compared. The most intuitive approach is to compare against a panel of clinical experts. However, some caveats should be kept in mind when considering this mode of comparison. When clinicians make a diagnosis, they have additional contextual or corroborating information about patients, such as their symptoms, investigations, and clinical history. In contrast, AI models are typically limited to the images, so a direct comparison of models with clinicians may not be appropriate. We recommend incorporating such contextual information when developing AI models.

Quality assurance of the input data, manual annotations (ground truth), and the deidentification process are other pertinent factors. Questions like “How do you measure intra- and inter-annotator agreement?”, “What should be considered the gold standard?”, and “How much error is tolerable?” must be discussed while designing the study. In addition, we suggest that both automated data collection pipelines and quality assurance checks be in place before data collection starts. Automation of both quality assurance and annotation is essential because, in large-scale studies involving several thousand images or videos, investigators cannot manually check every data point. All these considerations emphasize the importance of an interdisciplinary team and constant engagement with all multidisciplinary stakeholders from the start.
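
As an illustration of the annotation-agreement question above, here is a minimal sketch of computing Cohen’s kappa between two annotators using scikit-learn. The labels are hypothetical placeholders; in practice, metrics such as Fleiss’ kappa (for more than two raters) or Dice overlap (for segmentations) may be more appropriate.

```python
# Minimal sketch: inter-annotator agreement via Cohen's kappa.
# The labels below are hypothetical placeholders for illustration.
from sklearn.metrics import cohen_kappa_score

# Binary labels ("abnormal" = 1) assigned to the same 10 images by two annotators.
annotator_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
annotator_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```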

Theme 3 - Good practices for validation of AI models for healthcare: study designs and sample size calculation

It is well known that models perform optimistically when evaluated or tested in the population in which they were developed. The scientific community has therefore increasingly demanded external validation alongside development studies. A recent review of studies that developed AI models for diagnostic analysis of medical images found that only 6% of 516 studies had conducted external validation.8 Another study reported that there were relatively few prospective studies on medical imaging, and that past randomized clinical trials were at high risk of bias and deviated from existing reporting standards.9,10 Hence, it is essential to carefully consider validation methods and appropriate practices for external validation studies of AI models. In the literature, internal and external validation are sometimes confused. During internal validation, the model is evaluated for accuracy and robustness; external validation, in contrast, measures the model’s generalizability and clinical effectiveness. We reiterate that testing the model on data collected alongside the training data but kept separate from the training process is not external validation. True external validation tests the model on data collected at a site (or setting) completely different from the one at which the development data were collected. External validation should involve a different setting, population, geographical location, or time, chosen according to the context in which the model is meant to be used. Ideally, external validation would be conducted with data from more than one setting.

For models that predict a future outcome or event, the ideal study design for validation is a cohort study. A cross-sectional study is appropriate if the model is intended to diagnose a disease. We suggest that an interventional study, such as a randomized controlled trial, be considered only when the clinical effectiveness of a model is being evaluated. Validation on retrospective data is simpler and quicker, but investigators may not have had control over how the data were collected, so these validations carry a high risk of bias. The aim of prediction model development studies is to estimate the model coefficients robustly; in external validation studies the focus shifts to estimating model performance. Since the two steps address different but equally important aspects, validation studies should not be an afterthought. Critical to good validation study design is sample size estimation, and we recommend that investigators follow recently developed guidelines and best practices for sample size estimation for prediction model validation.11,12 A worked example of one such criterion is sketched below.
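
The following is a hedged sketch of just one criterion from the cited sample size guidance11,12: the precision of the observed/expected (O/E) events ratio for a binary outcome, assuming the standard error takes the form SE(ln O/E) ≈ sqrt((1 − φ)/(nφ)) under a true O/E near 1. The prevalence and confidence interval width are illustrative assumptions, and the full guidance includes further criteria (e.g., for the calibration slope and C-statistic) that should also be checked.

```python
# Hedged sketch: minimum validation sample size for precise estimation of
# the O/E ratio. Numbers are illustrative assumptions, not recommendations.
import math

phi = 0.10             # assumed outcome prevalence in the validation setting
oe_lo, oe_hi = 0.8, 1.2  # desired 95% CI for O/E, assuming true O/E ~ 1

# SE(ln O/E) needed so the 95% CI spans the desired interval.
target_se = (math.log(oe_hi) - math.log(oe_lo)) / (2 * 1.96)

# Assumed form: SE(ln O/E) ~ sqrt((1 - phi) / (n * phi)); solve for n.
n = (1 - phi) / (phi * target_se ** 2)
print(f"Minimum n for the O/E criterion: {math.ceil(n)}")
```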

Model performance should be measured on external validation data sets in terms of discrimination, calibration, and clinical utility. Discrimination is the ability of the model to distinguish between the groups and, for classification models, should be measured as the area under the receiver operating characteristic curve (AUROC). Calibration measures the alignment between predicted probabilities and the actual frequency of events, ensuring that the model’s confidence scores are reliable; the calibration slope and intercept should be reported to assess model bias. Finally, to assess clinical utility, a decision curve analysis is recommended.13
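
The sketch below illustrates these three measures for a binary-outcome classification model; `y_prob` and `y_true` are synthetic placeholders standing in for predicted risks and observed outcomes from a hypothetical external validation set.

```python
# Minimal sketch: discrimination (AUROC), calibration (slope and intercept),
# and clinical utility (net benefit, as used in decision curve analysis).
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_prob = rng.uniform(0.05, 0.95, 500)  # placeholder predicted probabilities
y_true = rng.binomial(1, y_prob)       # placeholder observed outcomes

# Discrimination: AUROC.
auroc = roc_auc_score(y_true, y_prob)

# Calibration slope: logistic regression of outcome on logit(predicted risk).
logit_p = np.log(y_prob / (1 - y_prob))
slope = sm.Logit(y_true, sm.add_constant(logit_p)).fit(disp=0).params[1]

# Calibration intercept (calibration-in-the-large): intercept-only model
# with logit(predicted risk) as an offset.
glm = sm.GLM(y_true, np.ones((len(y_true), 1)),
             family=sm.families.Binomial(), offset=logit_p).fit()
intercept = glm.params[0]

# Clinical utility: net benefit at a chosen decision threshold t.
def net_benefit(y, p, t):
    treat = p >= t
    tp = np.sum(treat & (y == 1)) / len(y)
    fp = np.sum(treat & (y == 0)) / len(y)
    return tp - fp * t / (1 - t)

print(f"AUROC={auroc:.3f}, slope={slope:.2f}, intercept={intercept:.2f}, "
      f"net benefit@0.2={net_benefit(y_true, y_prob, 0.2):.3f}")
```

Repeating the net benefit calculation across a range of thresholds, alongside the “treat all” and “treat none” strategies, yields the decision curve.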

Theme 4 - Points to consider when publishing AI models

This theme identifies key considerations when writing a manuscript for peer review and publication. A recent systematic review of the diagnostic accuracy of deep learning in medical imaging found high heterogeneity between studies, owing to varying methods, terminology, and outcome measures, indicating a need for reporting and methodological guidelines addressing significant issues in this field. Reporting guidelines are not new in medical and epidemiological research (see the EQUATOR network). The domain of AI in medical imaging is multidisciplinary, which mandates reporting in a way that is easily comprehensible to the end users of these models. While a broad consensus on such reporting is still evolving, publishing guidelines and checklists such as CLAIM14 and TRIPOD15 already exist. Publication of the model is an important intermediary step between its development and validation. Authors should consider sharing all essential information required for reproducibility and transparent implementation of the model by other researchers. Reproducibility is greatly facilitated by releasing the model and implementation code. We recognize that this is not always possible in healthcare AI, especially for stakeholders in industry and where data governance restrictions forbid release of the data used to build a model. In such situations, releasing a web-based implementation of the model, which keeps intellectual property secure, is a valuable alternative wherever possible. We also recognize that grand challenges (open competitions with data and code) play an important role in model benchmarking, although they typically capture real-world scenarios only in a limited way and can encourage “data engineering gaming” to incrementally beat prior work. Overall, researchers must take responsibility for reporting their work in a scientifically rigorous way, with clear descriptions of the data set, eligibility criteria, the context in which the model is to be used, appropriate metrics, and an implementation plan to facilitate the next steps in the translation pipeline. Peer review also needs to become more rigorous to ensure that AI model reporting standards are elevated.

Theme 5 - Translation towards products

The final step in translating any AI model is producing a prototype that can be used by healthcare workers. Academic publication plays a crucial role in technology translation, and how authors present the information is of paramount importance. Authors should maintain a balance between sharing information and retaining the details that enable commercialization of their technology. We emphasize that authors should clearly define the technology’s performance level and report its limitations so that end users can utilize it effectively. A major obstacle to adoption is that the benefits of AI are not clear to clinicians.16 Rather than simply handing stakeholders a new AI-based technology, it is advisable to train them in its use in real-life situations, which also allows them to appreciate the improvement the technology brings to their daily workflow. There is an urgent need for low-cost (or zero extra cost) AI solutions requiring minimal clinical expertise from healthcare workers. In summary, we recommend four stages that any AI prototype should undergo before being accepted for clinical use: peer-reviewed publication of the model, external validation, regulatory approval, and recommendation by professional societies. A clear plan for the deployment scenario while developing the product will facilitate translation.

Theme 6 - Challenges and potential solutions from a technical perspective

To date, the black-box nature of AI models has led to slow adoption of this technology in medicine, where the cost of a mistake is high. Explainable AI (also called interpretable AI) has been suggested as a way to generate trust among healthcare professionals, bring transparency into AI decision-making, and mitigate biases. Several methods currently exist to probe the explainability of AI models. A heat map (also known as a saliency map) is a popular method that uses activations of convolutional layers to show the extent to which each region of the image contributed to the model’s decision. Heat maps are illustrative and easy to understand, but they do not always correspond to human intuition about what is important in decision-making. Class activation mapping (CAM)17 and its extension Grad-CAM18 are among the popular methods used to generate these explanations. Besides heat maps, local interpretable model-agnostic explanations (LIME)19 and Shapley values (SHAP)20 seek to explain decisions at the individual level by altering the input example and identifying the alterations that contributed to the decision; in image analysis, this is done by occluding parts of the image (see the sketch below). While the above approaches aid understanding from a clinical perspective, other approaches, such as feature visualization, are used by machine learning engineers. Feature visualization produces synthetic inputs that strongly activate specific parts of a model, so that each decision can be described as a combination of features detected in the input. Nevertheless, all these methods are only approximations and do not provide explanations in terms of medical findings, which makes the interpretation of AI models rather subjective. This area of research is still in its nascent stage, and more work is needed to develop truly explainable AI.21 In the absence of suitable explainability methods, we advocate rigorous internal and external validation of AI models as a more direct means of achieving these goals.
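
As a concrete illustration of the occlusion idea mentioned above, here is a minimal sketch that slides a masking patch over an image and records the drop in the model’s predicted probability; the `toy_model` is a hypothetical stand-in for a real classifier.

```python
# Minimal sketch: occlusion-based explanation for an image classifier.
# `model` is any callable returning one probability per image in a batch.
import numpy as np

def occlusion_map(model, image, patch=16, stride=8, fill=0.0):
    """Slide a patch over the image; the drop in predicted probability
    when a region is occluded indicates that region's importance."""
    h, w = image.shape[:2]
    baseline = model(image[None])[0]
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill
            heat[i, j] = baseline - model(occluded[None])[0]
    return heat  # larger values = region mattered more to the prediction

# Toy stand-in model: "probability" is the mean intensity of the image centre.
toy_model = lambda batch: np.array([img[24:40, 24:40].mean() for img in batch])
img = np.random.rand(64, 64)
print(occlusion_map(toy_model, img, patch=16, stride=16))
```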

Generalizability is the ability of a model to give accurate predictions on an external data set collected separately from the training data set. Despite a large amount of published work on AI applications in medicine, only a relatively small (though growing) number of these models are implemented in clinical practice, primarily due to a lack of generalizability.22,23 Site-to-site variation in image acquisition, different standards of implementation in clinical practice, differences in patient demographics across centers, genotypic and phenotypic characteristics of patients, and the tools and methods used to process medical data are just some of the factors that may affect generalizability. Another significant factor is training the model on biased data. Some of these factors can be addressed using recent methodological advances such as domain adaptation. In domain adaptation, the goal is to adapt models to target data sets using labeled samples from the source environment together with a limited set of unlabeled samples from the target environment; a simple example is sketched below.
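
As one possible example of domain adaptation (our choice for illustration, not a method prescribed by the cited literature), the following sketch implements CORAL (correlation alignment), which re-colours source-site features so their covariance matches that of unlabeled target-site features; a downstream classifier is then trained on the adapted features. The feature arrays are hypothetical placeholders.

```python
# Hedged sketch: CORAL (correlation alignment) domain adaptation.
import numpy as np
from scipy import linalg

def coral(source, target, eps=1e-5):
    """Align source feature covariance to target; both are (n, d) arrays."""
    cs = np.cov(source, rowvar=False) + eps * np.eye(source.shape[1])
    ct = np.cov(target, rowvar=False) + eps * np.eye(target.shape[1])
    # Whiten the source features, then re-colour with the target covariance.
    whitened = source @ linalg.inv(linalg.sqrtm(cs)).real
    return whitened @ linalg.sqrtm(ct).real

# Hypothetical features, e.g. from a CNN's penultimate layer at two sites.
rng = np.random.default_rng(1)
src = rng.normal(0, 1.0, (200, 8))   # labeled source-site features
tgt = rng.normal(0, 2.5, (150, 8))   # unlabeled target-site features
src_adapted = coral(src, tgt)        # train the downstream classifier on this
```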

Probably the best solution is to collect training data from different centers so that it incorporates all the real-world variations. However, collecting a huge medical data set from multiple sites is time-consuming and expensive, and poses data sharing challenges. Federated learning,24 an emerging methodological advance, may help researchers overcome the data sharing challenge by training AI models while the data remain at the individual sites. In brief, separate copies of the same model architecture are trained at each site using that site’s data, and these partial models are then combined into a global model, as sketched below.
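
The following is a minimal sketch of the federated-averaging idea described above; the per-site “weights” are plain arrays standing in for real network parameters, and the size-weighted averaging follows the common FedAvg scheme.

```python
# Minimal sketch: federated averaging. Only model weights leave each site;
# the underlying patient data never do.
import numpy as np

def federated_average(site_weights, site_sizes):
    """Combine per-site model weights into a global model,
    weighted by the number of samples at each site."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

# Hypothetical weight vectors from three hospitals after local training.
site_weights = [np.array([0.9, 1.1]), np.array([1.0, 1.3]), np.array([1.2, 0.8])]
site_sizes = [1200, 400, 800]
global_weights = federated_average(site_weights, site_sizes)
print(global_weights)  # sent back to the sites for the next training round
```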

Conclusion

In this article, we present the discussions held during the three workshops as take-home messages. The shared insights offer a foundational guide for researchers aiming to embark on their journey in this rapidly advancing and transformative field. In summary, to ensure successful integration of AI models in clinical practice, researchers must engage with all stakeholders, including clinicians, regulators, and patients, from the outset and use robust study designs for validation. The crucial steps are validating AI models on external data sets with sample sizes adequate for estimating robust performance metrics. When publishing these models, transparency, reproducibility, and adherence to reporting guidelines are strongly emphasized. Training clinicians on AI technologies is vital so that they understand the benefits these models bring to their clinical workflows, enabling effective adoption. From a technical standpoint, explainability and generalizability are major challenges: overcoming variations in data collection and ensuring models’ applicability across different settings are essential for generalizability.

Contributions

BKD, RT, NW, AK, AP, AN & SB designed and conceptualised the scientific content of the workshops. BKD, RT, AN & SB wrote the viewpoint.

Ethics and consent

Ethical approval and consent were not required.
