Study Protocol

Tracking the Evolving Role of Artificial Intelligence in Implementation Science: Protocol for a Living Scoping Review of Applications, Evaluation Approaches and Outcomes

[version 1; peer review: awaiting peer review]
PUBLISHED 17 Oct 2025

Abstract

Background

Artificial intelligence (AI) offers significant opportunities to improve the field of implementation science by supporting key activities such as evidence synthesis, contextual analysis, and decision-making to promote the adoption and sustainability of evidence-based practices. This living scoping review aims to: (1) map applications of AI in implementation research and practice; (2) identify evaluation approaches, reported outcomes, and potential risks; and (3) synthesize reported research gaps and opportunities for advancing the use of AI in implementation science.

Methods

This scoping review will follow the Joanna Briggs Institute (JBI) methodology and the Cochrane guidance for living systematic reviews. A living scoping review is warranted to keep up with the rapid changes in AI and its growing use in implementation science. We will include empirical studies, systematic reviews, grey literature, and policy documents that describe or evaluate applications of AI to support implementation science across the steps of the Knowledge-to-Action (KTA) Model. AI methods and models of interest include machine learning, deep learning, natural language processing, large language models, and related technologies and approaches. A search strategy will be applied to bibliographic databases (MEDLINE, Embase, CINAHL, PsycINFO, IEEE Xplore, Web of Science), relevant journals, conference proceedings, and preprint servers. Two reviewers will independently screen studies and extract data on AI characteristics, the specific implementation task addressed according to the KTA Model, evaluation methods, outcome domains, risks, and research gaps. Extracted data will be analyzed descriptively and synthesized narratively using a mapping approach aligned with the KTA Model.

Discussion

This living review will consolidate the evidence base on how AI is applied across the spectrum of implementation science. It will inform researchers, policymakers, and practitioners seeking to harness AI to improve the adoption, scale-up, and sustainability of evidence-based interventions, while identifying areas for methodological advancement and risk mitigation.

Review registration

Open Science Framework, May 2025: https://doi.org/10.17605/OSF.IO/2Q5DV

Keywords

machine learning; deep learning; natural language processing; large language models; generative AI; ChatGPT; sentiment analysis; implementation research; decision support; health systems

Introduction

Health systems grapple with the overuse of harmful, wasteful, or ineffective interventions, commonly referred to as “low-value care,”1,2 while underutilizing evidence-based interventions, leading to gaps in the delivery of “high-value care.”1,2 Implementation science, defined as the study of methods and strategies that facilitate the integration of evidence-based interventions, programs, and policies into health systems,3 holds considerable promise in addressing these challenges across a range of clinical contexts.4–10 It seeks to understand how, why, and under what conditions implementation succeeds or fails across varying contexts, and how best to support healthcare provider behaviour and system change.11,12 Implementation research refers to the rigorous investigation of these methods and strategies, while implementation practice concerns their application by practitioners, health system leaders, and policymakers in real-world settings.13 However, implementation efforts to adopt and sustain evidence-based interventions are constrained by the time- and resource-intensive processes required to identify, synthesize, and apply implementation evidence, including context-specific barriers, facilitators, and strategies.14,15 Additional challenges, including the heterogeneity of implementation data, variability in outcomes, and the underrepresentation of key populations, hinder timely, equitable, and context-responsive implementation.16 These limitations also undermine sustainability.17

In recent years, the integration of artificial intelligence (AI) in science and practice has rapidly advanced across sectors.18 AI has long been implemented in healthcare across diagnostics, treatment, population health management, patient care, and healthcare professional training and decision support, utilizing a wide range of AI innovations.19,20 It is helpful to distinguish between the major categories of AI and the methods that power them. Machine learning (ML) is a foundational approach in which systems learn patterns from data, with deep learning (DL) being a specialized subset that uses multi-layered neural networks for complex pattern recognition. Natural language processing (NLP) is a field of AI focused on enabling machines to understand and generate human language, often powered by DL-based models such as large language models (LLMs).19,20 LLMs are also an example of generative AI, which encompasses models capable of creating new content, such as text or images.21–23

Most healthcare-related systems use ML, DL and generative AI methods.19 ML and DL models have improved diagnostic accuracy by analyzing medical images and large datasets, reducing human error in disease detection24,25; for example, in breast cancer screening, they can lower both false positives and false negatives.26,27 ML-based methods are advancing personalized medicine, particularly in oncology, where they aid in genomic analysis to predict drug responses and disease predispositions.28–30 ML-driven predictive analytics support population health management by identifying at-risk individuals and enabling early interventions, thereby reducing hospital readmissions and healthcare costs.31 Virtual assistants powered by NLP are automating routine tasks, providing continuous support, and even offering mental health support through web-based cognitive-behavioral therapy.32,33

The recent momentum in AI-driven healthcare is largely propelled by advancements in generative AI in the form of LLMs, such as OpenAI’s GPT,21 Meta AI’s LLaMA22 and Google DeepMind’s Gemma,23 which introduce new possibilities in medical documentation, patient risk assessment, and clinical decision-making.34 LLMs are AI models trained on vast amounts of textual data to process, understand, and generate human-like language.21–23 These models are based on deep learning architectures, such as transformers, which allow them to analyze and predict text patterns effectively.34 Their applications are wide-ranging; for example, it is claimed that they can assist healthcare professionals by summarizing clinical encounters, automating medical notetaking, and generating real-time responses to complex medical queries, providing support that can increase efficiency and allow healthcare professionals to focus more on patient care.34 They may also contribute to the education and training of health professionals and patients.35,36 LLMs like ChatGPT may benefit medical education by supporting differential diagnosis brainstorming and providing interactive clinical cases for practice.34 LLMs have also shown effectiveness in patient education by delivering accurate answers to questions, enriching and tailoring existing educational resources, and simplifying complex medical language into more accessible terms.35,36

Given this momentum, interest in harnessing AI for implementation research and practice is rapidly growing.37 AI offers new opportunities to improve the speed and efficiency of all steps of the Knowledge-to-Action (KTA) Model,38 from synthesizing implementation evidence to planning for sustainability and scale-up (see Figure 1). Recent advances in AI-driven evidence synthesis and decision-making support for human behavior change and implementation science39,40 are exemplified by the Human Behaviour-Change Project (HBCP), which employs AI and ML to extract, synthesize, interpret, and predict findings from behavior change interventions, thereby guiding practitioners, policymakers, and researchers on what works, for whom, under which conditions.39,41,42 AI has also been leveraged to explore contextual factors influencing clinician adherence to guidelines.43 Additionally, NLP has been applied to qualitative data analysis, identifying codes and major themes.42,44,45 Overall, AI can help address critical challenges in implementation science by enabling rapid evidence synthesis, enhancing data analysis, supporting complex decision-making, and improving the translation of evidence into practice.

Figure 1. Knowledge-to-Action Model, adapted from Graham et al.38

Despite its potential, AI remains susceptible to a range of ethical, clinical, technical, and environmental risks. Ethically, algorithmic bias can perpetuate or exacerbate health disparities when training datasets underrepresent certain populations, while data privacy concerns regarding the storage and sharing of sensitive patient data also pose ethical challenges.46,47 AI’s “black box” nature also complicates interpretability and accountability, posing challenges for both clinicians and regulators.48 Clinically, overdiagnosis and overtreatment may occur when AI systems identify ambiguous findings or produce erroneous recommendations.46,47 Technically, LLMs and other DL algorithms can generate “hallucinations”.34 While all outputs generated by these algorithms are synthetic and probabilistic constructions, “hallucinations” refer to outputs that are convincing but factually inaccurate, which may mislead healthcare providers.34 Finally, from an environmental perspective, energy-intensive AI training and deployment contribute to a sizeable carbon footprint.49 Addressing these intersecting risks will be essential to harness AI’s benefits while mitigating potential harms in the context of implementation research and practice.

This living scoping review aims to provide a comprehensive mapping of the applications of AI relevant to implementation research and practice. The findings will offer a foundation for guidance on harnessing AI to accelerate the adoption of evidence-based practices in healthcare.

Objectives

The primary objective of this living scoping review is to systematically map and characterize how AI is used to support implementation research and practice. Specifically, we will:

  • 1. Map applications of AI across implementation research and practice activities, including their features, characteristics and implementation contexts, using the KTA Model as an organizing framework.

  • 2. Identify the evaluation approaches, outcomes, and risks reported in AI-enabled implementation research and practice, with attention to technical performance, equity considerations, and unintended consequences.

  • 3. Synthesize evidence gaps and future directions for advancing the responsible and equitable use of AI in implementation science.

A secondary objective is to maintain an up-to-date evidence base using a living scoping review approach, enabling continuous integration of new findings in this fast-evolving field.

Methods

Scoping review design

The proposed review will be conducted following the Joanna Briggs Institute (JBI) methodology for scoping reviews,50 and Cochrane’s guidance for living systematic reviews.51 This topic lends itself to a living scoping review approach for several reasons. First, AI technologies and methods are evolving at an unprecedented pace, leading to rapid shifts in the evidence base. Second, implementation science is inherently dynamic and context-specific, necessitating regular updates to capture emerging data, methods, and applications. Third, bridging AI and implementation science is still an emerging area, and a living review ensures that newly published insights are promptly synthesized and integrated. Finally, maintaining an up-to-date map of AI’s potential contributions to implementation science can guide researchers, policymakers, and practitioners as they refine methodologies, prioritize resource allocation, and incorporate novel AI tools into practice.

This protocol addresses the first three steps of JBI’s nine-step approach: first, defining the review objectives and questions; second, developing and aligning the inclusion criteria with these objectives and questions; and third, specifying the methods for evidence searching, selection, data extraction, and presentation. The next four steps will involve evidence searching, selection, extraction, and analysis. The eighth step focuses on presenting the results, while the ninth and final step involves summarizing the evidence, drawing conclusions, and discussing the implications of the findings.50 The reporting of the scoping review will follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Scoping Reviews (PRISMA-ScR)52 and its extension for living systematic reviews (PRISMA-LSR).53

Review team

Our interdisciplinary team brings extensive, internationally recognized expertise in implementation science, AI, behavioral science, and evidence synthesis. Team members have led or contributed to major advancements in the use of AI in implementation science. For example, Susan Michie and Janna Hastings have been central to the Human Behaviour-Change Project, which pioneered the integration of machine learning and ontology-informed modeling to synthesize and interpret behavior change intervention evidence.41 James Thomas at the EPPI Centre has developed AI tools and living evidence platforms for automating literature screening and synthesis, contributing extensively to methodological innovation in systematic review automation.54 Our team also includes researchers with expertise in applying AI to automate the extraction of implementation-relevant data such as barriers, facilitators, and strategies from qualitative and quantitative sources (e.g., Chan, Taylor).40 Several team members (e.g., Abbasgholizadeh-Rahimi, Légaré, Graham, Welch, Presseau, Straus) are leading experts in informatics, data science, and implementation evaluation, and bring deep experience in using implementation science frameworks (e.g., CFIR, KTA, RE-AIM, NASSS) in both high-income and resource-limited settings. Several investigators (e.g., Fontaine, Welch, Straus, Graham, Taylor) have led large-scale knowledge syntheses, scoping reviews, and methodological studies that shape the implementation science evidence base. Others (e.g., Graham, Powell, Michie, Presseau) have co-developed or refined key frameworks and taxonomies for implementation strategies and behavior change techniques that underpin this review’s analytic structure. Together, our team has the methodological, technical, and domain-specific expertise to conduct a comprehensive, high-quality, and policy-relevant review that will support the responsible and equitable integration of AI in implementation research and practice.

Eligibility criteria

The eligibility criteria for this scoping review are designed to align with JBI’s Population, Concept, and Context (PCC).50 Studies and reports will be included if they meet the following eligibility criteria.

Population

We will consider studies involving any individuals or organizations actively engaged in implementation research and practice. This includes researchers, practitioners, administrators, and other stakeholders who focus on any of the KTA key steps including, but not limited to, AI-driven synthesis of implementation evidence, identifying and prioritizing the problem, adapting evidence to local contexts, assessing barriers and facilitators, selecting, tailoring and operationalizing implementation strategies, monitoring outcomes and fidelity, evaluating impact on practice and population, and ultimately sustaining and scaling knowledge use. We will include studies only if the focus is on using AI to facilitate these steps, rather than on AI as the intervention itself.

Concept

The concept for inclusion requires that studies or reports specifically address the use of AI to support implementation research or practice. As presented in Table 1, we will consider studies or reports that describe AI-assisted approaches encompassing ML, DL, NLP, LLMs or other technologies across all steps of the KTA Model. We will also include cross-cutting features of AI technologies, such as data analysis or predictive modeling, that may not align precisely with a single KTA step but facilitate implementation activities throughout the cycle.

Table 1. Potential applications of AI to implementation research and practice activities.

Step along the KTA Model | Definition
1. Synthesize Implementation Evidence | Using AI to automate the identification, extraction, and synthesis of implementation evidence (e.g., barriers, facilitators, or implementation strategies) from large datasets, including research articles, guidelines, and reports. This might include linked decision support tools.
2. Identify & Prioritize the Problem, Select Evidence-Based Intervention (EBI) | Using AI-driven approaches to identify evidence–practice gaps, emerging clinical needs, or high-impact problems by analyzing large volumes of data (e.g., research articles, electronic health records). This may include anomaly detection, trend detection, and predictive modeling to highlight priority issues.
3. Adapt or Tailor EBI | Using AI to adapt or tailor evidence-based guidelines, tools, or interventions for local contexts, including automated translations, reading-level adjustments, or context-sensitive content generation.
4. Assess Barriers & Facilitators (Contextual Analysis) | Using AI-powered data analysis (e.g., sentiment analysis, topic modeling; see the sketch following this table) to identify challenges, constraints, and enablers within an organization or community. These insights can stem from qualitative sources (interviews, focus groups) and quantitative data.
5. Select, Tailor and Operationalize Implementation Strategies | Using AI-based decision support systems or recommendation engines to map identified barriers and facilitators to evidence-based implementation strategies. Deploying AI tools to manage, schedule, or coordinate resources and personnel during rollout.
6. Monitor EBI Implementation | Using AI to assess whether an intervention is delivered as intended (fidelity), measure its uptake (reach, acceptability), and collect real-time performance data. This may include applying behavioural analytics to data sources such as tracking logs or digital footprints.
7. Evaluate Impact of EBI on Practice & Population | Using AI to assess intervention impact on clinical, service, or implementation outcomes, or to clarify which specific elements of the intervention drive effectiveness, enabling more targeted refinements and better resource allocation.
8. Sustain & Scale EBI | Using AI to support the long-term embedding and expansion of successful interventions. This may include predictive models that generalize from existing data to new contexts, as well as equity-focused algorithms that detect disparities in reach or outcomes.

We will exclude studies in which AI is purely employed as the primary clinical or public health intervention (e.g., a clinical decision support system used directly for patient diagnosis, an AI-driven therapy tool) without a clear focus on supporting or studying the implementation process itself. We will exclude studies that do not address steps in the KTA Model, or do not articulate outcomes related to implementation processes (e.g., fidelity, adoption, reach) or implementation-related impact (e.g., changes in practice, sustainability). We will exclude documents that do not present empirical evidence or methodological details about the use of AI in supporting implementation (e.g., purely theoretical or opinion-based articles without data or systematic description of AI application).

Context

AI applications relevant to various domains of implementation science, including but not limited to healthcare settings, health policy implementation, and community-based healthcare interventions, will be included.

Evidence sources

We will include primary research studies, scoping reviews, systematic reviews, case reports, grey literature, conference abstracts, and policy documents that describe or evaluate AI applications in implementation science. Both quantitative and qualitative studies are eligible, as well as mixed-methods studies that examine AI’s impact on implementation processes or outcomes.

Language and date restrictions

Studies and reports will be limited to those available in English and French, and published within the last 15 years, given the recent advancements in AI technologies.

Literature search

Information sources

The bibliographical databases to be searched include CINAHL, Embase, IEEE Xplore, MEDLINE, PsycINFO and Web of Science. Furthermore, we will hand-search relevant journals and conference proceedings to identify additional records. Examples of journals may include: BMJ Quality & Safety, Implementation Science, Implementation Science Communications, BMC Health Services Research, Implementation Research and Practice, and Health Research Policy and Systems. Examples of relevant conferences include EMNLP and ICML. We will screen the reference lists of included records to identify additional records. We will also search relevant pre-print servers (e.g., arXiv). Finally, we will identify a limited number of ‘core papers’ and perform a citation search.

Search strategy

Our search strategy has been developed in collaboration with a research librarian (RL) and AI specialists (JC, JH, SAR). It uses a combination of MeSH terms and search terms structured around two core concepts: “artificial intelligence technologies” AND “implementation science activities.” The Medline search strategy is presented in Table 2.

Table 2. Medline search strategy.

# | Search terms | Results
Medline search strategy (Implementation science keywords)
1 | exp Artificial Intelligence/ OR exp Natural Language Processing/ OR (artificial intelligence or Natural Language Processing or "ai" or "machine learning" or "deep learning" or "neural network" or "large language model" or "generative model*" or LLM* or "transformer model*" or "language model*" or "generative AI" or "foundation model*" or "predictive model*" or "supervised learning" or "unsupervised learning" or "reinforcement learning" or "expert system*" or "pattern recognition" or "text mining" or "literature mining" or "evidence extraction" or "automated review" or "sentiment analysis" or "topic modeling" or "text classification" or "counterfactual analysis" or "scenario analysis" or "bias detection" or "ChatGPT" or "GPT-" or BERT or RoBERTa or Gemma or LLaMA).af. |
2 | exp Translational Research, Biomedical/ or Quality Improvement/ or Health Services Research/ or Learning Health System/ or exp Organizational Innovation/ or exp Models, Theoretical/ or exp Implementation Science/ or exp "Diffusion of Innovation"/ or (Organizational Innovation or Translational Research or diffusion of innovation or "implementation science" or "implementation research" or implementation practice* or "quality improvement" or "improvement science" or "learning health system" or "learning healthcare system" or implementation strateg* or implementation process* or "knowledge translation" or "knowledge to action" or "intervention uptake" or "intervention adoption" or intervention outcome* or "program implementation" or "behavior change" or "behaviour change" or "dissemination and implementation" or "practice change" or "real-world implementation" or "translation of evidence" or framework* or model or models or theory or theories or implementation outcome*).af. or ((implementation or intervention) adj30 (fidelity or "scale-up" or acceptability or feasibility or penetration or adoption or appropriateness or implementability or adoptability or sustainability or spread)).ab,ti,hw,kf,sh. or (implementation adj30 (feasibility or automation or sustain* or barrier* or enabler* or facilitator*)).ab,ti,hw,kf,sh. |
3 | "evidence based".af. or exp Evidence-Based Medicine/ |
4 | (application or applying or using or utilization or utilizing or leveraging or leverage or "role of" or "impact of" or integrating or integrate or integrated or improving or enhancing or optimizing or optimize or optimization or facilitating or accelerating or supporting or streamlining or streamlined or automating or automate or predicting or personalizing or personalized).ab,ti. adj10 (implementation or translation or knowledge or diffusion).ti,ab. |
5 | 1 and 2 and 3 and 4 | 329
6 | Limit to 2010–present | 309
Medline search strategy (Implementation science journals)
1 | Same AI search terms as line 1 above |
2 | ("Implementation Science" or "JBI Evidence Implementation" or "Global Implementation Research and Applications" or "Translational Behavioral Medicine" or "Implementation Science Communications" or "Implementation Research and Practice").jn. |
3 | 1 AND 2 |
4 | limit 3 to yr="2010-Current" |
5 | After removing duplicates (in EndNote) against the previous search | 124
Medline search strategy (Other journals)
1 | Same AI search terms as line 1 above |
2 | ("BMC Health Services Research" or "BMJ Quality & Safety" or "Health Research Policy and Systems" or "Annual Review of Public Health" or "American Journal of Public Health" or "American Journal of Preventive Medicine").jn. |
3 | 1 AND 2 |
4 | limit 3 to yr="2010-Current" |
5 | After removing duplicates (in EndNote) against the previous searches | 837

Update frequency

We will perform search updates every six months to identify newly published peer-reviewed studies, preprints, and grey literature. The frequency may be adjusted based on the volume of newly identified records and available team capacity. Each update cycle will include re-running the database searches, screening, full-text review, and data extraction following the same procedures as the original review.
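As an illustration of one update step, the sketch below re-runs a simplified PubMed query over a six-month window via the public NCBI E-utilities API. The query string and date window are hypothetical simplifications; the actual updates will re-run the full Ovid MEDLINE strategy in Table 2, which this toy query does not reproduce.

```python
import requests

# NCBI E-utilities esearch endpoint (public API).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

params = {
    "db": "pubmed",
    # Toy two-concept query; NOT the full Table 2 strategy.
    "term": '("artificial intelligence" OR "machine learning") AND "implementation science"',
    "datetype": "edat",       # restrict by Entrez date to catch newly added records
    "mindate": "2025/04/01",  # hypothetical six-month update window
    "maxdate": "2025/09/30",
    "retmax": 500,
    "retmode": "json",
}

resp = requests.get(ESEARCH, params=params, timeout=30)
resp.raise_for_status()
result = resp.json()["esearchresult"]
print(f"{result['count']} new records; first PMIDs: {result['idlist'][:5]}")
```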

Versioning and reporting

All updates will be tracked and versioned. New findings and changes in key concepts, classifications, or gaps will be reported in a cumulative manner and noted clearly in any published outputs. A dedicated section will be added to the online supplementary materials or review platform (if hosted) to indicate the date of the last update and the planned date for the next update. We will consider developing an interactive database and data visualization if resources allow.

Team roles and governance

GF, NT and other team members will be responsible for overseeing the living component of the review. Team meetings will be scheduled at each update point to discuss inclusion of new evidence and refine the approach if needed.

Triggers for substantive review revision

A full update of the review (including potential resubmission for publication) will be triggered if: (i) there is a critical mass of new studies (e.g., >20% increase in included records); (ii) stakeholder priorities or core concepts in implementation science shift meaningfully; (iii) major AI-related methodological and technological breakthroughs occur; or (iv) regulatory/policy developments occur (e.g., WHO guidance on AI in public health).

Source of evidence selection

After completing the search, all identified citations will be compiled and uploaded into Covidence, where duplicates will be removed. Two reviewers will independently screen all titles and abstracts to determine eligibility based on the inclusion criteria for the review. The full texts of selected citations will then be thoroughly evaluated against the inclusion criteria by two independent reviewers. Any disagreements during the selection process will be resolved through discussion or, if necessary, by consulting an additional reviewer. The search results and study selection process will be fully detailed in the final scoping review and displayed in a PRISMA-ScR flow diagram.52

Data extraction

We will systematically extract detailed information from each included study or other relevant sources to address the review objectives. A structured data extraction form will be developed and piloted to ensure consistent data collection across studies, covering article characteristics, evaluation methodology, AI application, comparator, outcomes, adverse effects, and research gaps. Two independent reviewers will conduct the data extraction, after a calibration exercise on 10 articles. Any inconsistencies will be discussed and resolved, and the extraction guide adapted as needed.
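For illustration only, a minimal sketch of what the piloted extraction form could look like as a structured record is shown below. All field names are hypothetical placeholders and will be refined during the calibration exercise.

```python
from dataclasses import dataclass, field

@dataclass
class ExtractionRecord:
    # Article characteristics
    first_author: str
    year: int
    country: str
    article_type: str            # e.g., "empirical study", "systematic review"
    setting: str
    # AI application
    kta_step: str                # one of the eight steps in Table 1
    ai_technology: str           # e.g., "traditional ML", "generative AI"
    models_or_tools: list[str] = field(default_factory=list)
    maturity_level: str = ""     # e.g., "prototype", "commercial tool"
    # Evaluation and findings
    evaluation_methods: list[str] = field(default_factory=list)
    comparator: str = ""
    outcome_domains: list[str] = field(default_factory=list)
    adverse_effects: list[str] = field(default_factory=list)
    research_gaps: list[str] = field(default_factory=list)

# Example (invented) record for one included study.
record = ExtractionRecord(
    first_author="Doe", year=2024, country="Canada",
    article_type="empirical study", setting="primary care",
    kta_step="4. Assess Barriers & Facilitators",
    ai_technology="generative AI", models_or_tools=["GPT-4"],
)
```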

Article characteristics

We will first extract key article characteristics, including article type, author(s), health and social care categories, year of publication, country of origin, population(s) and setting.

Evaluation methodology

We will extract the evaluation methodologies used to assess AI-supported implementation science activities, including quantitative (e.g., RCTs, observational, simulation), qualitative (e.g., interviews, focus groups), and mixed-methods approaches. We will also capture human-centered evaluations (e.g., usability testing), AI-specific techniques (e.g., cross-validation, explainability assessments), and use of implementation science frameworks (e.g., RE-AIM, CFIR, NASSS). This will inform a taxonomy of evaluation approaches for AI-enabled implementation science activities.
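As an example of the AI-specific evaluation techniques we will catalogue, the sketch below runs k-fold cross-validation on synthetic data. It illustrates the method only and is not part of the review workflow; the data and model are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data standing in for a study dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 5-fold cross-validation, reporting F1 across folds.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=5, scoring="f1")
print(f"5-fold F1: mean={scores.mean():.2f}, sd={scores.std():.2f}")
```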

AI application

We will extract detailed information on the AI application. We will use established classifications, such as the Living Map of Generative LLM-Based Tools for Health and Social Care Applications54 (developed by JT), to guide data collection across the following dimensions:

  • (i) Application class(es) (e.g., clinical service delivery, public health, or policy implementation);

  • (ii) AI technology (e.g., knowledge-based or rules-based AI using explicit knowledge representation and reasoning, traditional [“shallow”] machine learning [e.g., logistic regression, SVMs], deep learning but without any generative capability or transfer learning, transfer learning including fine-tuning pre-trained foundation models, generative AI, potentially with in-context learning, but without significant additional fine-tuning);

  • (iii) Model(s), ontology(ies), platform(s) or tool(s) used (e.g., decision trees, neural networks, Bayesian networks; GPT-4, BERT);

  • (iv) Mode(s) of model use;

  • (v) Model version;

  • (vi) Maturity level of the AI application (e.g., MSc thesis prototype vs. commercial tool);

  • (vii) Degree of testing and deployment of AI application;

  • (viii) Implementation science task type(s) of the AI application, categorized according to core implementation science activities (e.g., as per the KTA Model in Table 1).

Comparator

For studies that include a comparator, we will document the type of comparator used (e.g., human researcher or clinician, standard manual process, non-AI technology, or another AI model), and the function being compared (e.g., diagnostic accuracy, decision-making, time to task completion). This information will help contextualize the performance of AI applications and support future benchmarking efforts.

Outcomes

We will extract information on performance indicators and outcome types reported in relation to AI-supported implementation science activities. While we will not extract specific effect sizes or interpret the direction of effects, we aim to comprehensively map and categorize the types of outcomes assessed across studies. These outcomes will inform the development of a future taxonomy for evaluating AI in implementation science. Outcome types may include:

  • (i) Time-related outcomes: Time to complete tasks, time to implementation, delays, etc.

  • (ii) Cost-related outcomes: Development costs, operational costs, cost-effectiveness metrics, etc.

  • (iii) Accuracy: Concordance with gold standard or expert judgement, reduction in errors, etc.

  • (iv) Task-specific technical metrics, depending on the nature of the AI model (a toy computation of several of these appears after this list):

    • a. Classification tasks: Precision, recall, F1-score, AUC-ROC, specificity, sensitivity.

    • b. Regression tasks: Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R2.

    • c. Generative tasks: BLEU, ROUGE, METEOR, GLEU, perplexity scores.

    • d. Ranking or recommendation tasks: NDCG, MAP, MRR.

    • e. Human factors: Usability, user satisfaction, acceptability, and trust in the system.

  • (v) Implementation-specific outcomes: Adoption, fidelity, reach, sustainability, feasibility, etc.

  • (vi) Equity-related outcomes: Disparities in performance across subgroups, etc.

  • (vii) Clinical and health system outcomes (where relevant): Patient satisfaction, clinical workflow improvements, adherence to guidelines, or patient safety markers.
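For readers less familiar with the task-specific metrics listed above, here is a toy computation of a few of them using made-up labels and predictions. The review will record which metrics studies report, not recompute them.

```python
from sklearn.metrics import (precision_score, recall_score, f1_score,
                             mean_absolute_error)

# Classification metrics on invented binary labels/predictions.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(f"precision={precision_score(y_true, y_pred):.2f}, "
      f"recall={recall_score(y_true, y_pred):.2f}, "
      f"F1={f1_score(y_true, y_pred):.2f}")

# Regression metric (MAE) on invented continuous values.
y_cont_true = [2.0, 3.5, 4.0]
y_cont_pred = [2.2, 3.1, 4.4]
print(f"MAE={mean_absolute_error(y_cont_true, y_cont_pred):.2f}")
```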

Adverse effects

We will extract any reported or potential adverse effects or unintended consequences associated with AI use in implementation science. This includes clinical or patient harms, such as treatment errors; systemic issues like increased clinician workload, workflow disruptions, or exacerbation of disparities caused by existing biases in training data; and AI-specific risks, including algorithmic drift, hallucinations in generative models, or overreliance on automated systems. We will also document user-level effects such as reduced trust, cognitive overload, decision fatigue, or de-skilling of professionals. All identified harms will be classified and mapped to support future risk assessment and mitigation efforts.

Research gaps

Finally, we will identify both explicitly stated and inferred research gaps to enhance the role of AI in implementation science. These may include gaps in knowledge or evidence related to AI’s effectiveness, scalability, or sustainability in implementation contexts; underexplored domains of application (e.g., underrepresented populations, low-resource settings); methodological gaps (e.g., lack of robust evaluation, absence of longitudinal studies); and conceptual or theoretical gaps (e.g., insufficient use of implementation science frameworks, lack of interdisciplinary integration). We will also capture recommendations made by study authors for future AI development, evaluation, or use in implementation science, as well as needs for standards, reporting guidelines, or regulatory frameworks to support responsible AI use. These gaps will inform a future research agenda and highlight opportunities to enhance the value and equity of AI in implementation science.

Data analysis and synthesis

We will conduct a structured analysis of included studies to address the review’s objectives. First, we will generate a descriptive summary capturing key characteristics such as publication year, country of origin, study design, setting, population, and area of application. This will allow us to identify trends in how AI is being used within implementation science. AI applications will be categorized by the implementation activity they support, the type of technology used, the specific tools or models described, and their level of maturity. Outcomes will be grouped and summarized based on their relevance to performance, implementation, human factors, equity, and system-level impact. We will describe how outcomes are measured and reported, but not interpret effect sizes. Findings will be integrated into a narrative synthesis that links AI applications, implementation activities, outcomes, and research gaps.
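A minimal sketch of the planned descriptive mapping, assuming extraction records are tabulated, is shown below: cross-tabulating KTA step against AI technology class reveals where the literature clusters. The rows are invented placeholders, not review data.

```python
import pandas as pd

# Invented extraction records standing in for the review dataset.
records = pd.DataFrame([
    {"kta_step": "1. Synthesize Evidence", "ai_technology": "generative AI"},
    {"kta_step": "1. Synthesize Evidence", "ai_technology": "traditional ML"},
    {"kta_step": "4. Assess Barriers & Facilitators", "ai_technology": "NLP"},
    {"kta_step": "6. Monitor Implementation", "ai_technology": "traditional ML"},
])

# Counts of studies per KTA step and technology class.
print(pd.crosstab(records["kta_step"], records["ai_technology"]))
```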

Conclusion

This living scoping review will offer a comprehensive overview of how AI is being applied across the spectrum of implementation science activities. It will map the current landscape, synthesize reported outcomes, and identify key research gaps. The findings will serve as a foundation for advancing the responsible and equitable integration of AI in implementation research and practice, with the potential to accelerate the adoption, scale-up, and sustainability of evidence-based interventions. Target audiences include implementation scientists, applied researchers, funding agencies, computer scientists seeking to engage with real-world challenges, implementation practitioners, and policymakers.

Disclosures

ChatGPT 4o (OpenAI, 2025) was used to enhance the coherence and readability of some sections of this manuscript. The authors have reviewed all sections of the article and take full responsibility for its contents.
