Keywords
Artificial Intelligence, Computer-Assisted Image Interpretation, Dental Caries, Dental Radiography, Diagnostic Imaging, Treatment Planning, Clinical Decision-making, Clinical Decision Support System
Background: The emergence of AI technologies has revolutionised dentistry, with intraoral imaging being a key area for innovation. Despite advances and growing interest in applying AI algorithms to intraoral x-rays, the methodological quality, diagnostic validity, and clinical applicability of existing studies remain unclear.
Aim: To synthesise and critically appraise the current evidence on AI integrated with intraoral digital radiographic imaging for detecting dental caries in adults, focusing on diagnostic accuracy compared with gold-standard methods and examining methodological quality, clinical applicability, and implementation challenges.
Methods: Following the JBI scoping review framework and PRISMA-ScR reporting guidelines, a comprehensive literature search was conducted across the PubMed, Scopus, and IEEE Xplore databases from January 2015 to May 2025. Studies that met the predefined eligibility criteria were included. Thematic analysis, combining inductive and deductive approaches following Braun and Clarke’s framework, identified five themes. The CASP quality appraisal was performed to ensure methodological rigour.
Results: Ten peer-reviewed studies were included in the final data analysis. AI systems detected a greater number of carious lesions than human clinicians, particularly in early-stage caries, with representative metrics including 88% sensitivity, 91% specificity, and 89% accuracy. Other models reported F1-scores up to 89% and AUC ≈95%. Methodological diversity was notable, with histology-validated designs providing the strongest evidence. Implementation challenges included limited external and real-world validation, clinician oversight, ethical/regulatory considerations, and inadequate model interpretability.
Conclusions: AI exhibits strong potential to enhance early caries detection on intraoral radiographs and support clinical decision-making in adults. Fully realising AI’s clinical potential requires overcoming implementation and methodological challenges. Standardised validation methods across diverse populations and settings are crucial to ensure AI diagnostic reliability and generalisability. Current AI applications in dentistry are primarily designed to assist clinicians in detecting caries; however, their greatest potential lies in a future where they can independently guide treatment planning decisions.
Proper and timely detection of dental caries, along with appropriate decision-making on whether to intervene, use preventive measures, or monitor, is critical for maintaining oral health and preventing disease progression. It is the necessary first step of the process to preserve natural tooth structure, reduce the need for invasive and costly treatments, and minimise complications such as pain, infection, and abscesses.1–3 From a public health perspective, accurate detection and sound treatment planning reduce the financial burden on healthcare systems by minimising the need for complex interventions. At the individual level, they support long-term function, improved quality of life, and alignment with the principles of minimally invasive dentistry.1,3–5
Missed or incorrect diagnoses can lead to both over-treatment and under-treatment, with consequences for patient well-being, imposed costs, healthcare resources, and clinician liability.6,7 Delayed or inappropriate referrals increase the likelihood of advanced disease requiring more invasive and costly treatments, while unnecessary referrals can expose patients to repeat diagnostic procedures, additional costs, and radiation exposure.8–10 Improving diagnostic precision is therefore crucial for both patient care and system-level efficiency.
Current diagnostic methods, which combine visual-tactile examination with two-dimensional radiographs, face well-recognised limitations,11–13 while more sophisticated techniques, such as 3-D scans, are not practical for general dental practice.14 As a result, artificial intelligence (AI) has emerged as an attractive solution. Machine learning and deep learning algorithms have shown the ability to analyse intraoral radiographs with high accuracy.15,16 Intraoral radiographs, such as bitewing and periapical views, have been a key focus, with AI systems demonstrating strong diagnostic metrics—high sensitivity, specificity, and AUC values—while also reducing the time required for interpretation.17–19
Integrating AI into dental diagnostics offers multiple benefits that extend beyond improved accuracy. AI systems consistently demonstrate higher sensitivity for subtle lesions, provide real-time decision support, and reduce inter- and intra-observer variability.16,20 They also classify lesion severity, align with frameworks such as ICDAS or ICCMS, and contribute to evidence-based treatment planning. Importantly, automated analysis improves efficiency by reducing interpretation time and freeing clinicians to focus on patient communication and holistic care.21,22
The rapid advancement of AI in dental diagnostics has led to a growing body of literature on its application for caries detection from intraoral radiographs. However, the evidence remains fragmented and methodologically heterogeneous. Studies vary widely in imaging modalities, dataset quality, annotation protocols, lesion definitions, and AI architectures. Many rely on small, single-centre samples, internal cross-validation, and non-standardised outcome reporting, with limited attention to calibration, decision thresholds, or model interpretability. Such methodological heterogeneity restricts comparability and reduces confidence in generalisability.23–25
Most, if not all, of the literature focuses on the increased number of carious lesions detected when AI is integrated. Limited attention has been given to whether AI might lead to over-detection of lesions and, consequently, unnecessary treatment. It also remains unclear whether existing studies have adequately addressed external validity through independent, multicentre validation or considered the ethical, legal, and regulatory readiness of AI systems for safe clinical deployment.
This scoping review aims to map and synthesise the current literature on AI integrated with intraoral digital radiographic imaging for detecting dental caries in adults, with emphasis on studies comparing AI diagnostic performance against established gold-standard diagnostic methods. Specifically, this review aims to (i) identify and summarise studies applying AI to bitewing and periapical radiographs; (ii) identify and categorise AI model types (e.g., machine learning, deep learning, convolutional and segmentation networks) and intraoral imaging systems; (iii) evaluate AI diagnostic accuracy in comparison to histology, expert clinician assessment, or validated diagnostic criteria; (iv) assess reported impact on clinical decision-making, including treatment planning, diagnostic confidence, and workflow efficiency; and (v) identify evidence gaps and implementation challenges to guide future research priorities, clinical integration approaches, and policy for AI-assisted caries diagnostics.
This study employed a scoping review methodology, guided by the Joanna Briggs Institute (JBI) framework for evidence synthesis,26 and reported in accordance with the PRISMA-ScR checklist to ensure transparency and reproducibility.27,50 The JBI approach, which builds on the foundational framework of Arksey and O’Malley (2005) and is further refined by Levac et al. (2010),28,29 provides structured yet flexible guidance, making it particularly suited to broad and emerging research areas. This methodology was selected to systematically map and appraise the diverse and heterogeneous body of literature on AI applied to intraoral radiography for caries detection in adults. Unlike systematic reviews, which typically address narrowly defined effectiveness questions, scoping reviews allow the inclusion of varied study designs, methodologies, and outcomes, thereby capturing the breadth of existing research.30 This approach is especially appropriate given the rapidly evolving, multidisciplinary nature of AI in dentistry, where studies differ in terms of AI models, imaging systems, validation strategies, and clinical applications.
This scoping review protocol was developed in advance in November 2024 to ensure a transparent and systematic methodological approach. It outlines the review’s aim, objectives, eligibility criteria, search strategy, study selection process, data extraction, and data analysis and synthesis. Additionally, although the protocol was not formally registered, all steps were thoroughly documented and consistently applied throughout the review process.
The research question was developed in line with the JBI methodology and framed using the Population–Concept–Context (PCC) framework to ensure clarity and transparency. The central question guiding this review is: “Among adults undergoing dental examination for caries (Population), what is the current evidence on the use of artificial intelligence (AI) integrated with intraoral digital imaging systems for caries detection and its impact on diagnostic accuracy, treatment planning, and clinical decision-making (Concept), in clinical or research settings using intraoral radiography (Context)?”
2.2.1 Eligibility criteria
P (Population): Adults (≥18 years) undergoing dental examination or treatment.
C (Concept): Studies investigating the integration of artificial intelligence (AI), including machine learning, deep learning, or neural networks, with intraoral digital radiographic imaging (bitewing and periapical X-rays) for:
• Detection of dental caries.
• Comparison of AI diagnostic accuracy with established gold standard methods (e.g., histological validation, clinical visual examination, or expert consensus).
• Evaluating the influence of AI on treatment planning, diagnostic confidence, and clinical decision-making.
• Diagnostic accuracy measures (e.g., sensitivity, specificity, accuracy, precision, AUC, F1 score, PPV, NPV).
C (Context): Clinical or research-based diagnostic settings using intraoral radiography (bitewing or periapical), and examining the application of AI in caries management, including both primary and specialist care environments.
Original diagnostic accuracy studies, comparative designs, in vivo or in vitro investigations, randomised controlled clinical trials, and reviews of these original studies were included if published in English between January 2015 and May 2025.
Relevant studies were identified through a comprehensive search of three major electronic databases: PubMed, Scopus, and IEEE Xplore. Search strategies were developed in consultation with an expert librarian at Queen Mary University of London (QMUL) to ensure methodological rigour and comprehensive coverage. A combination of controlled vocabulary (Medical Subject Headings (MeSH) terms for PubMed and corresponding subject headings for Scopus) and free-text keywords and their synonyms was used to cover the full scope of literature related to the research question, structured across five concept groups: (i) artificial intelligence, (ii) intraoral digital radiography, (iii) dental caries, (iv) detection and diagnostic accuracy, and (v) clinical outcomes.
The search strategy used Boolean operators to link the concept groups: OR connected synonymous terms within each concept, and AND linked the different thematic areas. Truncation symbols and wildcards were also applied, with syntax tailored to each database. The full search strategy is provided in the supplementary file in the external data repository.50
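To illustrate this Boolean structure, the short sketch below assembles a query from concept groups; the terms shown are hypothetical examples, and the authoritative strategy remains the one provided in the supplementary file.

```python
# Illustrative only: hypothetical terms, not the review's exact search string.
concept_groups = {
    "artificial intelligence": ["artificial intelligence", "machine learning",
                                "deep learning", "neural network*"],
    "intraoral radiography": ["bitewing*", "periapical", "intraoral radiograph*"],
    "dental caries": ["dental caries", "carious lesion*", "tooth decay"],
    "detection and accuracy": ["detection", "diagnostic accuracy",
                               "sensitivity and specificity"],
    "clinical outcomes": ["treatment planning", "clinical decision*"],
}

def quote(term: str) -> str:
    # Quote multiword phrases; leave truncated terms unquoted so the
    # wildcard still works in databases that support it.
    return f'"{term}"' if " " in term and "*" not in term else term

def build_query(groups: dict) -> str:
    """OR joins synonyms within each concept; AND links the concept groups."""
    blocks = ["(" + " OR ".join(quote(t) for t in terms) + ")"
              for terms in groups.values()]
    return " AND ".join(blocks)

print(build_query(concept_groups))
```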
Preliminary searches were piloted and iteratively refined to optimise sensitivity and specificity, with strategies revisited as needed. Backward citation searching and manual hand-searching were also conducted. The retrieved records formed the basis of evidence for subsequent screening against the predefined eligibility criteria.
The selection of studies was conducted systematically and transparently, in line with predefined eligibility criteria, to ensure the inclusion of only relevant, high-quality evidence in data extraction and subsequent phases. At each stage, screening decisions were documented using a standardised form, with reasons for exclusion recorded to maintain transparency and reproducibility.
2.4.1 Selection Step 1: Identifying duplicates
After completing the database searches, all identified publications were imported into EndNote 21 (Clarivate, USA, 2023) reference management software.31 The software’s automatic deduplication feature was initially used to remove duplicate records. The remaining references were exported from EndNote into the Rayyan screening AI tool (Rayyan Systems Inc., Qatar, 2025).32 This tool automatically identified additional potential duplicates and flagged them, allowing the researchers to choose the most complete or appropriate version and delete the other(s).
2.4.2 Selection Step 2: Title and abstract screening
A thorough, systematic screening of the titles and abstracts of all records was conducted independently by two researchers (SK, AG) using the Rayyan AI tool against the predefined eligibility criteria. Studies that clearly indicated the relevance of AI integration with intraoral imaging for caries detection were retained, while those focused on unrelated topics were excluded. Additionally, ambiguous titles or abstracts that lacked sufficient clarity were retained for further assessment at the subsequent stage. Any discrepancies at this stage were resolved through consensus discussions among the researchers to maintain methodological rigour and ensure robust decision-making.
2.4.3 Selection Step 3: Critical appraisal
The full texts of all remaining studies were obtained. Where access was restricted, requests were made through QMUL library services. Each study was then critically appraised using the Critical Appraisal Skills Programme (CASP) checklist appropriate to its study design, shown in Tables 1.1–1.3. The critical appraisal was conducted independently by GPT-4o Team (OpenAI, USA, 2025) and one researcher (SK), both of whom were trained and calibrated by the co-author (AG). A standardised reporting format was used, comprising a study summary, CASP scoring table, and overall recommendation. Discrepancies in reviewer judgments were resolved by the co-author (AG) to ensure consistency and methodological rigour.
| ID | Title of the Paper | Authors/Year of Publication | Journal | Study Design | Country | Sample size | Imaging System | AI Methodology |
|---|---|---|---|---|---|---|---|---|
| 1 | Accuracy Assessment of Human and AI-Assisted Bitewing Radiography and NIRI-Based Methods for Interproximal Caries Detection: A Histological Validation | Rodrigues et al., 202534 | Caries Research | In-vitro diagnostic accuracy study | Spain | A total of 171 proximal surfaces from 100 extracted posterior teeth | Intraoral Bitewing radiographs and NIRI intraoral scans | AI-assisted bitewing radiography assessment using a Deep CNN-based software Denti.AI; an AI model integrated with a radiographic interpretation tool |
| 2 | From inconsistent annotations to ground truth: aggregation strategies for annotations of proximal carious lesions in dental imagery | Klein et al., 202535 | Journal of Dentistry | In-vitro diagnostic performance evaluation study | Germany and the Czech Republic | A total of 1007 proximal surfaces from 522 extracted posterior teeth | Orthoradial radiographs and Near-Infrared Light Transillumination (NILT) | Evaluation of annotation aggregation strategies: Majority Voting (MV), Weighted Majority Voting (WMV), Dawid-Skene (DS), Multi-Annotator Competence Estimation (MACE) |
| 3 | Performance comparison of multifarious deep networks on caries detection with tooth X-ray images | Ying et al., 202436 | Journal of Dentistry | Comparative diagnostic accuracy study | China | A total of 392 periapical radiographs (346 training and validation dataset, 46 testing dataset); 135 teeth in the testing dataset | Periapical digital radiographs | Four deep network types: YOLOv5, DETR, UNet, and Trans-UNet |
| 4 | Developing the Benchmark: Establishing a Gold Standard for the Evaluation of AI Caries Diagnostics | Boldt et al., 202437 | Journal of Clinical Medicine | In vitro diagnostic accuracy study | Germany | A total of 1071 bitewing radiographs from 179 extracted permanent human teeth | Standardised bitewing radiographs using the parallel technique | Evaluation of the performance of an AI algorithm model against a histology-based gold standard benchmark |
| 5 | Evaluating the Accuracy of AI-Based Software vs Human Interpretation in the Diagnosis of Dental Caries Using Intraoral Radiographs: An RCT | Das et al., 202438 | Journal of Pharmacy and Bioallied Sciences | Randomised controlled trial (RCT) | India and Saudi Arabia | 200 intraoral radiographs were obtained from patients aged 18 to 65 years seeking dental care | Two bitewings and two periapical radiographs per participant using digital intraoral X-ray equipment; anonymised and standardised radiographs collected prospectively | Deep learning-based AI software to detect carious lesions on intraoral radiographs |
| 6 | Artificial intelligence for caries detection: a novel diagnostic tool using deep learning algorithms | Liu et al., 202439 | Oral Radiology | Diagnostic accuracy study using deep learning | China | 4278 periapical radiographs (12,524 single-tooth images) | Digital periapical radiographs from clinical settings | ResNet-based CNN with Segment Anything Model (SAM); integrated Grad-CAM for visual support |
| 7 | Diagnosis of Interproximal Caries Lesions in Bitewing Radiographs Using a Deep Convolutional Neural Network-Based Software | García-Cañas et al., 202240 | Caries Research | Analytical, observational, and cross-sectional study | Spain | 300 digital bitewing radiographs of posterior teeth taken from 150 patients aged 16-85 years | Digital bitewing radiographs | Deep CNN-based software (Denti.AI) with different caries detection thresholds (Model 1 to Model 4) |
| 8 | Detecting Proximal Caries on Periapical Radiographs Using Convolutional Neural Networks with Different Training Strategies on Small Datasets | Lin et al., 202241 | Diagnostics | Diagnostic accuracy study | China | 800 periapical radiographs (600 training/validation, 200 testing) from 3165 initial periapical radiographs taken from 385 men and 415 women (mean age: 45.3 years) | Periapical radiographs (BMP format) from PACS system, acquired via the paralleling technique | Pretrained Cifar-10Net CNN with three training strategies: IR (image recognition), EE (edge extraction), IS (image segmentation); trained using transfer learning and fine-tuning |
| 9 | Detection of Proximal Caries Lesions on Bitewing Radiographs Using Deep Learning Method | Chen et al., 202242 | Caries Research | Diagnostic accuracy study | China | 978 bitewing radiographs; 10,899 proximal surfaces analysed | Digital bitewing radiographs | Faster R-CNN deep learning object detection framework for caries localisation and classification |
| 10 | The ADEPT study: a comparative study of dentists’ ability to detect enamel-only proximal caries in bitewing radiographs with and without the use of AssistDent artificial intelligence software | Devlin et al., 202143 | British Dental Journal | RCT-Comparative diagnostic accuracy study | United Kingdom | 24 bitewing radiographs; 23 dentists (11 in the control group and 12 in the experimental group) | Digital bitewing radiographs | AssistDent AI software (machine learning algorithm) |
A structured scoring system was applied to the quality appraisal process, with responses coded as Yes = 1, No = 0, and Unclear/Maybe/Not applicable = 0.5. Studies achieving full or full-minus-one scores were rated as high quality and were included. Those with full-minus-two scores were rated as medium quality, with inclusion determined case by case depending on relevance. Studies scoring below full-minus-two were considered low quality and were excluded. All CASP scores were documented with justifications for inclusion, potential inclusion, or exclusion. This dual-review approach ensured transparency, reproducibility, and accountability, while allowing flexibility to retain studies of potential value despite minor quality limitations.
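A minimal sketch of this scoring rule is shown below, assuming a fully answered checklist; the response coding and thresholds follow the description above.

```python
# Sketch of the CASP-based rating rule described above; checklist length
# varies by CASP design, so the maximum score is derived from the responses.
def casp_rating(responses: list) -> str:
    score_map = {"yes": 1.0, "no": 0.0, "unclear": 0.5, "maybe": 0.5, "na": 0.5}
    score = sum(score_map[r.lower()] for r in responses)
    full = len(responses)                 # maximum attainable score
    if score >= full - 1:
        return "high quality: include"
    if score >= full - 2:
        return "medium quality: include case by case"
    return "low quality: exclude"

# Example: a 10-item checklist with one 'no' and one 'unclear' response
print(casp_rating(["yes"] * 8 + ["no", "unclear"]))  # -> medium quality
```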
2.4.4 Selection Step 4: Full-text assessment
The full texts of all remaining studies were independently reviewed by the researcher (SK) to assess methodological rigour, with particular attention to the use of appropriate gold-standard validation methods. Those that mixed adults with children or adolescents, lacked inter-rater reliability assessment and calibration standards, lacked clinical or external validation, and had a high or unclear risk of bias were excluded. Conference proceedings and duplicate records of the same studies with overlapping datasets were also excluded. Ambiguous cases were discussed in detail with the co-author (AG) until consensus was reached. All studies agreed upon through this structured process formed the final sample for data extraction and synthesis. The review process was documented in a standardised Excel spreadsheet, recording reasons for exclusion to ensure transparency and reproducibility.
A standardised data charting form was developed to ensure systematic and consistent extraction of key information across all included studies. Ultimately, it supported a coherent synthesis and presentation of findings across the diverse body of literature. The predefined categories captured study characteristics (title, authors, year of publication, country), journal and study design, sample size, dental focus and dental setting, AI methodology, imaging system, gold standard reference, diagnostic accuracy measures, validation approaches, key findings, limitations and bias, study conclusions, and recommendations for research or practice. Data were extracted independently by two authors (SK & AG), with discrepancies resolved through discussion. Citations were managed in EndNote 21 and transferred to Microsoft Excel, where the extracted data were recorded for organisation and analysis.
Following data extraction, the findings were collated and summarised to provide an overview of the included evidence. A descriptive numerical analysis, consistent with the JBI framework, was conducted to map key study characteristics. The analysis quantified the number and types of included studies, their geographical distribution, the AI models used, the intraoral imaging modalities, the validation methods, the gold standards, and the reported diagnostic performance metrics (e.g., sensitivity, specificity, AUC). This approach not only mapped the scope and distribution of existing evidence but also highlighted gaps in the literature, providing a foundation for the subsequent thematic synthesis. The descriptive analysis thus established a structured understanding of the evidence base, supporting the review’s objectives and informing practice, policy, and future research.
A narrative synthesis and thematic analysis were conducted to identify and integrate key patterns across the included studies, following Braun and Clarke’s six-phase thematic analysis framework to ensure transparency and rigour.33
Included studies were reviewed in full and coded using a combined deductive–inductive approach, guided by the review objectives. Codes capturing methodological features, diagnostic performance, clinical relevance, validation strategies, and implementation barriers were organised into thematic categories. These themes were iteratively refined, supported by representative data, and presented through narrative synthesis alongside visual outputs (tables, matrices, heatmaps, and a thematic concept map). Coding and synthesis were performed manually using structured Excel tools, enabling integration of qualitative and quantitative insights with implications for research, practice, and policy.
The PRISMA-ScR flowchart (Figure 1) illustrates the screening and selection process. Database searches retrieved 414 records (238 from PubMed, 69 from Scopus, 107 from IEEE Xplore) and five from manual searching, yielding 419 in total. After deduplication, 369 records remained for title and abstract screening, of which 288 were excluded for being irrelevant (Table 2). Eighty-one articles progressed to full-text review with CASP appraisal. Seventy-one were excluded—five due to poor methodological quality and 66 because of the absence of a gold-standard comparator, inter-rater calibration, or adequate validation (Table 2). Ten studies met all criteria and were included in the final analysis.
Ten peer-reviewed studies were included in this scoping review (Figure 2). The key characteristics of these studies are summarised in Table 3. The included studies represented diverse geographical settings: China (n = 4), Spain (n = 2), Germany (n = 2), the United Kingdom (n = 1), and a multinational collaboration between India and Saudi Arabia (n = 1). All were published in well-known dental and medical journals.34–43
3.2.1 Study designs, settings and sample size
The ten included studies comprised diagnostic accuracy studies (n = 4), in vitro diagnostic performance studies (n = 3), randomised controlled trials (n = 2), and one cross-sectional observational study. Most were conducted in laboratory or simulated environments using extracted human teeth, while others took place in university hospitals or private dental practices.34–43 The two RCTs directly examined AI’s impact on dentists’ diagnostic performance in both teaching and general practice settings, supporting AI’s potential role in clinical decision-making.38,43 Sample sizes ranged widely, from 171 proximal surfaces in extracted teeth to over 12,000 tooth-level images from clinical radiographs. Clinical datasets included adults aged 16–85 years, with examples such as García-Cañas et al. (n = 150 patients, 300 bitewings),40 Lin et al. (n = 800 periapicals),41 and Das et al. (n = 200 intraoral radiographs).38 One study (Devlin et al.) uniquely explored AI’s role in education by involving 23 dentists in the interpretation of 24 bitewings.43 While sample sizes supported both proof-of-concept and large-scale validation, detailed demographic reporting was generally absent.
3.2.2 Diagnostic approaches and AI integration
Most studies focused on detecting proximal carious lesions (n = 9), with one specifically addressing enamel-only caries. Some, such as Klein et al., targeted primary proximal lesions, while others (e.g., Ying et al., Chen et al.) examined both enamel and dentine involvement, and several proposed frameworks for grading lesion severity to improve benchmarking. All studies used intraoral radiographic imaging, most commonly digital bitewings (n = 5) and periapicals (n = 3), with one study combining both and another employing orthoradial radiographs. Acquisition techniques were generally standardised, though sensor brands were inconsistently reported (Figure 3).34–43
AI methodologies were dominated by deep learning and convolutional neural networks, with tools such as Denti.AI and AssistDent integrated into radiographic interpretation. Object detection models (YOLOv5, DETR, Faster R-CNN) were applied for lesion localisation, while segmentation approaches (UNet, Trans-UNet, Segment Anything Model) supported precise lesion mapping. Some studies employed Grad-CAM to improve interpretability, and training strategies frequently included transfer learning, fine-tuning, and the use of pre-trained networks. Annotation aggregation techniques (e.g., majority voting, weighted voting, Dawid–Skene, MACE) were also used to enhance labelling reliability.34–43
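To make the dominant training pattern concrete, the sketch below shows a generic transfer-learning setup of the kind these studies describe: a CNN pretrained on natural images, with its classification head replaced and fine-tuned for binary caries detection. The architecture, hyperparameters, and dummy batch are illustrative assumptions, not the configuration of any included study.

```python
# Hypothetical transfer-learning sketch (PyTorch/torchvision assumed);
# nothing here is taken from an included study.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor; train only the new head.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer for two classes: carious vs. sound surface.
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative fine-tuning step on a dummy batch of radiograph crops.
images = torch.randn(8, 3, 224, 224)   # stand-in for preprocessed crops
labels = torch.randint(0, 2, (8,))     # stand-in for reference labels
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```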
3.2.3 Gold standards and validation
All studies employed a gold standard to evaluate AI accuracy in detecting dental caries from intraoral radiographs. Histological validation through thin-section microscopy was employed in three studies, while ICDAS-based clinical examination with cavity opening was used in two, and expert consensus with high calibration and inter-rater reliability was utilised in the remaining five (Figure 4). While internal validation with expert panels, clinical examination, or blinded reviewers was most common, only one study conducted external benchmarking using a histology-based dataset.37 Performance and agreement were assessed using statistical methods, including Cohen’s and Fleiss’ kappa, ICC, ROC curve and AUC analyses, as well as significance testing (Chi-square, McNemar’s, Wilcoxon, and Z-tests), with 95% confidence intervals typically reported. Some studies additionally employed Grad-CAM visualisation or benchmark thresholds to support interpretability and validate findings.34–43
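For orientation, the sketch below computes two of the statistics named above, Cohen’s kappa and ROC AUC, using scikit-learn; the arrays are toy data and carry no study results.

```python
# Toy data only; none of these values are study results.
from sklearn.metrics import cohen_kappa_score, roc_auc_score

ai_labels = [1, 0, 1, 1, 0, 0, 1, 0]        # e.g., AI call per surface
reference = [1, 0, 1, 0, 0, 0, 1, 1]        # e.g., reference-standard call
kappa = cohen_kappa_score(ai_labels, reference)  # chance-corrected agreement

truth = [1, 0, 1, 0, 0, 1, 1, 0]
scores = [0.91, 0.12, 0.78, 0.40, 0.05, 0.66, 0.83, 0.31]  # model outputs
auc = roc_auc_score(truth, scores)               # discrimination (ROC AUC)

print(f"Cohen's kappa = {kappa:.2f}, AUC = {auc:.2f}")
```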
The analysis identified five overarching themes that capture key insights on AI integration with intraoral radiographic imaging for caries detection in adults: (i) diagnostic accuracy performance, (ii) clinical relevance and implications, (iii) imaging-related factors, (iv) methodological considerations, and (v) recommendations for future integration. Each theme is described below with illustrative examples. The distribution of themes across studies and their primary outcomes is summarised in Table 4. Figure 5 presents a heatmap illustrating theme coverage and the strength of evidence, while Figure 6 displays the thematic concept map.
3.3.1 Theme 1: AI effectiveness: Diagnostic accuracy and comparison with the gold standard
Across the included studies, AI consistently demonstrated strong diagnostic potential, often matching or surpassing clinician performance, particularly in detecting early-stage and proximal caries. Reported sensitivity, specificity, accuracy, and F1-scores were generally higher for AI models than for human examiners. For instance, Devlin et al.’s study found that dentists using AI achieved 75.8% sensitivity and 85.4% specificity, compared with 44.3% and 96.3% without AI in detecting enamel-only caries.43 In comparison, Das et al.’s RCT reported AI software performance of 88% sensitivity, 91% specificity, and 89% accuracy, exceeding human interpretation.38 Chen et al.’s study also observed AI superiority over dental students, with 87% accuracy, 72% sensitivity, 93% specificity, and an F1-score of 0.74, compared with 82%, 47%, 94%, and 0.57, respectively.42 Benchmarking against gold standards varied: histology-based studies reported the most stringent results, with specificity consistently >90%34,35,37; ICDAS and cavity-opening studies, such as Liu et al. and García-Cañas et al., demonstrated accuracies of 82–88% and AUCs of 77–95%39,40; and expert consensus–based studies, such as Ying et al. and Lin et al., confirmed AI’s higher sensitivity than that of human examiners for both enamel and dentine caries (p < 0.05).36,41 Collectively, these findings highlight AI’s consistent diagnostic reliability across diverse methodologies, settings, and comparator standards, underscoring its potential clinical utility in caries detection.
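The figures quoted above follow the standard confusion-matrix definitions, sketched below with illustrative counts that are not drawn from any included study.

```python
# Standard confusion-matrix definitions; the counts below are illustrative.
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    sensitivity = tp / (tp + fn)             # lesions correctly flagged
    specificity = tn / (tn + fp)             # sound surfaces correctly cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)               # positive predictive value (PPV)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "precision": precision, "f1": f1}

print(diagnostic_metrics(tp=72, fp=20, tn=280, fn=28))
```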
3.3.2 Theme 2: Clinical implications and relevance: AI as a clinical decision support tool and impact on treatment planning
AI integration with intraoral radiographs carries significant clinical implications, consistently enhancing diagnostic sensitivity, clinician confidence, and decision-making. Devlin et al.’s study demonstrated a 71% increase in sensitivity for enamel-only proximal caries when dentists used AI prompts (AssistDent), enabling earlier intervention and minimally invasive treatment planning.43 Similarly, Chen et al.’s study demonstrated that AI enhanced the detection of early enamel and outer dentine lesions without substantially increasing false positives, particularly benefiting less experienced practitioners and highlighting its potential in dental education.42 Across studies, AI was positioned as a decision-support tool rather than a substitute for clinicians, with the capacity to reduce overtreatment, support preventive care planning, and alleviate clinician workload through precise and timely detection of early lesions.34–43 Moreover, Devlin et al.’s study proposed that AI-supported sensitivity could serve as the basis for an audit standard for caries detection, underscoring AI’s transformative potential to promote evidence-based, patient-centred dental care.43
3.3.3 Theme 3: Imaging modalities and diagnostic variation by radiograph type and lesion severity
Digital bitewing radiographs emerged as the predominant modality (n = 5), reflecting their widespread clinical use for detecting proximal caries. The remaining studies utilised periapical radiographs, with one combining bitewing and periapical images and another employing an orthoradial radiograph. Diagnostic performance varied according to lesion depth, severity, and radiograph type, with reduced accuracy frequently observed on images with tilted or low-quality radiographs. Studies by Liu et al. (2024) and Lin et al. (2022) highlighted how artefacts, anatomical variability, and positioning inconsistencies influenced outcomes, emphasising the importance of methodological refinement and improved standardisation of imaging and preprocessing.39,41 Several studies also reported that manual preprocessing tasks (e.g., image rotation or cropping) limited workflow efficiency, underscoring the need for automated solutions and more robust AI models capable of handling variations in real-world clinical imaging.
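As a sketch of what such automation might look like, the snippet below chains basic orientation, contrast, and cropping steps with Pillow; the operations and parameters are assumptions for illustration, not a validated preprocessing pipeline.

```python
# Hypothetical preprocessing sketch; a real pipeline would need clinical
# validation and handling of sensor-specific formats.
from PIL import Image, ImageOps

def preprocess(path: str, size: tuple = (224, 224)) -> Image.Image:
    img = Image.open(path).convert("L")   # greyscale radiograph
    img = ImageOps.exif_transpose(img)    # honour any stored orientation
    img = ImageOps.autocontrast(img)      # normalise exposure variation
    return ImageOps.fit(img, size)        # centre-crop and resize
```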
3.3.4 Theme 4: Methodological considerations: AI model strategies, technical design, validation approaches, and limitations
Substantial methodological diversity influenced the robustness and reliability of the included studies. A variety of convolutional neural networks (CNNs) were employed, including YOLOv5, Faster R-CNN, and ResNet variants, often enhanced with techniques such as edge extraction, transfer learning, and annotation aggregation strategies (e.g., MACE, Dawid–Skene) to optimise performance on small or imbalanced datasets. Validation approaches varied considerably: studies using histological ground truth provided the most objective benchmarks, while those relying on expert-labelled annotations introduced greater subjectivity and variability in results. Common limitations included the risk of overfitting, absence of multicentre validation, and reliance on imbalanced datasets, all of which restrict the generalisability of findings and underscore the need for more rigorous, standardised methodologies in future research.34–43
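Majority voting, the simplest aggregation strategy named above, can be sketched as follows; weighted voting extends it with per-annotator weights, while Dawid–Skene and MACE are model-based refinements that estimate annotator competence from the data. The labels and weights here are hypothetical.

```python
# Hypothetical annotator labels (1 = caries, 0 = sound) for one surface.
from collections import Counter

def majority_vote(labels: list) -> int:
    """Return the label assigned by most annotators."""
    return Counter(labels).most_common(1)[0][0]

def weighted_majority_vote(labels: list, weights: list) -> int:
    """Weight each annotator's vote, e.g., by estimated competence."""
    tally = {}
    for label, w in zip(labels, weights):
        tally[label] = tally.get(label, 0.0) + w
    return max(tally, key=tally.get)

print(majority_vote([1, 1, 0, 1, 0]))                                       # -> 1
print(weighted_majority_vote([1, 1, 0, 1, 0], [0.6, 0.5, 0.9, 0.4, 0.8]))   # -> 0
```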
3.3.5 Theme 5: Implementation challenges, recommendations for practical integration, and future research directions
Implementation considerations featured prominently across the studies, with AI consistently framed as a supportive adjunct rather than an autonomous tool, reinforcing the principle that clinicians retain ultimate responsibility for diagnostic decisions. Central to effective integration were the development of interpretability mechanisms (e.g., Grad-CAM) to build clinician trust, alongside clear boundaries of clinical accountability, ethical safeguards, and regulatory clarity. Barriers to real-world adoption included the need for explainability, clinician training, and seamless incorporation of AI into established diagnostic frameworks such as ICDAS and ICCMS. To address these challenges, studies recommended comprehensive pilot testing, longitudinal and multi-centre validation, integration with risk-based caries management frameworks, and the use of AI for education and quality improvement feedback.34–43 Collectively, these recommendations emphasise the importance of establishing structured pathways to ensure the reliable, ethical, and effective clinical adoption of AI in caries detection.
Across the ten included studies, AI consistently demonstrated diagnostic performance comparable to, and in many cases exceeding, that of clinicians, particularly in detecting early-stage enamel and outer dentine caries.34–43 Metrics such as sensitivity, specificity, and AUC confirmed this trend. For instance, Das et al.’s study reported AI sensitivity and specificity above 88%, outperforming clinicians,38 while Chen et al.’s study highlighted AI’s superior accuracy compared with dental students.42 These findings align with broader systematic reviews, which show pooled CNN accuracies of 73–99% and sensitivities of 72–95%.44 Nevertheless, performance varied considerably across AI model types, datasets, and lesion severity, with some models underperforming on subtle enamel lesions despite high specificity.34–43 This reinforces AI’s value as a diagnostic adjunct but highlights its current limitations for the most challenging lesion types.
A significant source of variability was the type of gold standard used for diagnostic comparison. Studies employing histology-based validation provided the most objective benchmarks, with specificity consistently reported to exceed 90%.34,35,37 In contrast, reliance on expert panel consensus annotations alone introduced subjectivity and moderate uncertainty, potentially inflating performance estimates.36,41,42
Inconsistent reference standards undermine comparability, echoing concerns raised by the STARD guidelines on diagnostic research.45 Furthermore, annotation protocols varied widely—ranging from single experts to multi-clinician panels with consensus methods—introducing heterogeneity in “ground truth” labelling and creating inherent limitations in validation accuracy. Since AI learns from the quality of its input data, variability in annotation reduces reproducibility and may embed systematic human errors into AI algorithms.35,46 This inconsistency in the gold standard, validation methods, and outcome measures used raises concerns about the generalisability and comparability of results across the evidence base.
Moreover, the absence of consistent and reliable gold-standard methods for evaluating the accuracy of AI in detecting dental caries remains a significant gap across much of the current literature. This critical limitation undermines the validity and reliability of reported AI performance metrics, with knock-on effects for clinical translation, regulatory approval, and meaningful comparison between different AI models.34,35,37 The lack of robust reference standards is one of the most significant methodological challenges in contemporary dental AI research.
This review highlights significant clinical implications, particularly the role of AI as a decision-support tool. AI-enhanced radiographs improved clinicians’ sensitivity for early lesions, supported preventive interventions, and increased confidence in treatment planning. Devlin et al.’s study demonstrated a 71% increase in sensitivity for enamel-only lesions when clinicians utilised AI prompts.43 Similarly, Chen et al.’s study reported improved performance with fewer false positives, particularly benefiting less experienced practitioners.42 These outcomes align with evidence that AI augments diagnostic confidence, reduces variability, and facilitates minimally invasive dentistry.34,39,40,42 Importantly, AI was consistently positioned as an adjunct, not a replacement for clinical expertise, reinforcing its role in supporting evidence-based, patient-centred care.
Beyond diagnosis, AI demonstrated value in treatment planning, workflow efficiency, and education. The RCTs confirmed that clinicians supported by AI reduced false negatives and improved decision-making between preventive and operative strategies.38,43 Devlin et al.’s study also suggested AI could serve as an audit tool for caries detection.43 This aligns with the broader literature; for instance, Pul et al.’s study highlighted its benefits for junior dentists, including improved diagnostic confidence and reduced overtreatment.47 These findings suggest that AI may standardise diagnostic quality across different levels of experience, reducing disparities in dental care. However, evidence linking AI-supported diagnosis to long-term clinical outcomes (e.g., lesion progression, patient satisfaction) remains limited, highlighting an essential gap for future research.
Bitewing radiographs were the most widely used modality and consistently demonstrated higher diagnostic accuracy than periapical radiographs, particularly for proximal and early lesions. For instance, García-Cañas et al.’s study confirmed the superiority of bitewings for detecting enamel lesions.40 In contrast, Lin et al.’s study reported lower sensitivity for periapical images due to angulation and anatomical overlap.41 These findings align with Takahashi et al.’s study, which found that sensitivity for enamel caries was more than double in bitewings compared with periapical radiographs.48 However, AI performance declined with low-quality, tilted, or artefact-affected images, often requiring manual preprocessing. This raises concerns about efficiency and standardisation, as real-world imaging rarely achieves laboratory-quality standards. Hence, automating preprocessing and testing cross-modality generalisability remains a critical priority.
The methodological diversity across studies significantly influenced reported outcomes. Convolutional neural network (CNN) architectures such as YOLOv5, ResNet, DETR, UNet, and SAM were employed, often enhanced with transfer learning, edge extraction, and aggregation strategies (e.g., MACE) to address dataset limitations. While some models achieved superior performance (e.g., YOLOv5 outperforming DETR and UNet), external benchmarking was rare. Most studies relied on internal validation, which limited generalisability.34–43 Methodological reviews in medical AI confirm that internal validation alone risks overfitting and inflated performance claims. Furthermore, inconsistent reporting of model parameters and outcome definitions undermines reproducibility. These issues reflect broader calls, such as those from the FUTURE-AI Consortium, for transparent reporting and external validation.49
Lastly, practical integration faces significant barriers. While AI shows strong diagnostic promise, studies consistently emphasised its supportive role under clinician oversight. Barriers include a lack of standardised interpretability tools, unclear regulatory pathways, ethical considerations around patient consent, and infrastructural demands in smaller practices. Tools like Grad-CAM were proposed to enhance trust by visualising AI reasoning; however, real-world deployment remains limited. Furthermore, integrating AI into established frameworks, such as ICDAS and ICCMS, was recommended to align AI outputs with risk-based caries management; however, evidence of feasibility remains limited. Hence, pilot studies and clinician training in AI literacy are essential to ensure responsible adoption, prevent over-reliance, and establish robust regulatory frameworks.
This scoping review has several notable strengths. The rigorous methodological design ensured transparency, reproducibility, and comprehensiveness throughout study identification, selection, data charting, and synthesis, which enhanced the robustness and reliability of the findings. The inclusion of both technical and clinical studies provided broad coverage, ranging from in vitro validations of extracted teeth to in vivo evaluations in practical settings. This breadth offers a holistic understanding of AI integration into intraoral radiographic imaging for caries detection, bridging technical innovation with clinical relevance. Additionally, thematic synthesis enabled effective mapping across diverse study designs, methodologies, and outcomes, providing cross-disciplinary insights into areas of consensus, divergence, research gaps, and practical implications for clinical care. Finally, all included studies were recent peer-reviewed publications (2021–2025), ensuring that the findings reflect the most current evidence on AI applications, validation standards, and emerging diagnostic trends.
In contrast, several limitations should also be acknowledged. Restriction to English-language publications may have introduced language and publication bias, potentially excluding relevant evidence. The small number of included studies (n = 10) and their methodological heterogeneity—particularly in definitions of gold standards, outcome measures, and study design—limited comparability and prevented the conduct of a meta-analysis. Furthermore, most studies were preclinical or early diagnostic accuracy trials, with few addressing patient-centred outcomes such as lesion progression, treatment effectiveness, or long-term impacts on caries management, thereby limiting clinical relevance. In addition, small sample sizes, reliance on single-centre datasets, and lack of multicentre validation further restrict generalisability. Finally, potential bias may have arisen from the use of overlapping datasets or developer involvement in multiple studies, which could inflate diagnostic accuracy estimates. These limitations highlight the need for independent, multicentre studies employing standardised methods to strengthen the evidence base for AI-assisted dental diagnostics.
The findings of this scoping review highlight essential implications for dental practice. The integration of AI into intraoral radiographic diagnostics supports minimally invasive dentistry by enabling the earlier and more accurate detection of carious lesions, particularly at incipient stages. This allows clinicians to prioritise timely preventive interventions over invasive restorative treatments. AI-assisted diagnostics also offer the potential to standardise clinical decision-making, reducing inter-examiner variability and enhancing consistency in patient care across practitioners with differing levels of experience.
Successful adoption of AI in practice, however, requires robust clinician training and oversight to ensure that these tools are used as adjuncts to, rather than replacements for, clinical judgment. Embedding AI literacy into dental education and continuing professional development is therefore essential. Training should equip clinicians to interpret AI outputs critically, recognise potential biases, and address the ethical and practical challenges of AI-assisted care. Such educational investment will be pivotal to optimising patient outcomes, fostering clinician confidence, and ensuring responsible integration of AI into routine dental diagnostics.
This review highlights key priorities for advancing AI integration into dental caries detection. Standardising validation protocols is crucial; future research should include robust, universally accepted benchmarks such as histological gold standards, multicentre validation, and longitudinal follow-up in clinical settings. Such consistency would enhance comparability across studies and strengthen conclusions about diagnostic accuracy. Furthermore, prospective real-world clinical trials are necessary to assess AI systems in routine practice, considering feasibility, performance, and impacts across diverse populations, imaging techniques, and workflows to ensure generalisability.
Beyond diagnostic accuracy, future research should evaluate cost-effectiveness, usability, clinician and patient acceptability, and patient-centred outcomes, including lesion progression, treatment effectiveness, and quality of life. Exploring patient trust and satisfaction with AI-driven diagnostics represents a notably underexplored dimension. Comparative analyses of various AI architectures are also necessary to determine the most effective ones for radiographic caries detection, with semantic segmentation and explainable AI methods, such as Grad-CAM, showing potential. Research into AI as a primary screening tool or as an adjunct, supporting dental professionals in capturing and interpreting radiographs with less reliance on direct supervision, could guide its integration into diagnostic pathways, enhancing workflow efficiency while maintaining diagnostic standards.
This scoping review highlights the substantial diagnostic potential of AI integrated with intraoral digital radiographic imaging systems for detecting dental caries in adults. AI has shown promise, particularly in identifying early-stage and proximal lesions, thereby supporting minimally invasive and preventive treatment strategies.
However, the full realisation of AI’s clinical potential depends on overcoming key limitations, such as the lack of standardised external validation across diverse populations and clinical settings, and the need for comprehensive clinician training to ensure accurate interpretation of AI outputs and foster professional trust. AI should be regarded as a supportive tool that augments, rather than replaces, clinical expertise. Adopting this collaborative model, where AI enhances diagnostic precision, standardises care, and enables earlier interventions, offers a pathway to advancing minimally invasive dentistry and improving patient outcomes. Ultimately, the integration of AI into intraoral radiographic diagnostics represents a transformative step towards more accurate, efficient, and patient-centred dental care.
All supplementary files can be found in the external repository: “Data set and Prisma checklist-Artificial Intelligence Integrated with Intra-oral Digital Imaging in Dental Caries Detection, Treatment Planning, and Clinical Decision-Making: A Scoping Review” https://qmro.qmul.ac.uk/xmlui/handle/123456789/113231.
Data are available under the terms of the Creative Commons Zero “No rights reserved” data waiver (CC0 1.0 Public domain dedication).
This research was conducted as part of Sarah Kayali’s MSc dissertation at Queen Mary University of London. The review was completed under the supervision of Dr. Ali Golkari, whose guidance and support were invaluable throughout the project.