Keywords
anxiety, artificial intelligence, depression, interventions, systematic review
This article is included in the Artificial Intelligence and Machine Learning gateway.
Depression and anxiety remain major global mental health challenges, and their prevalence continues to increase across populations. Conventional treatments are often limited by cost, accessibility, stigma, and the availability of professionals. Artificial intelligence (AI)-based interventions have emerged as a potential approach to address these gaps. However, the growing body of evidence across diverse contexts calls for further synthesis. This study aims to examine research characteristics, evaluate effects, and analyse implementation issues of AI-based interventions for depression and anxiety.
This systematic review was conducted in accordance with the PRISMA guidelines. Fourteen randomised controlled trials (RCTs) were identified from major databases, including Scopus, Web of Science, PubMed, and EBSCO, covering the period from 6 November 2015 to 6 November 2025. Study quality was assessed using the Cochrane Risk of Bias 2 tool, and findings were synthesised using a narrative approach.
The findings indicate that AI-based interventions, such as chatbots, large language models, and integrated platforms, generally reduce symptoms of depression and anxiety across various populations. However, results remain heterogeneous, with some studies showing only outcome-specific or within-group improvements. Implementation issues were identified, including limited human support, recruitment bias, and short follow-up periods, which may reduce adherence, limit generalisability, and hinder the assessment of long-term effects.
AI-based interventions may offer accessible and scalable mental health solutions, with outcomes comparable to conventional care in certain contexts. However, their effects are shaped by implementation-related challenges, including variability in engagement, technological limitations, and ethical considerations. Future research should prioritise more standardised methodologies, longer interventions with follow-up, and greater attention to implementation design and sustainability.
Registered in PROSPERO on 16 February 2026 (Registration number CRD420261308648). Available from: https://www.crd.york.ac.uk/PROSPERO/view/CRD420261308648.
Depression and anxiety have become two of the most pervasive global mental health challenges, with prevalence rates continuing to rise across age groups and regions.1 These conditions not only diminish quality of life but are also associated with increased risks of chronic illness,2 impaired social functioning,3 and a substantial economic burden on healthcare systems.4 Conventional treatment approaches, such as face-to-face therapy and psychiatric services, are often constrained by the limited availability of mental-health professionals, high costs, stigma, and geographical barriers.1,5 These constraints underscore the urgent need for innovative strategies to expand the reach, accessibility, and effectiveness of mental health interventions.6
Depression and anxiety are mental disorders that account for a significant portion of the global disease burden.7 The National Health Interview Survey shows that about one in five adults experienced symptoms of depression (21.4%) and anxiety (18.2%) during the past two weeks.8 These disorders arise from multiple biological, psychological, and social factors.9 They often co-occur with physical problems, such as chronic pain, migraines, insomnia, low pain tolerance, and extreme fatigue, which can worsen both physical and mental health.10 Conventional therapies such as Cognitive Behavioural Therapy (CBT) and medications such as antidepressants and anxiolytics are commonly used as treatment strategies.11 However, stigma, high costs, limited availability of mental health services, and long waiting times often lead individuals to seek self-help.12 To address this gap, AI-based tools can offer 24/7 availability, anonymity, and low cost. When integrated with CBT approaches, AI can help users track mood, receive psychoeducation, and develop problem-solving skills through conversational interactions that mimic human conversation.13
AI-based interventions have emerged as potential solutions to the limitations of traditional mental-health services.14 These technologies take various forms, including mobile applications, text-based chatbots, conversational agents, and passive digital-behaviour monitoring systems that help detect early signs of psychological stress.15 Studies have shown that AI-driven tools can deliver emotional support, psychoeducation, and cognitive-behavioural exercises in a consistent, scalable, and personalised manner.16 For depression and anxiety, technology-based interventions are used to deliver more interactive and empathetic digital CBT, for example through AI chatbots (Therabot, ChatGPT, Psy-Bot, Woebot), Facebook Messenger, and mobile health applications (TEO).17–21 Early evidence indicates that certain AI-based approaches can produce clinically meaningful improvements and, in some contexts, perform comparably to or even better than conventional interventions,6,14,22 which justifies further scientific investigation.
However, although previous studies have shown that AI-based conversational agents have a significant impact on reducing symptoms of depression and emotional distress,6,16,18,23 further research is needed to synthesise the effects of AI on both depression and anxiety. A systematic review by Joshi et al.1 examined AI-based interventions for anxiety and depression in individuals with psychological problems; however, its search extended only to 2024 and included articles not indexed in Scopus. Nyakhar & Wang13 also conducted a systematic review of AI chatbots, focusing on improving students’ psychological well-being, including anxiety and depression. However, no comprehensive synthesis has yet evaluated the effects of AI-based interventions on depression and anxiety simultaneously across various populations. The present review included only research articles published in Scopus-indexed journals, as an indicator of scientific quality.
AI is being rapidly and widely integrated into digital health systems worldwide. As AI-based mental health tools become part of telehealth platforms, they assist with patient monitoring, technology-enabled healthcare, diagnostic support, and data analysis,24 potentially enhancing their clinical impact. Although previous reviews have highlighted the effects of AI on depression and anxiety,13,25 a new systematic review across diverse populations and contexts is needed to assess the effects of AI-based interventions.26 This systematic review aims to determine the effects of AI-based interventions in reducing depression and anxiety and to examine the implementation issues associated with these interventions.
This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.27 Page et al.27 state that PRISMA reflects advances in methods for identifying, selecting, appraising, and synthesising studies. This review applied the PRISMA guidelines in four stages: (1) identifying research questions, (2) identifying literature sources, (3) conducting a literature search to answer the research questions, and (4) analysing the findings. The protocol was registered in PROSPERO (International Prospective Register of Systematic Reviews; CRD420261308648).
This systematic review addresses three research questions regarding AI-based interventions for anxiety and depression:
RQ 1. What are the effects of AI-based interventions in reducing depression and anxiety?
RQ 2. What are the characteristics and patterns of research on AI-based interventions for anxiety and depression in the last 10 years?
RQ 3. What are the implementation issues of AI-based interventions for depression and anxiety?
Eligibility criteria were developed based on the PICOS (Population, Intervention, Comparator, Outcome, and Study design) framework.28 The population was the general population, including students, parents, patients, adults, and workers, because depression and anxiety are cross-demographic psychological conditions that are not limited to specific clinical groups; this enabled evaluation of the effects of AI interventions in various real-world contexts and user groups. The intervention criteria focused on AI, including AI chatbots, ChatGPT, AI-based platforms, and other AI-based interventions, to distinguish the effects of AI from those of conventional interventions. The comparator was a non-AI intervention or an alternative AI-based design used as a control condition. Outcome measures focused on anxiety and depression, as these are the most common psychological disorders and the conditions most often targeted by AI-based interventions in the literature. Only randomised controlled trials (RCTs) were included, to maximise internal validity. Studies were excluded if they focused solely on reducing either depression or anxiety; used interventions that were not AI-based, such as non-AI digital or conventional interventions; used cross-sectional, quasi-experimental, or other non-randomised experimental designs; were case reports or non-empirical articles; or were written in languages other than English.
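The eligibility criteria above can be read as a single screening predicate. The sketch below is illustrative only: the `Study` record and its field names are hypothetical, the population criterion is treated as unrestricted, and the comparator criterion is not encoded.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative, not from the review."""
    design: str                                       # e.g. "RCT", "quasi-experimental"
    ai_based: bool                                    # True if the intervention is AI-based
    outcomes: set[str] = field(default_factory=set)   # e.g. {"depression", "anxiety"}
    language: str = "English"                         # language of publication


REQUIRED_OUTCOMES = {"depression", "anxiety"}


def is_eligible(study: Study) -> bool:
    """Apply the PICOS-based criteria; population is not restricted, comparator not checked."""
    return (
        study.design == "RCT"                       # S: randomised controlled trials only
        and study.ai_based                          # I: AI-based intervention only
        and REQUIRED_OUTCOMES <= study.outcomes     # O: both depression and anxiety measured
        and study.language == "English"             # English-language articles only
    )


candidates = [
    Study("RCT", True, {"depression", "anxiety"}),
    Study("RCT", True, {"depression"}),                            # excluded: one outcome only
    Study("quasi-experimental", True, {"depression", "anxiety"}),  # excluded: design
    Study("RCT", False, {"depression", "anxiety"}),                # excluded: not AI-based
]
print([is_eligible(s) for s in candidates])
```

Expressing the criteria as one boolean conjunction makes it explicit that a record failing any single PICOS element is excluded.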
The literature search was conducted across four main databases: Scopus, CINAHL (EBSCO), Web of Science (WoS), and MEDLINE (PubMed), and retained only articles published in Scopus-indexed journals. The search covered the last 10 years (6 November 2015–6 November 2025) to ensure that the studies remain relevant. It targeted articles relevant to the PICOS framework: population (general), intervention (AI-based), outcomes (depression and anxiety), and study design (experimental, RCT). The Boolean search string was (“artificial intelligence” OR AI) AND (depression) AND (anxiety) AND (“randomised controlled trial”), and results were limited to English-language articles. The inclusion and exclusion criteria are described in Table 1.
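The reported search string follows the usual pattern of OR-ing synonyms within a concept and AND-ing across concepts. The following minimal sketch rebuilds that string from concept blocks. It uses straight quotes and does not reproduce database-specific field tags or syntax, which the text does not report.

```python
def or_block(terms: list[str]) -> str:
    """Join synonyms for one concept with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(concepts: list[list[str]]) -> str:
    """Combine concept blocks with AND, mirroring the reported Boolean string."""
    return " AND ".join(or_block(c) for c in concepts)


concepts = [
    ['"artificial intelligence"', "AI"],   # intervention
    ["depression"],                        # outcome 1
    ["anxiety"],                           # outcome 2
    ['"randomised controlled trial"'],     # study design
]
query = build_query(concepts)
print(query)
```

Keeping concepts as separate lists makes it straightforward to add synonyms to one block (for example, another AI term) without disturbing the AND structure.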
Study selection was conducted by five reviewers (NMAPP, IGAAIRPD, RSA, KPNT, and MWNTN) using the Rayyan AI tool. Articles retrieved from the four databases were imported into Rayyan and automatically deduplicated. Initial screening was conducted by four reviewers (NMAPP, KPNT, IGAAIRPD, and MWNTN), who assessed abstracts for suitability against the PICOS framework. This was followed by full-text screening, which yielded 14 final articles. When disagreements arose, the senior reviewer (RSA) made the final decision on inclusion.
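Rayyan performs deduplication internally, and its matching rules are not described here. The sketch below substitutes a generic rule to illustrate what the step does: two records count as duplicates if they share a DOI or a normalised title. The records, DOIs, and helper names are hypothetical.

```python
import re


def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation/whitespace runs to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each record, matching on DOI or normalised title."""
    seen: set[str] = set()
    unique: list[dict] = []
    for rec in records:
        keys = {"title:" + normalise_title(rec["title"])}
        doi = (rec.get("doi") or "").strip().lower()
        if doi:
            keys.add("doi:" + doi)
        if keys & seen:          # shares a DOI or title with an earlier record
            continue
        seen |= keys
        unique.append(rec)
    return unique


# Hypothetical records exported from different databases.
records = [
    {"title": "Chatbot CBT for anxiety: an RCT", "doi": "10.1000/demo.1", "source": "Scopus"},
    {"title": "Chatbot CBT for Anxiety - An RCT", "doi": "", "source": "PubMed"},
    {"title": "A different trial", "doi": "10.1000/demo.2", "source": "Web of Science"},
    {"title": "A different trial (preprint title)", "doi": "10.1000/DEMO.2", "source": "EBSCO"},
]
kept = deduplicate(records)
print([r["source"] for r in kept])
```

Matching on either key catches both exports without a DOI and records whose titles differ slightly across databases. Exact-key matching like this will still miss near-duplicates with substantively different titles and no DOI.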
The extraction stage was carried out using NotebookLM and manual extraction by the author. Data extraction was carried out by identifying 14 selected articles and extracting publication information (author name, year of publication, country, study design, sample size, Scopus quartile), population characteristics, sampling techniques, interventions (duration and type), and findings. Data synthesis in this study was conducted using a narrative approach to interpret and integrate the findings, due to the conceptual and methodological diversity of the included studies.29 The synthesis involved organizing study characteristics and results into comparative tables, identifying similar and recurring patterns, and examining relationships among interventions across the literature.29
Data synthesis was performed narratively by categorising the findings into three research questions: the characteristics and patterns of research on AI interventions for anxiety and depression over the last 10 years; the effects of AI-based interventions; and the implementation of AI-based interventions for anxiety and depression. The extraction was presented in tables and descriptive narratives. Reliability assessment was carried out by ensuring the suitability of articles according to the inclusion-exclusion criteria, conducting systematic selection using PRISMA guidelines, extracting data using a standard format, and providing direct citations for all articles. The 14 selected papers are described in Table 2.
| Citation | Author name and year | Publisher | Scopus quartile |
|---|---|---|---|
| 32 | (Akdogan et al., 2025) | Elsevier | Q1 |
| 38 | (Chen et al., 2025) | JMIR Publications | Q1 |
| 17 | (Wang et al., 2025) | JMIR Publications | Q1 |
| 36 | (Xu & Ma, 2025) | Elsevier | Q1 |
| 18 | (Heinz et al., 2025) | NEJM AI | Q1 |
| 33 | (Sharp et al., 2025) | JMIR Publications | Q1 |
| 37 | (Gan et al., 2025) | Wolters Kluwer | Q1 |
| 35 | (Zhao et al., 2024) | Wiley | Q1 |
| 19 | (Karkosz et al., 2024) | JMIR Publications | Q2 |
| 6 | (Sadeh-Sharvit et al., 2023) | JMIR Publications | Q1 |
| 20 | (Suharwardy et al., 2023) | Elsevier | Q2 |
| 21 | (Danieli et al., 2022) | JMIR Publications | Q1 |
| 40 | (Klos et al., 2021) | JMIR Publications | Q2 |
| 34 | (Fulmer et al., 2018) | JMIR Publications | Q1 |
The risk of bias in the selected studies was assessed using the Cochrane Risk of Bias 2 tool (RoB 2), which is designed for randomised controlled trials (RCTs). The assessment covered five domains: (D1) bias arising from the randomisation process, (D2) bias due to deviations from the intended intervention, (D3) bias due to missing outcome data, (D4) bias in measurement of the outcome, and (D5) bias in selection of the reported result. For each domain, each study was categorised as low risk, some concerns, or high risk.30
Domain judgements were based on the answers to the RoB 2 signalling questions, using the information available in each research report. The overall risk-of-bias judgement was then made in accordance with the official RoB 2 guidance, ensuring that final decisions remained consistent and accountable. To maintain consistency, the reasons behind each domain judgement were recorded systematically (e.g., whether the randomisation procedure was described and whether there was potential for deviation from the intended intervention). Where information in an article was unclear, this was noted and taken into account when categorising the relevant domain, in accordance with the principle of caution in RoB 2.31
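The RoB 2 rule for combining domain judgements into an overall judgement can be written as a small function. This is a minimal sketch that assumes the five domain judgements have already been made from the signalling questions. The guidance also permits rating a trial as high risk when "some concerns" in multiple domains substantially lower confidence in the result. That step requires reviewer judgement, so it is exposed here as a hypothetical flag, `escalate_multiple_concerns`, rather than applied automatically. The example trial is hypothetical and is not one of the 14 included studies.

```python
from enum import IntEnum


class Risk(IntEnum):
    """RoB 2 judgement levels, ordered so that max() returns the worst."""
    LOW = 0
    SOME_CONCERNS = 1
    HIGH = 2


DOMAINS = ("D1", "D2", "D3", "D4", "D5")


def overall_judgement(domains: dict[str, Risk],
                      escalate_multiple_concerns: bool = False) -> Risk:
    """Derive the overall RoB 2 judgement from the five domain judgements."""
    missing = set(DOMAINS) - set(domains)
    if missing:
        raise ValueError(f"missing domain judgements: {sorted(missing)}")
    worst = max(domains[d] for d in DOMAINS)
    if worst == Risk.HIGH:                 # high in any domain -> high overall
        return Risk.HIGH
    if worst == Risk.SOME_CONCERNS:
        n_concerns = sum(domains[d] == Risk.SOME_CONCERNS for d in DOMAINS)
        # Optional reviewer decision: several "some concerns" may justify "high".
        if escalate_multiple_concerns and n_concerns > 1:
            return Risk.HIGH
        return Risk.SOME_CONCERNS          # at least one concern, none high
    return Risk.LOW                        # low in every domain


trial = {"D1": Risk.SOME_CONCERNS, "D2": Risk.LOW, "D3": Risk.LOW,
         "D4": Risk.SOME_CONCERNS, "D5": Risk.LOW}
print(overall_judgement(trial).name)
```

Using an ordered enumeration keeps the "worst domain wins" logic to a single `max()` call. Recording the escalation as an explicit argument leaves a trace of when the discretionary rule was used.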
Artificial intelligence (AI) holds potential for reducing symptoms of depression and anxiety because of its accessibility and its complementary role in conventional care. Various AI tools, such as large language model (LLM)-based agents, chatbots, and mobile applications, have been reported to reduce symptoms of depression and anxiety.26 Although AI has shown significant effects on mental health problems, it is a complement to, not a substitute for, professionals or therapists, and its safety and long-term effects still require evaluation.26 The effects of AI reported in the 14 studies included in this systematic review are summarised in Table 4.
Most studies (10 of 14) reported that AI-based interventions reduced symptoms of depression, anxiety, or both relative to a comparator, and half (7 of 14) reported reductions in both outcomes,6,18,32–36 although the evidence remains heterogeneous. For example, ChatGPT-4.0 used in digital counselling for cancer patients was associated with significant reductions in depression and anxiety compared with a control group.32 However, findings were not consistent across studies. Some interventions were effective for only one outcome, such as Psy-Bot for depression but not anxiety,17 while ChatGPT in preoperative education reduced anxiety but not depression.37 In addition, several studies reported improvements only within groups, with no significant differences compared with control conditions.19,21,38,40 Overall, these findings indicate that AI-based interventions may be promising, but their effects appear to vary depending on population, intervention type, and study context.
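Tallying the "Outcome interpretation" column of Table 4 makes this heterogeneity concrete. The four category labels below are a simplified coding of the table's wording and are an assumption of this sketch. In particular, "within-group only" groups the studies that reported no significant between-group difference.

```python
from collections import Counter

# Outcome interpretation per study, coded from Table 4 (key = citation number).
TABLE4 = {
    32: "both", 38: "within-group only", 17: "depression only", 36: "both",
    18: "both", 33: "both", 35: "both", 37: "anxiety only",
    19: "within-group only", 6: "both", 20: "depression only",
    21: "within-group only", 40: "within-group only", 34: "both",
}

counts = Counter(TABLE4.values())
n = len(TABLE4)
for category, k in counts.most_common():
    print(f"{category:<18} {k:>2}/{n} ({100 * k / n:.1f}%)")

# Studies with a between-group effect on at least one outcome.
any_effect = n - counts["within-group only"]
print(f"at least one outcome: {any_effect}/{n}")
```

Under this coding, 7 of 14 studies show effects on both outcomes, 3 on a single outcome, and 4 show no between-group difference.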
Primary outcomes
The primary focus of this review was to measure the effects of AI interventions in reducing symptoms of depression and anxiety. The main outcomes were: (1) reduction in depression symptoms, measured using clinical scales such as the Patient Health Questionnaire-9 (PHQ-9), Centre for Epidemiologic Studies Depression Scale (CES-D), Hospital Anxiety and Depression Scale depression subscale (HADS-depression), and Edinburgh Postnatal Depression Scale (EPDS); (2) reduction in anxiety symptoms, measured using instruments such as the Generalised Anxiety Disorder-7 (GAD-7), HADS-anxiety, State-Trait Anxiety Inventory (STAI), and perioperative anxiety scales; and (3) clinically meaningful change, assessed by whether AI-based tools produced symptom improvements equal to or greater than those achieved with conventional interventions.
Secondary outcomes
Secondary outcomes extended beyond the use of AI in reducing symptoms of depression and anxiety, providing additional insights into life satisfaction and general well-being, as reported by 14.3% (2/14) of studies. A study by Karkosz et al.19 found that the “Fido” application was not only effective in relieving anxiety symptoms but also significantly helped participants feel more satisfied with their daily lives. Other outcomes reported in the included studies were reduced loneliness, improved mood regulation, and enhanced social functioning, primarily among students following a short-term chatbot intervention.17,19,34,35
A systematic literature search was conducted27 using four sources, namely Web of Science, Scopus, PubMed, and EBSCO, yielding 355 records. After deduplication in Rayyan AI, 287 records underwent title and abstract screening, of which 270 were excluded for not meeting the inclusion criteria: inappropriate study design (n = 59), irrelevant interventions (n = 124), not focusing on depression and anxiety (n = 39), retracted articles (n = 1), and review articles (n = 47). A total of 17 articles were assessed in full text, of which 3 did not meet the criteria because of irrelevant study designs (n = 2) or high risk of bias (n = 1). Thus, 14 studies were included in the final analysis, as illustrated in the PRISMA flow diagram of study selection (Figure 1).

This figure presents the study selection process conducted in accordance with the PRISMA guidelines. It shows the number of records identified and screened, the full-text articles assessed, and the final studies included in this review, across the identification, screening, and inclusion stages.
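The stage counts in the PRISMA flow can be checked for internal consistency with a few lines of arithmetic. This is a sketch only. The number of duplicates (68) is derived from the reported totals rather than stated in the text.

```python
# Counts reported for each stage of the PRISMA flow.
identified = 355
after_dedup = 287
excluded_screening = {
    "inappropriate study design": 59,
    "irrelevant intervention": 124,
    "not focused on depression and anxiety": 39,
    "retracted": 1,
    "review article": 47,
}
excluded_full_text = {"irrelevant study design": 2, "high risk of bias": 1}

duplicates = identified - after_dedup                          # derived, not reported
full_text = after_dedup - sum(excluded_screening.values())     # assessed in full text
included = full_text - sum(excluded_full_text.values())        # final analysis

print(duplicates, full_text, included)
```

Each stage total follows from the one before it, so a transcription error in any exclusion reason would surface as a mismatch with the 17 full-text and 14 included studies.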
This section presents the main characteristics of the studies included in the systematic review, providing an overview of the research context analysed. Reporting these characteristics is an important component of systematic reviews: it clarifies variations in study design, population, and interventions that underlie the interpretation of results,27 and it helps explain the results of individual studies, as highlighted in previous systematic reviews.39 Table 3 summarises the research design, sample size, population characteristics, sampling techniques, types of AI-based interventions, duration, and main outcomes reported.
| Citation | Author, year of publication | Study design | Sample size | Population characteristics | Sampling techniques | Intervention type | Duration | Primary outcomes |
|---|---|---|---|---|---|---|---|---|
| 32 | (Akdogan et al., 2025) | Two-Center RCT | n = 150 (75 control, 75 intervention) | Chemotherapy-naïve cancer patients. Median age: 64 years; 53.3% female | Randomized 1:1 (ChatGPT vs control) | ChatGPT 4.0 | 3 months | Reduction in anxiety (HADS-anxiety) and depression (HADS-depression) scores |
| 38 | (Chen et al., 2025) | Pilot RCT | n = 103 | Parents (general population) | Block randomization | AI Chatbot | 5 months | Reduction of anxiety (GAD-7) and depression (PHQ-9) levels |
| 17 | (Wang et al., 2025) | RCT | n = 100 (50 control, 50 intervention) | University students. Mean age = 20.8; 62% female | Randomized 1:1 (Intervention vs Waitlist) | AI Chatbot named “Psy-Bot” | 7 days | Depression (CES-D), loneliness (UCLA Loneliness Scale), and anxiety (GAD-7) |
| 36 | (Xu & Ma, 2025) | Open-label RCT | n = 84 (HSC vs LSC chatbot) | College students; aged 18–28 years; 51.2% male | SPSS random number generator | Neil, an artificial intelligence (AI)-driven chatbot | 16 weeks | Reduction in depression (PHQ-9) and anxiety (GAD-7) scores; also WAI-SR and CSQ-8 |
| 18 | (Heinz et al., 2025) | RCT | n = 210 (intervention 106, waitlist control group 104) | Mean age 33.86 years; 59.52% female; positive CHR-FED | Computer-generated sequence | Therabot, a text-based multithreaded chat | 4 weeks, with follow-up at 8 weeks | Changes in symptoms of MDD (PHQ-9), GAD (GAD-7), and CHR-FED (WCS) |
| 33 | (Sharp et al., 2025) | Two-arm RCT | n = 60 (intervention 30, control 30) | People on waitlists for eating disorder treatment. Age: ≥ 16 years | This multicenter 2-armed RCT | The ED ESSI chatbot | 4 months and 3 days | Eating disorder pathology |
| 35 | (Zhao et al., 2025) | RCT | n = 865 (intervention 269, control 388) | Mean age 20.59 years; 61.8% female | Simple randomization | Douyin companion bot | 28 days | Depression, anxiety, positive and negative moods |
| 37 | (Gan et al., 2025) | Single-blind, pilot RCT | n = 55 (intervention 27, control 28) | Patients with knee osteoarthritis. Age: 45–80 years | Single-blind, randomized controlled pilot study | ChatGPT 4.0 | 3 months | Perioperative anxiety and patient satisfaction |
| 19 | (Karkosz et al., 2024) | Two-arm, open-label RCT | n = 81 (intervention 40, control 41) | Participants with subclinical depression or anxiety | Two-arm, open-label RCT | Fido chatbot | 2 weeks intervention and 1 month follow up | Depression (CESD-R, PHQ-9), anxiety (STAI), worry tendencies (PSWQ), satisfaction with life (SWLS), and loneliness (R-UCLA) |
| 6 | (Sadeh-Sharvit et al., 2023) | RCT | n = 47 total adult consented (AI group n = 23; TAU group n = 24) | Adults with depression or anxiety. Mean age = 30.64 years; 72% female | Therapist-level randomization | AI Platform (Eleos Health) | 2 months | Feasibility and acceptability of AI platform, changes in depression (PHQ-9) and anxiety (GAD-7) symptoms |
| 20 | (Suharwardy et al., 2023) | Single center RCT | n = 192 (intervention 96, control 96) | Postpartum women aged ≥18 years; mean age 34 years | Block randomization | Woebot (mental health chatbot) | 6 weeks | Depression measured by PHQ-9 and EPDS |
| 21 | (Danieli et al., 2022) | RCT | n = 60 (SMT-CBT 16, SMT-CBT PHA 16, PHA 14, test only 14) | Active workers with stress and anxiety. Age ≥ 55 years; 78% female | Random number generator | Traditional psychotherapy CBT, AI agent, and TEO | 8 weeks | Symptoms related to stress, anxiety, and depression |
| 40 | (Klos et al., 2021) | Pilot RCT | n = 181 (82 control, 99 intervention); completers: 34 control, 39 intervention | College students. Age 18–33 years; 87.2% female | Simple randomization | Tess, an artificial intelligence (AI)-based chatbot | 8 weeks | Preliminary comparison of depression (PHQ-9) and anxiety (GAD-7) symptoms, focusing on viability and acceptability |
| 34 | (Fulmer et al., 2018) | RCT | n = 74 (two test groups, n = 50; one control group, n = 24) | College students. Mean age 22.9 years; 70% female | Computer-based randomization | Tess, an artificial intelligence (AI)-based chatbot | Group 1: 2 weeks; group 2: 4 weeks | Reduction of symptoms of depression (PHQ-9) and anxiety (GAD-7), and PANAS |
| Citation | Author, year | Intervention | Comparator | Primary outcomes | Outcome interpretation |
|---|---|---|---|---|---|
| 32 | (Akdogan et al., 2025) | ChatGPT 4.0 | Standard clinician-led education group | Anxiety (HADS-anxiety) and depression (HADS-depression) | Effective for both outcomes |
| 38 | (Chen et al., 2025) | AI Chatbot | Nurse hotline | Anxiety (GAD-7) and depression (PHQ-9) | Significant within-group improvements only |
| 17 | (Wang et al., 2025) | AI Chatbot “Psy-Bot” | Waitlist control | Depression (CES-D), loneliness (UCLA Loneliness Scale), and anxiety (GAD-7) | Effective for depression only |
| 36 | (Xu & Ma, 2025) | Neil, AI chatbot (text + voice + animations) | LSC group (text only) | Depression (PHQ-9) and anxiety (GAD-7) | Effective for both outcomes |
| 18 | (Heinz et al., 2025) | Therabot, a text-based multithreaded chat | Waitlist | MDD (PHQ-9), GAD (GAD-7), and CHR-FED (WCS) | Effective for both outcomes |
| 33 | (Sharp et al., 2025) | The ED ESSI chatbot | Web-based information | Eating disorder pathology (EDE-Q), Psychosocial impairment (CIA), depression, anxiety, stress (DASS-21) | Effective for both outcomes |
| 35 | (Zhao et al., 2025) | Douyin companion bot | Waiting list group | Depression (PHQ-9), anxiety (GAD-7), positive and negative moods (PANAS) | Effective for both outcomes |
| 37 | (Gan et al., 2025) | ChatGPT 4.0 | Traditional physician explanation | Anxiety/Depression (HADS), Perioperative Apprehension Scale-7 (PAS-7), and Visual Analogue Scales for Anxiety (VAS-A, VAS-P) | Effective for anxiety only |
| 19 | (Karkosz et al., 2024) | Fido chatbot | Self-help book | Depression (CESD-R, PHQ-9), anxiety (STAI), worry tendencies (PSWQ), satisfaction with life (SWLS), and loneliness (R-UCLA) | Both groups improved; null between-group effect |
| 6 | (Sadeh-Sharvit et al., 2023) | AI Platform (Eleos Health) | Treatment as usual | Depression (PHQ-9) and anxiety (GAD-7) symptoms | Effective for both outcomes |
| 20 | (Suharwardy et al., 2023) | Woebot (mental health chatbot) | Usual postpartum care | Depression measured by PHQ-9 and EPDS | Effective for depression only |
| 21 | (Danieli et al., 2022) | AI agent and TEO | Traditional therapy | Stress, anxiety, and depression | Null between-group; some within-group improvements |
| 40 | (Klos et al., 2021) | Tess, AI-based chatbot | Psychoeducation book | Depression (PHQ-9) and anxiety (GAD-7) | Null between-group; anxiety decreased within group |
| 34 | (Fulmer et al., 2018) | Tess, AI-based chatbot | Information-only control | Depression (PHQ-9), anxiety (GAD-7), and PANAS | Effective for both outcomes |
All 14 included studies were randomised controlled trials (RCTs), with variations including a two-centre RCT,32 a single-centre RCT,20 two-arm RCTs,19,33 and an open-label RCT.36 Three studies were pilot RCTs,37,38,40 while the remaining studies were described as standard RCTs.6,17,18,21,34,35
The 14 included articles, published between 2015 and 2025, examined the effects of AI-based interventions in reducing depression and anxiety. The distribution of publications across years was as follows: 2018 (7%), 2021 (7%), 2022 (7%), 2023 (14%), 2024 (7%), and, for the majority, 2025 (57%). Four studies were conducted in the United States6,18,20,34 and one in Argentina.40 Studies in Europe were conducted in Poland19 and Italy.21 Studies in Asia were conducted in Turkey,32 Hong Kong,38 and China.17,35–37 Oceania was represented by one study in Australia.33 Figure 2(A) presents the distribution of publication years for the 14 selected articles from 2015 to 2025, and Figure 2(B) illustrates the geographical distribution of studies on AI-based interventions for depression and anxiety over the same period.

Figure 2. (A) Distribution of publication years. This panel illustrates the distribution of the 14 studies included in the review according to their year of publication between 2015 and 2025. (B) Geographical distribution of studies. This panel shows the countries in which the included studies on AI-based interventions for depression and anxiety were conducted.
Sample sizes across the 14 studies ranged from small to moderate (47 to 865 participants). Only one study enrolled more than 500 participants (n = 865),35 while the remaining 13 enrolled fewer than 500. Gender distribution varied across studies, with women forming the majority in most studies that reported gender. Ages spanned adolescents and young adults (e.g., university students) to adults and older adults. The populations were heterogeneous, including students, patients with specific medical conditions (e.g., cancer or knee osteoarthritis), postpartum mothers, individuals with specific psychological problems (eating disorders, depression and anxiety, and work-related stress), and general populations such as parents, adults, and workers.
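The sample-size summary can be reproduced from the total n reported for each study in Table 3. This sketch uses those totals as given. For example, it uses 181 randomised participants rather than the completers in Klos et al. and 865 for Zhao et al. even though that row's arm sizes sum to fewer participants. It does not resolve such discrepancies.

```python
import statistics

# Total sample size per study as reported in Table 3 (key = citation number).
SAMPLE_SIZES = {
    32: 150, 38: 103, 17: 100, 36: 84, 18: 210, 33: 60, 35: 865,
    37: 55, 19: 81, 6: 47, 20: 192, 21: 60, 40: 181, 34: 74,
}

sizes = sorted(SAMPLE_SIZES.values())
large = [c for c, n in SAMPLE_SIZES.items() if n > 500]   # citations with n > 500

print("range:", sizes[0], "to", sizes[-1])
print("median:", statistics.median(sizes))
print("studies with n > 500:", large)
```

The median is more informative than the mean here because the single large trial would otherwise dominate the summary.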
These digital interventions take various forms and are designed to address the limitations of traditional mental health services. The identified interventions include: (1) chatbots and conversational agents, the most common form, including applications such as AI Chatbot, Tess, Woebot, Psy-Bot, Fido, Therabot, Neil, and ED ESSI;17–20,33,34,36,38,40 (2) large language models (LLMs), such as ChatGPT (version 4.0), used as digital counselling agents or companions to provide medical information and emotional support;32,35,37 (3) integrated AI platforms, such as the Eleos Health system, which supports conventional therapy by monitoring patient progress and improving therapist efficiency;6 and (4) passive behaviour monitoring systems, which detect early signs of psychological stress through passive digital behaviour tracking.21 The visualisation of interventions from the selected articles is presented in Figure 3.
The duration of AI-based tool use in the included studies was categorised into three time frames. (1) Short term (7 days to 4 weeks): interventions were designed to provide rapid emotional support or triage. For example, Psy-Bot was used for 7 days, Tess for 2–4 weeks, and LLM-based chatbots for 28 days, and interventions using Socratic questioning and Therabot lasted 2–4 weeks. (2) Medium term (6 weeks to 3 months): this time frame is typically used to assess more stable clinical effects. For example, Woebot was used for 6 weeks, Tess and the TEO platform for 8 weeks, the Eleos Health platform for 2 months, and ChatGPT 4.0 in medical contexts (e.g., cancer and orthopaedic patients) for 3 months. (3) Long term (16 weeks or longer): this time frame involves more complex or monitoring-based interventions, for example, the Neil chatbot (16 weeks) and the ED ESSI chatbot (over 4 months). The duration of interventions from the 14 selected articles is visualised in Figure 4.
Figure 5(B) summarises the risk-of-bias assessment for the 14 trials included in this review. Overall, 9 studies (64.3%) were judged to have some concerns, 3 studies (21.4%) were assessed as low risk, and 2 studies (14.3%) were judged to have a high risk of bias. These findings indicate that, although the available evidence is generally promising, several studies have methodological limitations, and their results should be interpreted with caution.
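The percentages reported for the overall risk-of-bias judgements can be recomputed from the study counts as a quick check. This minimal sketch rounds to one decimal place, as in the text.

```python
# Overall RoB 2 judgements: (number of studies, percentage reported in the text).
reported = {"some concerns": (9, 64.3), "low": (3, 21.4), "high": (2, 14.3)}

n = sum(count for count, _ in reported.values())
for label, (count, pct) in reported.items():
    recomputed = round(100 * count / n, 1)
    assert recomputed == pct, (label, recomputed, pct)   # flags any reporting mismatch
    print(f"{label:<14} {count:>2}/{n} = {recomputed}%")
```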

Figure 5. (A) Risk-of-bias assessments for each included study across the evaluated domains. (B) Distribution of risk of bias across studies. This panel illustrates the proportion of studies rated as low risk, some concerns, or high risk in each risk-of-bias domain and overall.
Across domains, the most notable limitation was bias arising from deviations from intended interventions (D2), which was the only domain contributing to the high-risk ratings in this review. In contrast, bias due to missing outcome data (D3) was less problematic, with most studies classified as low risk in this domain. For the remaining domains—bias arising from the randomisation process (D1), bias in measurement of the outcome (D4), and bias in selection of the reported result (D5)—the most common judgement was “some concerns”, generally reflecting incomplete or unclear reporting of methodological procedures rather than clear evidence of serious bias. The detailed distribution of risk-of-bias judgements is presented in Figures 5(A) and 5(B).
Most studies employed passive control conditions (e.g., usual care or waitlist), which limits their ability to control for placebo effects. Only a small number of studies used active control groups (e.g., psychoeducation, books, or nurse hotlines),19,38,40 so the inferential strength of most comparisons is reduced. Moreover, only one study involved a therapist in a face-to-face setting when delivering AI-based CBT,6 while another involved direct responses from a physician.37 This gap is notable because human support appears to play an important role in influencing adherence and intervention effects.
Digital and social media–based recruitment methods tend to attract self-selected, technologically literate populations. As a result, there is a risk of selection bias, whereby the findings may not fully represent populations with lower digital literacy or those with limited access to devices due to economic constraints. In addition, study samples are often drawn from a single population segment. For example, postpartum studies may include only women without severe depression, while eating disorder studies may recruit only adolescents on waiting lists, thereby limiting generalisability. Furthermore, follow-up periods are relatively short, typically 2 to 8 weeks, making it difficult to assess long-term effects.
Heinz et al.18 highlighted another issue related to engagement: a decline in user participation over time (i.e., low retention). This pattern is often attributed to a “novelty effect”, which diminishes after the initial sessions. In addition, Sharp et al.33 reported challenges in integrating chatbots into standard clinical care systems, particularly for patients on waiting lists. Other findings suggest that the transition from rule-based chatbots to those based on large language models (LLMs), such as ChatGPT, introduces new challenges related to personalisation and safety. Although generative AI enables more natural interactions, concerns regarding data privacy and the potential for medical “hallucinations” remain prominent barriers to implementation in formal healthcare settings.32
This review suggests that AI-based interventions have the potential to reduce symptoms of depression and anxiety in both general and clinical populations. Several studies reported short-term symptom improvement, indicating that AI may be considered a supportive tool in mental health services.6,18,34,38 However, these findings should be interpreted with caution due to the limited number of studies and the substantial heterogeneity observed. While some interventions demonstrated greater symptom reduction compared to standard care, others reported non-significant results. For example, text-based interventions did not consistently provide additional benefits compared to established self-help approaches.19
Design factors play an important role in determining intervention effects. Approaches that incorporate richer social cues, such as voice or visual elements, tend to produce better outcomes than purely text-based approaches.35,38 The sustainability of intervention effects also warrants attention. Several studies have reported a decline in effects during follow-up periods, particularly in the absence of human support.17 In addition, technical limitations, such as repetitive responses and failures in user intent recognition, may disrupt the therapeutic alliance and reduce user adherence.19
This review has several limitations that should be considered. The relatively small number of included studies, combined with substantial heterogeneity in intervention types, study designs, and outcome measures, limits the generalisability of the findings. In particular, the reviewed studies encompassed a wide range of AI approaches, including rule-based chatbots such as Tess, large language models (LLMs) such as ChatGPT-4.0 and Therabot, AI platforms that support clinical practice such as Eleos Health, and passive behavioural monitoring systems that detect or estimate psychological conditions through digital behavioural data. These differences suggest that each technology operates through distinct mechanisms, varies in its level of autonomy, and is applied to different clinical purposes. Accordingly, the core components contributing to intervention effects may not be consistent across studies. As a result, differences in effect outcomes between studies may not solely reflect whether an intervention is effective, but also the diversity of technologies being evaluated. Therefore, this review is more appropriately understood as an examination of diverse forms of AI-based mental health interventions, rather than an evaluation of a single uniform AI model.
Despite the reported effects of AI-based interventions in reducing symptoms of depression and anxiety, their real-world implementation remains constrained by several practical challenges. User engagement is a recurring concern: many interventions demonstrate strong short-term outcomes but declining adherence over time, suggesting a novelty effect and limited sustained interaction. In addition, technological limitations, particularly in simpler chatbot systems, such as repetitive responses, limited contextual understanding, and failures in intent recognition, may weaken user trust and reduce the quality of the therapeutic experience. These issues highlight that effects observed under controlled conditions do not always translate directly into consistent real-world use. Another limitation of the statistical findings in most of the studies was the absence of reported confidence intervals, which limits how precisely the reported effect sizes can be interpreted. In addition, recruitment procedures lacked standardisation, allowing for the possibility that confounding variables influenced the findings. Although all 14 studies were published in Scopus Q1–Q2 indexed journals, their findings should still be interpreted with caution because of methodological limitations, as the majority were rated as having “some concerns” regarding risk of bias and two of the fourteen were classified as being at high risk of bias. Therefore, future research should employ more rigorous selection and randomisation procedures in order to strengthen the evidence base.
Beyond technical and behavioural factors, implementation is further shaped by clinical, ethical, and contextual constraints. The integration of AI into existing healthcare workflows remains limited, with unclear role definitions between AI systems and human practitioners, often positioning AI as a supplementary rather than a fully embedded tool. At the same time, concerns related to data privacy, clinical safety, and accountability persist, particularly in high-risk situations where AI may not respond adequately to severe psychological distress. Furthermore, the predominance of studies conducted in digitally literate populations raises questions about generalisability across broader and more diverse contexts. These findings suggest that successful implementation depends not only on technological capability but also on sustained engagement design, ethical safeguards, and alignment with clinical practice.
This systematic review has several clear strengths. First, it follows the PRISMA guidelines, making the review more transparent, organised, and traceable. Second, it deliberately includes only RCTs, yielding higher-quality evidence than a review mixing study designs. Third, rather than addressing mental health in general, it specifically examines AI interventions that target two outcomes simultaneously: depression and anxiety. Fourth, the scope of AI interventions is broad, ranging from chatbots and large language models to integrated AI platforms, machine learning-based prediction systems, and passive behaviour monitoring, which prevents the interpretation of results from being restricted to a single technology type. Finally, the review does not stop at summarising results; it also includes a formal assessment of study quality using the Cochrane Risk of Bias 2 (RoB 2) tool. Assessing potential bias across five domains allows the reported results to be read more critically and makes it easier to identify which findings are robust and which require more careful interpretation. These strengths are summarised visually in Figure 6.
Although this review has several strengths, it also has limitations that should be acknowledged. First, the included studies show considerable heterogeneity in the types of AI interventions, exposure durations, outcome measurement instruments, and participant characteristics. Second, although all studies used an RCT design, implementation quality was not always consistent: the risk of bias assessment shows that most trials fall in the “some concerns” category, and a small number are at high risk. Third, the scope for generalising the findings is limited. The majority of trials were conducted in middle- and high-income countries, with participants who tended to be younger, mostly female, and digitally literate. Fourth, most studies had relatively short to medium follow-up periods, so the sustainability of the effects has not been fully addressed. Finally, this review included only English-language publications, so there may be a language bias, and some relevant studies in other languages may have been missed. These limitations are illustrated in Figure 7.
Several recommendations were made in the studies included in this review. First, future research should involve larger samples and more diverse populations. Second, conventional therapy should be compared with technology-based complementary approaches to face-to-face care, such as virtual reality, teletherapy, website-based therapy, and other AI interventions. Third, interventions should run for longer durations and be accompanied by follow-up sessions to assess their long-term effects. Fourth, greater attention should be paid to participant safety, with effects tested in accordance with strict protocols. Fifth, interventions should be extended to higher-risk populations and more severe clinical disorders. Sixth, the sustained impact of the interventions should be explored. Finally, human involvement alongside AI should be increased to enhance treatment impact, user satisfaction, and intervention usefulness. The recommendations from the 14 included articles are summarised in Figure 8.
AI-based interventions show potential for reducing symptoms of depression and anxiety; however, the current evidence remains preliminary and heterogeneous. The reviewed studies varied substantially in intervention type, study design, population, and implementation context, and several raised concerns regarding risk of bias. Accordingly, the findings should not be interpreted as evidence that AI is broadly superior to standard care. Rather, AI appears to be a potentially useful supportive approach, with effects dependent on context, therapeutic design, and implementation quality. Future studies should employ more rigorous and standardised methods, include more diverse populations, and report long-term, safety, and implementation outcomes more clearly.
All data and materials supporting the findings of this systematic review, including the PRISMA flow diagram, PRISMA checklist, and extracted data, are openly available in Open Science Framework (DOI: https://doi.org/10.17605/OSF.IO/7FAUP)41 under a CC-BY Attribution 4.0 license.
The author would like to express their gratitude to the Indonesia Endowment Fund for Education (LPDP) and the Ministry of Finance, Republic of Indonesia, for funding the author’s master’s and doctoral studies. The author also sincerely thanks Adhan Efendi, M.Pd., for his valuable insights and constructive feedback during the preparation of this manuscript.