Keywords
Artificial Intelligence, depression, systematic review, machine learning, mental health
Depression is a prevalent mental health disorder that affects a significant proportion of the global population, posing a major public health challenge. In recent years, the application of Artificial Intelligence (AI) to mental health diagnosis has garnered increasing attention. This systematic review aims to provide a comprehensive overview of the current state of research on AI-based approaches for depression diagnosis, identifying both advancements and gaps in the literature that can guide future studies.
A comprehensive search was conducted across leading research databases to identify relevant studies published up to July 2024. A combination of automated and manual filtering was employed to refine the initial set of records. Eligibility criteria were applied to ensure that only studies directly addressing the use of AI for depression diagnosis were included in the final analysis.
The initial search yielded 1,179 records. Following a rigorous selection process, 145 studies were deemed eligible for inclusion in the review. These studies represent a diverse array of AI techniques and data sources, with a predominant focus on supervised learning algorithms. The most common data sources were social networks, followed by clinical data integrated with psychological assessments.
The results highlight the growing interest in leveraging AI for depression diagnosis, particularly through the use of supervised learning methods. Social network data has emerged as the most frequently used data source, though clinical data combined with validated psychological tests remains a key area of focus. Despite these advancements, several challenges persist, including data availability and quality, which present opportunities for future research to improve diagnostic accuracy and generalizability.
Depression is one of the most prevalent mental health disorders globally, with an estimated lifetime prevalence of up to 17% of the population.1,2 The World Health Organization (WHO) defines depression as a condition characterized by a persistently low mood or loss of pleasure in activities, lasting for an extended period.3 This duration is generally accepted to be at least two weeks.4,5 While depression manifests in various forms—such as persistent depressive disorder, perinatal depression, and seasonal affective disorder6—the term “depression” typically refers to Major Depressive Disorder (MDD), which is the primary focus of this study.
MDD often goes undiagnosed, especially in regions with limited healthcare resources or a shortage of mental health professionals.7–9 However, even in areas with sufficient access to healthcare, depression can remain untreated as individuals may downplay or hide their symptoms.10 In such cases, automated tools that can assist in accurate diagnosis may play a crucial role in improving detection and early intervention.
Artificial Intelligence (AI) represents a promising avenue for addressing these diagnostic challenges. AI, a field of computer science dedicated to solving complex, nonlinear problems,11 has experienced rapid growth in recent years. Although AI algorithms have existed for decades,12 their recent surge can be attributed to two major factors: the significant increase in computational power and the widespread availability of digital data.13 These algorithms are capable of processing vast quantities of data to identify patterns and relationships, leading to breakthroughs in many scientific and technological fields, including healthcare and mental health.14
AI can leverage different types of data, including multimedia data (such as images, audio, and video) and alphanumeric data (such as written text or clinical records). This review focuses on AI applications that utilize alphanumeric data for diagnosing MDD. This focus is driven by the accessibility of alphanumeric data compared to multimedia data, which has resulted in a greater availability of relevant datasets. Additionally, the widespread availability of these datasets facilitates the potential development of AI tools for real-world diagnostic applications. Among the alphanumeric datasets, sources include unstructured text from social media posts, mobile text messages, and psychological interviews, as well as structured data from electronic health records, standardized psychological questionnaires, and data automatically extracted from mobile devices.
The objective of this systematic review is to provide a comprehensive overview of the current state-of-the-art in AI-based diagnostic tools for MDD using alphanumeric data. By mapping existing research, this review aims to identify gaps in the field, offering insights for future research directions. The research questions outlined in Table 1 guide the analysis of the selected studies.
Following this introduction, the work proceeds with the Methods section, detailing the systematic review process. The Results section presents key findings from the analyzed studies. Finally, the Discussion section provides conclusions and addresses the implications of the results.
The selection of articles for this systematic review was conducted in two stages: an automated search phase followed by a manual screening process. The first phase involved the use of a custom-developed computational tool to retrieve all relevant research studies that met predefined inclusion criteria. The second phase, requiring manual intervention, applied exclusion criteria that could not be automated.
The automated tool was programmed to search across three widely used academic databases:
The search was conducted on July 1, 2024, and focused on articles published since 2015. The following keyword combinations were used:
• “artificial intelligence” AND “depression diagnosis”
• “artificial intelligence” AND “depression detection”
• “artificial intelligence” AND “depression diagnostic”
• “artificial intelligence” AND “depression estimation”
• “machine learning” AND “depression diagnosis”
• “machine learning” AND “depression detection”
• “machine learning” AND “depression diagnostic”
• “machine learning” AND “depression estimation”
• “deep learning” AND “depression diagnosis”
• “deep learning” AND “depression detection”
• “deep learning” AND “depression diagnostic”
• “deep learning” AND “depression estimation”
• “artificial intelligence” AND “depressive disorder diagnosis”
• “artificial intelligence” AND “depressive disorder detection”
• “artificial intelligence” AND “depressive disorder diagnostic”
• “artificial intelligence” AND “depressive disorder estimation”
• “machine learning” AND “depressive disorder diagnosis”
• “machine learning” AND “depressive disorder detection”
• “machine learning” AND “depressive disorder diagnostic”
• “machine learning” AND “depressive disorder estimation”
• “deep learning” AND “depressive disorder diagnosis”
• “deep learning” AND “depressive disorder detection”
• “deep learning” AND “depressive disorder diagnostic”
The search targeted a broad range of publication types, including conference papers, journal articles, and book chapters, across the title, abstract, and keyword fields of each document.
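For illustration, the sketch below shows how a query grid of this kind could be assembled programmatically by crossing the AI terms with the condition and task phrasings. The custom search tool used in this review is not publicly described, so this is only a hedged reconstruction of the idea, not its actual implementation or query syntax.

```python
from itertools import product

# Hypothetical reconstruction of the query grid; the actual tool and the
# database-specific query syntax used in this review are not described here.
ai_terms = ["artificial intelligence", "machine learning", "deep learning"]
conditions = ["depression", "depressive disorder"]
tasks = ["diagnosis", "detection", "diagnostic", "estimation"]

queries = [
    f'"{ai}" AND "{condition} {task}"'
    for ai, condition, task in product(ai_terms, conditions, tasks)
]

for query in queries:
    print(query)
```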
Following the automated search, a manual review process was undertaken to apply eligibility criteria that could not be addressed programmatically. This step involved a detailed review of each study by three independent reviewers. Studies were excluded if they did not meet the following criteria:
• The study must propose a method to diagnose Major Depressive Disorder (MDD). If the method also addressed other disorders, only the portion relevant to depression was considered.
• The study must employ Artificial Intelligence (AI) for diagnostic purposes.
• The dataset utilized must consist solely of alphanumeric data.
• The study must be written in English.
• Duplicate studies were excluded, with preference given to the most recent version in the case of multiple similar publications.
Once the inclusion and exclusion criteria were applied, the remaining studies were independently reviewed by three screeners to extract relevant data for analysis. In addition to metadata automatically retrieved by the computational tool (e.g., publication year, authors, keywords), a full reading of each article was conducted to collect the following key information:
The different steps of the systematic review are summarized in Figure 1, which outlines the process of narrowing down the initial 1,179 studies returned by the automatic tool to the final set of 145 studies included in this research. This filtering process followed the PRISMA guidelines, which ensure a transparent and systematic approach to study selection, incorporating eligibility criteria such as relevance, methodological rigor, and duplication removal.
An analysis of the selected studies reveals trends over time. Figure 2 presents the number of studies grouped by year, showing increasing interest in the use of AI for depression diagnosis over the last decade. The rise in publications coincides with the growth of social media platforms, mobile health applications, and significant advances in AI techniques such as machine learning and natural language processing.
Figure 3 depicts the countries from which the selected studies originate, determined by the first author’s institution. The distribution highlights a concentration of research in countries like the United States, India, and China. Some other countries in different continents also feature prominently, reflecting regional efforts to tackle mental health challenges through AI. However, there is a notable gap in research from lower-income countries, which may reflect the disparity in resources and infrastructure for AI and mental health research.
One of the key areas of analysis focused on the datasets used in the reviewed studies. These datasets were categorized based on the source and type of data they contained, as illustrated in Figure 4. The classification was performed based on the authors’ declarations, given that many datasets are not publicly available. This often presents a challenge for reproducibility in future studies.
The dataset types were grouped into five categories:
• Social Networks: These datasets typically consist of unstructured public data from user-generated content such as posts, comments, and reactions. Social networks offer a rich source of spontaneous user behavior, though the data’s unstructured nature requires significant preprocessing. Despite their availability, the use of these datasets is sometimes questioned due to ethical concerns regarding user consent.
• Clinical Data: This includes structured data obtained from electronic health records (EHRs), which often combine medical information with demographic details. Such datasets are generally more reliable and precise but are limited in availability due to privacy concerns and data access restrictions.
• Mobile Devices: These datasets include data collected through mobile devices, such as smartphones and wearable technology. The rise of mobile health (mHealth) apps has contributed to this category, where continuous tracking and health monitoring data are utilized to infer depressive tendencies.
• Interviews: Unlike clinical data, interview-based datasets are unstructured but provide qualitative insights. They are usually transcriptions of interactions with healthcare professionals, offering rich context but requiring manual labeling and interpretation.
• Text Messages: These datasets include private text communications, often shared voluntarily. Although these resemble social network data, the private nature of text messages means they are not publicly available. This limits their widespread use but provides a direct line of communication that can capture candid expressions of depressive symptoms.
Figure 4 shows the distribution of the different dataset types across the studies included in this review.
The analysis reveals that most of the datasets used in the reviewed studies come from social networks, followed by clinical data. However, only a few studies made their datasets publicly available, underscoring a challenge in research replication and model validation.
In terms of validation techniques, the reviewed studies adopted various methods depending on the nature of the dataset, as summarized in Figure 5. The datasets were validated using five primary methods:
• Questionnaires: Structured questionnaires, such as the PHQ-9 or BDI, were used to label data. This method was the most commonly used (41.9% of studies), ensuring consistency in diagnosing depression but potentially limiting real-world generalizability.
• Experts: In cases where data were unstructured (e.g., social media posts), mental health professionals validated the data by classifying whether a record could be associated with depressive symptoms. This approach was employed in 21.6% of the studies.
• Keywords: Some studies employed keyword-based validation, where the presence of specific terms associated with depression was used to label data. While simple, this method lacks the nuance of more sophisticated validation techniques (a minimal labeling sketch follows this list).
• Sentiment analysis: Advanced sentiment analysis tools were used in some studies to gauge the emotional tone of text data. This approach moves beyond keyword matching to provide a more comprehensive analysis of emotional states.15
• Self-reported: In studies with large datasets, participants self-reported their depression status, often in conjunction with mobile health apps. While scalable, this method may introduce biases due to the subjective nature of self-reports.
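As a minimal illustration of two of the labeling strategies above (questionnaire cutoffs and keyword matching), the sketch below assigns binary labels. The PHQ-9 cutoff of 10 and the keyword list are illustrative assumptions and are not taken from any specific reviewed study.

```python
# Illustrative labeling sketch; the cutoff and keyword list are assumptions,
# not values taken from any particular study in this review.
DEPRESSION_KEYWORDS = {"depressed", "hopeless", "worthless", "empty"}

def label_from_phq9(total_score: int, cutoff: int = 10) -> int:
    """Binary label from a PHQ-9 total score (0-27); scores >= cutoff are labeled depressive."""
    return 1 if total_score >= cutoff else 0

def label_from_keywords(text: str) -> int:
    """Binary label based on the presence of depression-related terms in a post."""
    lowered = text.lower()
    return 1 if any(keyword in lowered for keyword in DEPRESSION_KEYWORDS) else 0

print(label_from_phq9(14))                              # 1 (depressive)
print(label_from_keywords("I feel hopeless lately"))    # 1 (depressive)
```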
A third analysis focused on the distribution of depressive versus non-depressive subjects within the datasets, as shown in Figure 6. This analysis is crucial for understanding the challenges of imbalanced datasets, a common issue in depression diagnosis, where the proportion of depressive individuals in a population is much lower than that of non-depressive individuals. The analysis revealed:
• 0-15%: The majority of datasets in this range reflect real-world imbalances, as the prevalence of depression in the general population is often below 15%.
• 15-40%: These datasets still show a majority of non-depressive cases but with a more balanced distribution.
• 40-60%: A few datasets are nearly balanced in terms of depressive and non-depressive subjects.
• >60%: Some datasets contain an artificial overrepresentation of depressive cases, often achieved through oversampling techniques to balance the data for model training.
Many studies applied balancing techniques such as oversampling, undersampling, or synthetic data generation to address the issue of imbalanced datasets.
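As a rough sketch of the simplest of these balancing techniques, random oversampling duplicates minority-class (depressive) records until the classes are even; synthetic approaches such as SMOTE follow the same idea but generate interpolated samples instead of duplicates. The feature matrix below is toy data used only for illustration.

```python
import numpy as np
from sklearn.utils import resample

def random_oversample(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Duplicate minority-class rows (label 1) until both classes have equal counts."""
    X_majority, y_majority = X[y == 0], y[y == 0]
    X_minority, y_minority = X[y == 1], y[y == 1]
    X_minority_up, y_minority_up = resample(
        X_minority, y_minority, replace=True,
        n_samples=len(y_majority), random_state=seed,
    )
    return (np.vstack([X_majority, X_minority_up]),
            np.concatenate([y_majority, y_minority_up]))

X = np.random.rand(100, 5)           # toy feature matrix
y = np.array([1] * 10 + [0] * 90)    # 10% depressive, 90% non-depressive
X_balanced, y_balanced = random_oversample(X, y)
print(np.bincount(y_balanced))       # [90 90]
```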
The analysis of algorithms used in the reviewed studies is summarized in Figure 7. Machine learning algorithms were a critical component of most studies, and the choice of algorithm often depended on the dataset type and the learning task at hand. The most commonly used algorithms were:
• ANN (Artificial Neural Networks): Popular for their ability to handle large, unstructured datasets like those from social media.
• NLP (Natural Language Processing): Used primarily for text-based data, including social network posts and text messages.
• SVM (Support Vector Machines): Effective in handling high-dimensional data and commonly applied to clinical datasets.
• RF (Random Forest): A versatile algorithm used for both structured and unstructured data, known for its robustness against overfitting.
• LR (Logistic Regression): Widely applied for binary classification tasks, particularly when the goal is to predict whether a subject is depressive or non-depressive.
AB: Adaptive Boosting, ANN: Artificial Neural Network, DT: Decision Tree, EN: Elastic Net Regularization, GB: Gradient Boosting, KNN: K Nearest Neighbors, LGBM: Light Gradient Boosting Machine, LR: Logistic Regression, NB: Naive Bayes, NLP: Natural Language Processing, RF: Random Forest, SVM: Support Vector Machines, XGB: Extreme Gradient Boosting.
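To make the typical supervised setup concrete, the sketch below compares three of the algorithms listed above (LR, SVM, and RF) on TF-IDF features extracted from short texts. The toy corpus and labels are invented for illustration and do not reflect any reviewed dataset or the exact pipelines used in the studies.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy corpus; reviewed studies typically use thousands of labeled posts or records.
texts = [
    "I feel empty and tired all the time",
    "Great day at the park with friends",
    "Nothing matters anymore and I cannot sleep",
    "Looking forward to the trip next week",
    "I have lost interest in everything I used to enjoy",
    "Finished a good book and went for a run",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = depressive, 0 = non-depressive (illustrative labels)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)  # text -> TF-IDF features -> classifier
    scores = cross_val_score(pipe, texts, labels, cv=3, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.2f}")
```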
The studies were divided into supervised and unsupervised learning approaches, as shown in Figure 8. Supervised learning dominated the field, with 93.2% of the studies relying on labeled data. Only 6.1% of studies used unsupervised learning, and one study by Choi et al.16 implemented a semi-supervised approach.17 This method used a small portion of labeled data to fine-tune clusters generated by an unsupervised algorithm, showing promising results despite the limited labeled data.
The reviewed studies used a variety of metrics to evaluate the performance of their models, as summarized in Table 2. The most commonly used metrics included:
• Accuracy (57% of studies): The average accuracy was 0.82, indicating good overall performance, but accuracy alone may not be sufficient in the case of imbalanced datasets.
• F1 Score (52% of studies): With an average of 0.75, this metric provides a balance between precision and recall, making it especially useful for imbalanced datasets.
• Precision (45% of studies): At an average of 0.78, precision measures the proportion of correctly predicted positive cases out of all predicted positives.
• Recall (34% of studies): Averaging 0.74, recall measures the proportion of actual positives correctly identified by the model.
• AUC (31% of studies): The Area Under the Curve metric averaged 0.75, indicating good discriminative ability.
• Specificity and sensitivity (17% of the studies): Both metrics were used less frequently but provided insight into the model’s ability to correctly identify negative and positive cases, respectively.
Metric | Presence in articles | Average value
---|---|---
Accuracy | 57% | 0.82
F1 | 52% | 0.75
Precision | 45% | 0.78
Recall | 34% | 0.74
AUC | 31% | 0.75
Specificity | 17% | 0.69
Sensitivity | 17% | 0.74
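For reference, the metrics in Table 2 are typically computed from a model's predictions on a held-out test set, as in the following sketch; the true labels and predicted probabilities shown are toy values used only for illustration.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

# Toy ground-truth labels and predicted probabilities for a binary classifier.
y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.7, 0.1, 0.6, 0.3, 0.8, 0.55, 0.05])
y_pred = (y_prob >= 0.5).astype(int)   # threshold probabilities at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("Accuracy   ", accuracy_score(y_true, y_pred))
print("Precision  ", precision_score(y_true, y_pred))
print("Recall     ", recall_score(y_true, y_pred))   # also called sensitivity
print("F1         ", f1_score(y_true, y_pred))
print("AUC        ", roc_auc_score(y_true, y_prob))
print("Specificity", tn / (tn + fp))
```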
Several studies opted for regression-based models, measuring outcomes with metrics such as Mean Absolute Error (MAE) and Root-Mean-Square Error (RMSE). These models focused on predicting depression levels on a continuous scale rather than performing binary classification. Chatterjee et al.18 and Aziz et al.19 predicted the depression level on a scale from 0 to 6, while Crowson et al.20 emulated the answers of the PHQ-9 questionnaire, which are discrete values spanning a wide range of options. Oduntan et al.21 and Akyol22 used the usual metrics but also included the Matthews Correlation Coefficient (MCC) to compare their models; Oduntan et al.21 additionally chose the False Discovery Rate (FDR). Tavchioski et al.23 and Trotzek et al.24 presented their studies in the eRisk [1] event, a yearly contest in which researchers present AI-based techniques for diagnosing depression. To compare results, the contest provides its own metric, the Early Risk Detection Error (ERDE),25 which both studies used. This metric is computed from the confusion matrix but also accounts for the moment at which the prediction is made, since these models deal with longitudinal data and the goal is to detect depressive symptoms as early as they appear. Inkpen et al.26 and Skaik et al.,27 two other eRisk studies, also used custom metrics: Average Hit Rate (AHR), Average Closeness Rate (ACR), Average Difference between Overall Depression Levels (ADL), and Depression Category Hit Rate (DCHR). Lyu et al.28 used the Pearson Correlation Coefficient (PCC) to evaluate their models, which were also based on supervised learning.
Unsupervised learning studies used heterogeneous metrics. Dipnall et al.29,30 calculated the percentage of depressive people in different clusters to measure their results in two different studies. Choi et al.,16 who mixed unsupervised and semi-supervised learning approaches, used Analysis of Variance (ANOVA) and compared the results of the two learning techniques. For the semi-supervised approach, they labeled a portion of the data using standard psychological questionnaires while the rest remained unclassified. The dataset consisted of demographic and clinical information about the patients, presented in structured form. This partial labeling was used to adjust the clusters created by standard unsupervised learning models. Although the semi-supervised solution used fewer labeled records for training, it generated better results than the fully unsupervised technique.
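A minimal, hypothetical sketch of this idea (not the code used by Choi et al.) is to cluster all records with an unsupervised algorithm and then assign each cluster the majority label among its small labeled subset. The feature dimensions, sample sizes, and choice of KMeans below are illustrative assumptions only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))            # stand-in for structured clinical/demographic features
y = np.full(200, -1)                     # -1 marks unlabeled records
y[:20] = rng.integers(0, 2, size=20)     # small labeled portion, e.g. from questionnaires

# Unsupervised step: cluster all records.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Semi-supervised step: give each cluster the majority label of its labeled members.
cluster_to_label = {}
for c in np.unique(clusters):
    labeled = y[(clusters == c) & (y != -1)]
    cluster_to_label[c] = int(labeled.mean() >= 0.5) if labeled.size else 0

y_pred = np.array([cluster_to_label[c] for c in clusters])
print(np.bincount(y_pred))               # predicted class counts
```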
The most commonly suggested future research direction is the improvement of data quality. Many articles recommend incorporating multimedia data (images, audio, and videos) with existing text and numerical data to create more comprehensive diagnostic models. Several studies highlight the potential of combining social media data with clinical and mobile health data to enhance diagnostic accuracy. The integration of longitudinal data—data collected over extended periods—was also proposed to better capture the progression of depression symptoms.
Furthermore, several studies acknowledged the limitations of available data, particularly from social networks. These datasets often lack the accuracy and structure of clinical data, and concerns over privacy and ethical data usage remain a significant challenge. Researchers recommend improving access to clinical datasets and creating public datasets that contain real-world medical data.
1. RQ1 (Dataset Types): The most common dataset types were social networks (unstructured text) and clinical data (structured EHRs). Mobile devices and interviews were less frequently used (13% of the datasets).
2. RQ2 (Validation and Class Distribution): Questionnaires were the most used validation method (42%) followed by experts’ validation (22%), while most datasets showed imbalanced depressive class distributions, requiring balancing techniques.
3. RQ3 (Learning Types): Most studies applied supervised learning models, with limited exploration into unsupervised and semi-supervised methods.
4. RQ4 (Model Performance): The most frequently reported performance metric was accuracy, with an average value of 0.82.
5. RQ5 and RQ6 (Challenges and Future Directions): The most significant challenge is the lack of high-quality, publicly available datasets. Future research should focus on improving data quality, integrating diverse data sources, and addressing ethical concerns around privacy.
The analysis of the reviewed studies highlights a continuous increase in research focused on applying AI techniques for the diagnosis of Major Depressive Disorder (MDD), with a minor decline in 2022, followed by a resurgence in 2023. Given that this review encompasses studies up to mid-2024, it is expected that this growth trend will persist in the coming years, reflecting the increasing relevance of AI in mental health research.
Geographically, research contributions are primarily concentrated in the United States, India, and China, with a total of 35 countries across four continents contributing to the field. This broad international distribution reduces potential bias stemming from geographical constraints, ensuring a more comprehensive understanding of the global progress in this domain. However, it is worth noting that some regions, particularly lower-income countries, remain underrepresented in this research area.
In terms of data sources, over half of the studies rely on datasets extracted from social networks, with clinical data being the second most utilized. While clinical datasets are generally more robust and reliable, social network data is more accessible and easier to collect. This disparity underscores a key trade-off in the field: the use of clinical data enhances the validity of the models, but the acquisition process is often more resource-intensive. Validation methods also align with the type of dataset used, with clinical datasets predominantly validated through questionnaires, while social network datasets employ a variety of validation techniques, including expert validation, keyword-based classification, and sentiment analysis.
Algorithmic approaches in these studies are overwhelmingly dominated by supervised learning techniques, with a particular emphasis on the combination of Natural Language Processing (NLP) and Artificial Neural Networks (ANNs), especially when dealing with unstructured data. Most studies implement multiple algorithms, either through comparative analysis or as part of a pipeline, to achieve optimal performance.
The studies also highlight several limitations, many of which suggest avenues for future research. A primary limitation, cited by the majority of articles, is the availability and quality of datasets. Many studies operate with relatively small datasets, rarely exceeding several thousand records. When larger datasets are available, particularly from social networks, concerns about their reliability arise because labels are often derived from keyword tagging or sentiment analysis, which are less accurate than clinical methods such as expert evaluations or psychological questionnaires.
Other dataset-related issues include class imbalances, insufficient representation of diverse population groups, and the lack of longitudinal data. These challenges highlight the need for more comprehensive datasets that capture a wider array of features, including demographic diversity, longitudinal records, and data from multiple languages and geographic regions, as well as the integration of multimedia data to enhance diagnostic accuracy.
In recent years, studies have begun to explore the use of large language models,31,32 such as OpenAI’s ChatGPT2 for the diagnosis of MDD. While this represents an exciting advancement, it raises significant concerns regarding the explainability of AI-based solutions, particularly given that such models are often regarded as “black box” systems. This concern is compounded when using proprietary models, where the underlying engineering and decision-making processes are not publicly accessible or transparent.
Overall, while the majority of studies report promising results, there remains considerable room for improvement, particularly in the areas of data acquisition and quality. The lack of comprehensive and diverse datasets is a major limitation, and addressing this issue is critical to improving the generalizability and reliability of AI-driven depression diagnoses. Although algorithmic advances continue to enhance performance, the success of these models will ultimately depend on the availability of high-quality, unbiased, and diverse datasets, which should remain a primary focus for future research in this field. Additionally, the challenge of accurately labeling depressive states, particularly in social media datasets, must be addressed to ensure that AI models can provide valid and clinically meaningful outcomes.
No data are associated with this article.
This systematic review follows the latest PRISMA guidelines.
Uruguay's open research data repository (Redata): Dataset: Depression diagnosis using Artificial Intelligence: A systematic review, 10.60895/redata/DS0L5O.33
This project contains the following underlying data:
1. Depression diagnosis using Artificial Intelligence_ A systematic review - Study list.docx
2. Depression diagnosis using Artificial Intelligence_ A systematic review.xlsx
3. Depression diagnosis using Artificial Intelligence_ A systematic review.csv
4. Figure 1.png
5. PRISMA abstract checklist for _Depression diagnosis using Artificial Intelligence with alphanumeric data_ A systematic review_.pdf
6. PRISMA checklist for _Depression diagnosis using Artificial Intelligence with alphanumeric data_ A systematic review_.pdf