Keywords
Machine Learning, Anxiety, Depression, Pregnancy, COVID-19, Random Forest
This article is included in the Artificial Intelligence and Machine Learning gateway.
The emergence of the Coronavirus disease (COVID-19) in late 2019 and early 2020 has severely impacted the global population. COVID-19 is an infectious disease that spreads primarily through droplets of saliva or nasal discharge1; its infection rate is significant and its consequences can be lethal2–4. The disease is particularly dangerous to vulnerable populations, such as the elderly and those with underlying medical conditions including cardiovascular disease, diabetes, respiratory disease, and cancer3–5. Nonetheless, the specific implications of COVID-19 infection on pregnancy and childbirth have remained unidentified throughout the pandemic6–9.
The uncertainty about the nature, transmission, and mortality of the virus, together with its rapid spread and the consequential social and mobility restrictions (quarantines, lockdowns, and social distancing), has impacted the mental health of pregnant women worldwide5,10. In fact, the psychological effects of COVID-19 on pregnant women may lead to the appearance or increase of stress, anxiety, and depression symptoms, as indicated in the study by Broche-Perez et al.11. In a 2020 study by Tokgoz et al.12, the authors demonstrated that pregnant women during the COVID-19 pandemic presented higher rates of depression, stress, and anxiety than pregnant women before the pandemic. The study further evidenced that mental health disorders during pregnancy can result in pre-term labour, low birth weight, delayed neuropsychiatric development in children, preeclampsia, and unscheduled caesarean delivery11,13.
However, mental health disorders among pregnant women are widely undiagnosed and could result in worse consequences for mother and child14,15. Furthermore, the traditionally applied screening programs for psychological conditions rely on self-reporting and are for the most part designed to detect the population with pre-existing symptoms6,16. Contrary to the available traditional assessment of psychological disorders, artificial intelligence (AI) models can predict potential incidence of depression and anxiety among pregnant women, which would facilitate pre-emptive action, treatment, and early diagnosis16,17. As a matter of fact, one subarea of AI, machine learning (ML), has previously been used in the field of mental health for the prediction of psychological conditions such as anxiety, depression, obsessive-compulsive disorder (OCD), and post-traumatic stress disorder (PTSD), both prior to and during the onset of the COVID-19 pandemic18,19.
In a 2020 study by Seah et al.20, five ML algorithms were applied for the prediction of anxiety, depression, and stress in individuals around the world using the Depression, Anxiety and Stress Scale questionnaire (DASS 21). The Random Forest classifier, a machine learning algorithm used for data classification, had the best performance accuracy in predicting psychological conditions. A study by Priya et al.21 utilized ML tools for the creation of a new diagnostic methodology for anxiety and depression to replace traditional diagnosis through self-reported symptoms. The tool represented an improvement in diagnosis and treatment. Similarly, a study by Richter et al.22 applied eight ML algorithms, including a hybrid model, for the prediction of psychological problems such as anxiety, depression, and stress. The study found that the hybrid model presented higher accuracy rates than the single algorithms used. In relation to maternal health, only a few studies have been found to use ML models for the prediction of psychological disorders16,19,23,24. Among the available studies in this field, a study by Shin et al.24 developed a predictive model for postpartum depression using nine different ML approaches; the results showed that the Random Forest model achieved the highest accuracy rates. In addition, the study of Hochman et al.16 provided evidence that ML models are able to accurately screen and identify populations at high risk of postpartum depression for preventive intervention.
Thus, the use of ML techniques in mental health prediction and diagnosis might yield positive results in the reduction of self-harm and the provision of timely treatment for at-risk patients. However, very few studies have used ML in maternal mental health, especially in relation to COVID-1911,16,19,24. This study aims to enrich the literature by assessing the performance of ML techniques in studying the effect of the COVID-19 lockdown on maternal mental health in low- and middle-income countries. The study used ML for predicting depression and anxiety symptoms from different features during the COVID-19 lockdown. To date, this is the first international study using population-based datasets from five countries (Palestine, Lebanon, Jordan, Saudi Arabia, and Bahrain) that accounts for multiple maternal and mental health variables.
The study obtained written approval of the Ethics Committee in Scientific Research of University of Jordan, Jordan (19/2020/585), as well as universities from all participating countries. Written informed consent was obtained from all participants.
Data set. This study is the first of its kind as it utilized a regional dataset for evaluating the performance of ML algorithms in predicting depression and anxiety among pregnant and postpartum women during the COVID-19 lockdown in five Arab countries. The stratification of participants into different sets of data according to country of residence and the overall prediction for the total participants provide important and interesting information about the effect of the COVID-19 lockdown on pregnant women. A total of 3,569 women (1,939 pregnant and 1,630 postpartum) from five countries (Jordan, Palestine, Lebanon, Saudi Arabia, and Bahrain) participated in the study. Data were collected during the period of lockdown from July to December 2020.
The data set was extracted from a regional study conducted by the authors for assessing the impact of the COVID-19 pandemic on pregnant and postpartum women's physical and mental health. The study collected data from five Arab countries: Lebanon, Palestine, Jordan, Bahrain, and Saudi Arabia. A total of 3,569 women (currently pregnant or pregnant during the COVID-19 pandemic lockdown) were selected in this study. The data set, including socio-demographic variables and risk factors related to depression, anxiety, and physical and mental health among pregnant women, is shown in Table 1.
A cross-sectional study design was used for collecting the study data during the COVID-19 pandemic from August to November 2020 in the five listed countries of the Arab region (Lebanon, Palestine, Jordan, Bahrain, and Saudi Arabia). The snowball sampling method was used to recruit pregnant women. The initial participants were contacted through the research team’s professional network and the obstetric and maternity clinics in the participating countries. Data were collected through a web-based questionnaire, which was previously validated in two published studies25,26, and the software was designed by the Palestinian National Nutrition Platform. The questionnaire was disseminated by researchers through their social media networks (Facebook, WhatsApp, and Instagram) and the participating universities' networks. Furthermore, hard copies of the questionnaire were distributed through obstetric and maternity clinics to women in some areas with limited internet access. The survey considered several sociodemographic features of pregnant women, medical history, nutrition patterns, physical activity, smoking, education, residency, economic situation, anxiety indicators, and depression indicators. Questions regarding the pre-pregnancy period were not included in the survey. For the full questionnaire, see Extended data27.
The following criteria guided the data collection process: (i) pregnancy during the COVID-19 pandemic period; (ii) normal pregnancy (i.e., no complications); (iii) aged over 18; (iv) place of residence (the five study countries); (v) having answered all questions in the questionnaire. Moreover, the exclusion criteria included conception during the intra-COVID-19 pandemic period, as well as risk factors such as miscarriage and chronic health complications.
Outcome variables. The outcome variables included pregnant women's depression and anxiety levels. Participants were assessed for depression and anxiety using the Patient Health Questionnaire (PHQ-9) and Generalized Anxiety Disorder (GAD-7) scales.
Depression: The depression data were collected using the Patient Health Questionnaire (PHQ), a self-reported scale28 designed to screen for symptoms of depression. Each PHQ item has four answer categories (never = 0; several days = 1; more than half of the days = 2; nearly every day = 3). The total score was calculated by summing the item responses. The PHQ total score was classified into the following groups: Low = 0; Moderate = 1; High = 2.
Anxiety: The GAD-7 scale28 was used for measuring generalized anxiety disorder. Each item was scored using the same four response categories (never = 0; several days = 1; more than half of the days = 2; nearly every day = 3). The total score was calculated by summing the item responses. The GAD-7 total score was classified into the following groups: Low = 0; Moderate = 1; High = 2.
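The scoring procedure described for both scales can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the names `total_score` and `classify` are hypothetical, and the cut-off values passed in the example are placeholders, because the thresholds separating the Low, Moderate, and High groups are not stated in the text.

```python
from typing import Sequence

# Response categories shared by the PHQ and GAD-7 items (taken from the text).
RESPONSE_SCORES = {
    "never": 0,
    "several days": 1,
    "more than half of the days": 2,
    "nearly every day": 3,
}


def total_score(responses: Sequence[str]) -> int:
    """Sum the item scores of one completed questionnaire."""
    return sum(RESPONSE_SCORES[r.strip().lower()] for r in responses)


def classify(score: int, moderate_from: int, high_from: int) -> int:
    """Map a total score to Low = 0, Moderate = 1, High = 2.

    Both cut-offs are supplied by the caller; they are assumptions,
    not values reported in the article.
    """
    if score >= high_from:
        return 2
    if score >= moderate_from:
        return 1
    return 0


# Hypothetical answers to a seven-item scale.
answers = [
    "Never", "Several days", "Nearly every day", "More than half of the days",
    "Never", "Never", "Several days",
]
score = total_score(answers)
level = classify(score, moderate_from=10, high_from=15)  # placeholder cut-offs
```

Keeping the cut-offs as parameters makes it explicit that the three-level grouping depends on thresholds that would have to be taken from the scale's scoring guidance.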
Features. All potential features (predictors), including associated risk factors and socio-demographic variables were considered in the ML models for the assessment of pregnant women before and during the COVID-19 lockdown. The socio-demographic features included women’s age, age at marriage, country of residence, education level, work status, family income, and locality.
Associated risk factors included pre- and post-COVID-19 pandemic food consumption patterns, smoking status, body mass index (BMI), physical activity, healthy food consumption (fruits, vegetables, meat, grains, and dairy products), unhealthy food consumption (sweets, soft drinks, energy drinks, and fast food), physical activity level, technology-related activities, COVID-19 diagnosis, relatives diagnosed with COVID-19, underlying health conditions (diabetes, gestational diabetes, hypertension, gestational hypertension, heart and arterial diseases, liver diseases, high cholesterol, high triglycerides, thyroid disorders, or respiratory problems), cancellation of follow up appointment due to COVID-19, number of pregnancies, number of abortions, family problems, social problems, psychological stress, and work-related stress.
Data analysis. General descriptive analysis and ANOVA tests were used for describing the distribution of women based on risk factors, while prediction and classification were performed using ML techniques. The classification accuracy, confusion matrix, precision, sensitivity, and specificity were used for evaluating the ML prediction performance. The ML algorithms were applied in the Python AI development platform to predict the incidence and severity level of depression and anxiety during COVID-19 among pregnant women. The data set was split in a 70:20:10 ratio for training, testing, and validation.
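A 70:20:10 partition of this kind can be sketched with the Python standard library alone. The function name `split_70_20_10` and the fixed seed are hypothetical, and the sketch assumes a simple random shuffle; the text does not say whether the split was random, stratified by outcome level, or stratified by country.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_20_10(rows: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the rows and cut them into training, testing, and validation parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    n_train = int(0.7 * len(rows))
    n_test = int(0.2 * len(rows))

    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:n_train + n_test]]
    validation = [rows[i] for i in indices[n_train + n_test:]]  # remaining rows
    return train, test, validation


# Stand-in for the 3,569 participant records.
records = list(range(3569))
train, test, validation = split_70_20_10(records)
```

Assigning the remainder to the validation part guarantees that every record lands in exactly one subset even when the sample size is not divisible by ten.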
To evaluate whether the ML algorithms can predict pregnant women's depression and anxiety, the outcome variables and features were included in the ML models. The performance of ML models was first evaluated for depression and then for anxiety separately. The performance metrics for Gradient Boosting Machines (GB), Distributed Random Forests (RF), Extreme Randomized Forests (XRT), Naïve Bayes (NB), Support Vector Machine (SVM), Multilayers Neural Network (MNN), and Decision Tree (DT) are presented in the results. The accuracy, precision, Area Under the Curve (AUC), Matthew's Correlation Coefficient (MCC) and Receiver Operating Characteristic Curve (ROC) were used for measuring the performance accuracy.
Table 2 and Table 3 show the descriptive analysis of the participants’ data by anxiety and depression levels. Results indicated that the women’s mean age was 28.5 (±5.3) years. Among participants, 11.6% and 8.7% had moderate and high levels of depression, respectively while 22.4% and 7.7% had moderate and high levels of anxiety, respectively.
The rates of anxiety and depression were found to differ by the country of residence, education level, family income, work stress, social problems, health problems, family problems, financial problems, psychological problems, sleeping hours per day, fear of COVID-19 infection, and unhealthy food consumption. A greater percentage of women with high levels of depression and anxiety symptoms were found in Palestine (19.9%, 22.5%) and Jordan (18.6%, 21.0%), respectively. Furthermore, the results in Table 2 indicated that the highest percentages of depression were among women with self-reported family problems (35.3%), sleep deprivation (<6 hours per night) (31.6%), psychological problems (30.5%), financial problems (29.7%), COVID-19 diagnosis (27.4%), social (25.7%), and work stress (23.1%). Furthermore, high levels of anxiety symptoms were found among women with self-reported family problems (33.2%), financial (23.6%), social (21.6%), and psychological problems (22.5%) as shown in Table 3.
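Different performance measures were considered in our study to evaluate whether the ML models can predict women’s depression and anxiety symptoms during the COVID-19 lockdown. Seven ML classification algorithms were tested on our dataset, including SVM, K-nearest neighbour (KNN), NB, Random Forest (RF), Neural Network (NN), DT, and GB. The performance was evaluated using several assessment measures such as accuracy, precision, Area Under the Curve (AUC), Matthew's Correlation Coefficient (MCC) and Receiver Operating Characteristic Curve (ROC). The performance measures were calculated using the following equations, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively:
1. Specificity: $\text{Specificity} = \dfrac{TN}{TN + FP}$
2. Precision: $\text{Precision} = \dfrac{TP}{TP + FP}$
3. Recall: $\text{Recall} = \dfrac{TP}{TP + FN}$
4. F-measure: $F = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
5. Matthew’s Correlation Coefficient: $\text{MCC} = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
6. Accuracy: $\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$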
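The six measures listed above can be computed from the four counts of a binary confusion matrix, as in the sketch below. It uses the standard definitions of these measures; the function name `binary_metrics` is hypothetical, and treating each severity level as a one-class-versus-rest problem is an assumption about how the three-level outcomes would be reduced to such counts.

```python
import math
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the six measures from true/false positive/negative counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # also called sensitivity
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


# Illustrative counts only, not results from the study.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

With these illustrative counts, accuracy and MCC differ noticeably even though the classes are balanced, which is one reason to report both.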
Figure 1 represents the comparison of accuracy rates among the selected machine learning algorithms in predicting women's depression and anxiety symptoms. All tested models reported a high level of accuracy (ranging from 80.0–83.3%) for predicting depression among pregnant women except for NB. On the other hand, various levels of accuracy were reported for the ML models when predicting anxiety. The GB model presented the highest accuracy rate (82.9%) followed by RF and NB (81.3%). Nonetheless, all the ML models reported an acceptable rate of accuracy for both depression and anxiety symptoms.
Additional performance measures were used for evaluating the ML prediction performance of depression and anxiety symptoms, including AUC, sensitivity, specificity, F-Measure, and MCC. Figure 2 illustrates the different performance measures of depression prediction models. Balanced accuracy, sensitivity, and F measures were observed across the ML models. The AUC varied across models; DT reported the lowest AUC rate (68.8%), while other models ranged from 82.6% to 91.9%. The MCC showed high variability and was relatively low across the ML models; the NB model reported the highest MCC value (63%). Overall, GB reported the highest AUC, ACC, sensitivity, and F1 measures among all ML models.
Figure 3 shows the different performance measures of ML models for the anxiety prediction. Performance analysis for the anxiety prediction reported quite similar AUC, ACC, sensitivity, and F1 measures for the RF, NN, KNN, and GB models. SVM and DT models had the lowest accuracy measures. Sensitivity and accuracy were highest for the GB and NB models. The MCC varied among the ML models, with the highest values found for the NB and GB models (74.3% and 72.8%, respectively). The SVM had the lowest MCC (52.4%). Overall, GB achieved the best accuracy, sensitivity, and F1 measures, at 82.9%, and a balanced MCC of 72.8%.
The GB and RF receiver operating characteristic (ROC) curves for the moderate and high depression and anxiety classes are presented in Figure 4 (A and B) and Figure 5 (A and B), respectively. Three categories of depression and anxiety were used: low, moderate, and high. The ROC curve lies close to the upper left corner; thus, the gradient boosting algorithm showed a better prediction of positive values than the other studied algorithms (AUC of 91.9% and 93.5% for depression and anxiety, respectively).

Gradient Boosting and Random Forest ROC sensitivity and specificity analysis: (A) Moderate depression symptoms analysis; (B) High depression symptoms analysis.
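Because the outcome has three levels, a per-class ROC analysis such as the one for the moderate and high classes implies comparing one class against the other two. The sketch below computes the area under such a one-versus-rest ROC curve through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case). The function name `auc_one_vs_rest` and the example scores are hypothetical, and the one-versus-rest reduction is an assumption about how the per-class curves were obtained.

```python
from typing import Sequence


def auc_one_vs_rest(labels: Sequence[int], scores: Sequence[float], positive: int) -> float:
    """AUC for one class against all others, from predicted scores for that class."""
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("both the positive class and the rest must be present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels (0 = low, 1 = moderate, 2 = high) and model scores for class 2.
labels = [0, 0, 1, 2, 2, 1]
scores_high = [0.1, 0.3, 0.2, 0.9, 0.4, 0.6]
auc_high = auc_one_vs_rest(labels, scores_high, positive=2)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for illustration but would normally be replaced by a sort-based computation on a full dataset.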
The 23 variables used for predicting depression and anxiety symptoms in the ML models were ranked by importance on a scale from 0 to 100%. Variables with an importance level greater than 60% were considered. The models assigned different levels of variable importance for depression and anxiety. The distribution of the most important variables for depression and anxiety can be found in Figure 6 and Figure 7, respectively. The most significant variables in predicting depression symptoms were stress during lockdown, psychological factors, family problems, and country of residence, whereas the most significant variables in predicting anxiety were stress during lockdown, financial problems, family problems, social problems, and COVID-19 diagnosis.
In this study, we used machine learning techniques for the prediction of depression and anxiety among pregnant and postpartum women from five Middle Eastern countries (Lebanon, Palestine, Jordan, Bahrain, and Saudi Arabia) during the COVID-19 lockdown. We found that 20.3% of women had moderate to severe maternal depression while 30.1% of them had moderate to severe anxiety, the highest rates being among Palestinian and Jordanian women. The findings of this study are consistent with other studies that indicated high levels of anxiety and depression symptoms among pregnant women during COVID-19 lockdown3. Women reported significant concern and were found to be at high risk of developing post-traumatic stress disorder, which calls for direct intervention from health care providers to support the mental health of pregnant and postpartum women during the COVID-19 pandemic.
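The ranking and filtering of variables by importance can be sketched as below. This is an illustration under stated assumptions: the text does not specify how the 0 to 100% scale was derived, so the sketch assumes that raw importance scores are rescaled relative to the most important variable. The function name `rank_features`, the variable names, and the raw scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_features(importances: Dict[str, float], threshold: float = 60.0) -> List[Tuple[str, float]]:
    """Rescale raw importances to 0-100% and keep those above the threshold."""
    top = max(importances.values())
    # Assumption: 100% corresponds to the most important variable.
    scaled = {name: 100.0 * value / top for name, value in importances.items()}
    ranked = sorted(scaled.items(), key=lambda item: item[1], reverse=True)
    return [(name, value) for name, value in ranked if value > threshold]


# Illustrative raw scores only, not values from the study.
raw = {"stress_during_lockdown": 8, "family_problems": 6, "smoking": 2}
selected = rank_features(raw)
```

Under a different scaling convention, for example importances expressed as shares that sum to 100%, the same 60% cut-off would select a different set of variables, so the convention needs to be fixed before comparing rankings.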
The performance of different ML models in predicting maternal depression and anxiety was evaluated through measuring accuracy, specificity, precision, recall, F-measure, and Matthew's Correlation Coefficient (MCC). The accuracy performance of the studied models was similar and did not indicate a significant difference across models. GB and RF reported the best accuracy, sensitivity, and F1 measures for depression prediction. The MCC has been measured for the selected models as an alternative performance measure which is not affected by an unbalanced dataset. The MCC measure showed acceptable scores for both depression and anxiety symptoms. The NB had the highest MCC values followed by GB and RF. Thus, the results in this study are consistent with other studies that assessed the performance of ML classifiers in predicting depression among pregnant and postpartum women, where the RF model showed the highest accuracy and AUC values16,24.
The ML prediction models of postpartum depression developed by Shin et al. (2020) achieved an AUC of 0.79. On the other hand, another study29 utilized a multilayer perceptron approach using several risk factors for depression prediction among Spanish pregnant and postpartum women. The model achieved an AUC value of 0.82, a sensitivity of 0.84, and a specificity of 0.81. Furthermore, in one study, a Logistic Regression (LR) classifier was used for depression prediction and achieved an accuracy of 83.3%30, while employing multiple ML algorithms including KNN, LR, Linear Discriminant Analysis (LDA), and B improved the overall accuracy to 90%28,29. Additionally, our study was consistent with the study by Shin et al.24, in which ML classifiers were used to predict postpartum depression and which found that depression before pregnancy, stress during pregnancy, and smoking were the most significant risk factors for depression. Contrary to our findings, another study31 reported that women’s age, marital status, and education were the most significant factors relating to postpartum depression.
Furthermore, our study reported a higher AUC performance measure than other similar ML prediction studies, whose AUC measures were 80%32, 79%5, and 78%33. The results in our study showed an accuracy of 83.3%, which is comparable to the 84% accuracy rate reported in other studies26,30. Nevertheless, the study sample used in this research was collected from diverse population groups across countries, thus diverging background and environmental factors were expected to affect the homogeneity of the dataset.
Significant risk factors for pregnant and postpartum depression and anxiety were found, including country of residence, family income, smoking, COVID-19 diagnosis, number of hours of sleep, stress during the COVID-19 lockdown, family support, social support, financial situation, psychological problems, and work stress. Additionally, risk factors particularly significant for anxiety included education level, locality, and work status. We found the rates of anxiety symptoms to be higher than those of depression among pregnant and postpartum women during the pandemic lockdown. The results showed that Jordanian, Palestinian and Lebanese women had higher anxiety and depression than Saudi and Bahraini women. The increased risk for depression and anxiety among women could be explained by low family income, financial problems, and poor healthcare systems available in these countries34. The study also reported significant differences in anxiety because of locality and education levels. Women with lower education levels reported higher anxiety; similarly, women living in urban areas presented higher anxiety levels. These findings could be explained by the stricter lockdown in cities and the lack of knowledge about the disease among women with lower education levels.
The machine learning models returned the five highest-ranking features affecting women’s depression symptoms: stress during pregnancy, psychological problems, family support, country of residence, and number of hours of sleep, in descending order. The highest-ranking features for anxiety were stress during pregnancy, financial problems, family problems, social problems, and COVID-19 diagnosis. Our findings are consistent with similar studies indicating that stress during pregnancy negatively affects women's mental health and might influence the incidence of postpartum depression9,14,32. Furthermore, the results were consistent with other studies indicating that family income and social and psychological problems had a significant impact on maternal mental health3,6,8.
The study provides an interesting finding: the performance measures are relatively high and remain stable across the selected ML models, especially for AUC, accuracy, and sensitivity, even with a reduced number of variables. This finding is consistent with other studies17,31,34–36 that indicated a high correlation between anxiety and depression symptoms and other socio-demographic risk factors. Thus, stress, family support, financial situation, psychological problems, and country of residence were among the most important variables associated with depression and anxiety during the pandemic lockdown. This is important to consider when developing intervention strategies and programs. The stability in performance measures suggests that self-reported survey methods can be used as a good assessment tool for anxiety and depression. Moreover, pregnant women had more anxiety symptoms than depression symptoms during lockdown, which might affect maternal and child health.
Our findings suggest that deploying machine learning techniques for the screening of pregnant and postpartum women will help in identifying those at highest risk of anxiety and depression through clustering and classification, which will in turn aid in the development of effective preventive interventions. Thus, this research not only addresses the integration of innovative technology for the prediction and diagnosis of depression and anxiety among pregnant and postpartum women in low- and middle-income countries, but given the international dataset used, it assesses the prediction power of several ML algorithms across diverging population groups with distinct risk factors. Additionally, the study included variables specific to the COVID-19 lockdown period, which differentiates it from similar studies.
Nevertheless, this study has some limitations, including the size of the study sample. A smaller dataset limits the ability to train a robust range of algorithms, as well as the number of clusters and classifications produced by the ML predictive models. In addition, the study used an online self-reported assessment, which was not fully completed by all study participants. Nonetheless, the incomplete and missing data were excluded from our dataset. Finally, a more comprehensive study with a larger and more representative dataset including clinical data is recommended for future research among low- and middle-income countries.
The study assessed the performance measures of machine learning algorithms in predicting depression and anxiety among pregnant and postpartum women in low- and middle-income countries during the COVID-19 pandemic lockdown. Based on the results presented, this research concludes that ML algorithms, particularly (yet not exclusively) Gradient Boosting and Random Forest, are effective predictive models for maternal mental health. These models could be integrated into clinical medical information systems for the automatic prediction of pregnant women’s depression and anxiety based on the identified key variables. The deployment of ML models will provide effective clinical applications for the development of prevention and intervention programs. Likewise, by making use of accurate machine learning techniques such as Random Forest, public health professionals, healthcare providers, and decision-makers will be able to predict rising issues and implement relevant intervention programs to enhance maternal and child health in their respective countries.
Harvard Dataverse: Pregnancy and Mental Health Data during COVID-19
https://doi.org/10.7910/DVN/FCDGEB27
This project contains the following underlying data:
Dataverse: Pregnancy and Mental Health Data during COVID-19
https://doi.org/10.7910/DVN/FCDGEB27
This project contains the following extended data:
Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
The authors would like to thank the study participants for their time and effort in responding to our study. Furthermore, the authors would like to thank the following for assisting in data collection: Elissa Naim, Manal Fardon (Lebanon); Narmeen Al-Awwad (The Hashemite University, Jordan); Asma Bash (The University of Jordan); Nahla Al-Bayyari (Al-Balqa Applied University, Jordan); Shreen Sulten, Nada Omar Abdul jawad, and Mahmoud Sami (King Hamad University Hospital, Kingdom of Bahrain); Rana Ghabbash, Asma Imam (Al-Quds University, Palestine); Firas Abdel Jawad (Makassed Hospital); Nabil Thawabteh (Makassed Hospital); Areej Alamery (Ministry of Health, Saudi Arabia).
Author Disclaimer: The views expressed in this article do not necessarily represent the views, decisions or policies of WHO, Saudi FDA or the other institutions with which the authors are affiliated.
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Yes
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: My research area is in the fields of Artificial Intelligence (AI). Within AI, I am interested in problems related to health or education modeling, machine learning, and data mining, and their interdisciplinary applications to real-life problems. Furthermore, I’m interested in computer networks and cybersecurity, in which I worked on the development of models and methods of network management and protection.
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
Yes
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Yes
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Human reproduction, endocrinology, ART