Machine learning techniques for predicting depression and anxiety in pregnant and postpartum women during the COVID-19 pandemic: a cross-sectional regional study

Background: Maternal depression and anxiety are significant public health concerns that play an important role in the health and well-being of mothers and children. The COVID-19 pandemic, the consequential lockdowns and related safety restrictions worldwide negatively affected the mental health of pregnant and postpartum women. Methods: This regional study aimed to develop a machine learning (ML) model for the prediction of maternal depression and anxiety. The study used a dataset collected from five Arab countries during the COVID-19 pandemic between July and December 2020. The population sample included 3569 women (1939 pregnant and 1630 postpartum) from five countries (Jordan, Palestine, Lebanon, Saudi Arabia, and Bahrain). The performance of seven machine learning algorithms was assessed for the prediction of depression and anxiety symptoms. Results: The Gradient Boosting (GB) and Random Forest (RF) models outperformed the other studied ML algorithms, with accuracy values of 83.3% and 83.2% for depression, respectively, and 82.9% and 81.3% for anxiety, respectively. The Matthews correlation coefficient (MCC) was also evaluated for the ML models; the Naïve Bayes (NB) and GB models presented the highest values (0.63 and 0.59 for depression, and 0.74 and 0.73 for anxiety, respectively). The feature importance ranking showed that stress during pregnancy, family support, financial issues, income, and social support were the most significant features in predicting anxiety and depression. Conclusion: Overall, the study evidenced the power of ML models in predicting maternal depression and anxiety and showed them to be an efficient tool for identifying and predicting the associated risk factors that influence maternal mental health. 
The deployment of machine learning models for screening and early detection of depression and anxiety among pregnant and postpartum women might facilitate the development of health prevention and intervention programs that will enhance maternal and child health in low- and middle-income countries.


Introduction
The emergence of the coronavirus disease (COVID-19) in late 2019 and early 2020 has severely impacted the global population. COVID-19 is an infectious disease that spreads primarily through droplets of saliva or nasal discharge 1 ; its infection rate is significant and its consequences can be lethal 2-4 . The disease is particularly dangerous to vulnerable populations, such as the elderly and those with underlying medical conditions including cardiovascular disease, diabetes, respiratory disease, and cancer [3][4][5] . Nonetheless, the specific implications of COVID-19 infection on pregnancy and childbirth have remained unidentified throughout the pandemic [6][7][8][9] .
The uncertainty about the nature, transmission, and mortality of the virus, together with its rapid spread and the consequential social and mobility restrictions (quarantines, lockdowns, and social distancing), has impacted the mental health of pregnant women worldwide 5, 10 . In fact, the psychological effects of COVID-19 on pregnant women may lead to the appearance or increase of stress, anxiety, and depression symptoms, as indicated in the study by Broche-Perez et al. 11 . In a 2020 study, Tokgoz et al. 12 demonstrated that pregnant women during the COVID-19 pandemic presented higher rates of depression, stress, and anxiety than pregnant women before the pandemic. The study further evidenced that mental health disorders during pregnancy can result in pre-term labour, low birth weight, delayed neuropsychiatric development in children, preeclampsia, and unscheduled caesarean delivery 11,13 .
However, mental health disorders among pregnant women are widely undiagnosed and could result in worse consequences for mother and child 14,15 . Furthermore, the traditionally applied screening programs for psychological conditions rely on self-reporting and are for the most part designed to detect the population with pre-existing symptoms 6,16 . Contrary to the available traditional assessment of psychological disorders, artificial intelligence (AI) models can predict potential incidence of depression and anxiety among pregnant women, which would facilitate pre-emptive action, treatment, and early diagnosis 16,17 . As a matter of fact, one subarea of AI, machine learning (ML), has previously been used in the field of mental health for the prediction of psychological conditions such as anxiety, depression, obsessive-compulsive disorder (OCD), and post-traumatic stress disorder (PTSD), both prior to and during the onset of the COVID-19 pandemic 18,19 .
In a 2020 study by Seah et al. 20 , five ML algorithms were applied for the prediction of anxiety, depression, and stress in individuals around the world using the Depression, Anxiety and Stress Scale questionnaire (DASS 21). The Random Forest classifier, a machine learning algorithm used for data classification, had the best accuracy in predicting psychological conditions. A study by Priya et al. 21 utilized ML tools for the creation of a new diagnostic methodology for anxiety and depression to replace traditional diagnosis through self-reported symptoms. The tool represented an improvement in diagnosis and treatment. Similarly, a study by Richter et al. 22 applied eight ML algorithms, including a hybrid model, for the prediction of psychological problems such as anxiety, depression, and stress. The study found that the hybrid model presented higher accuracy rates than the single algorithms used. In relation to maternal health, only a few studies have been found to use ML models for the prediction of psychological disorders 16,19,23,24 . Among the available studies in this field, a study by Shin et al. 24 developed a predictive model for postpartum depression using nine different ML approaches; the results showed that the Random Forest model achieved the highest accuracy rates. In addition, the study of Hochman et al. 16 provided evidence that ML models are able to accurately screen and identify populations at high risk of postpartum depression for preventive intervention.
Thus, the use of ML techniques in mental health prediction and diagnosis might yield positive results in the reduction of self-harm and the provision of timely treatment for at-risk patients. However, very few studies have used ML in maternal mental health, especially in relation to COVID-19 11,16,19,24 . This study aims to enrich the literature by assessing the performance of ML techniques in studying the effect of the COVID-19 lockdown on maternal mental health in low- and middle-income countries. The study used ML for predicting depression and anxiety symptoms from different features during the COVID-19 lockdown. To date, this is the first international study using population-based datasets from five countries (Palestine, Lebanon, Jordan, Saudi Arabia, and Bahrain) that accounts for multiple maternal and mental health variables.

Ethics
The study obtained written approval of the Ethics Committee in Scientific Research of University of Jordan, Jordan (19/2020/585), as well as universities from all participating countries. Written informed consent was obtained from all participants.

Data set.
This study is the first of its kind as it utilized a regional dataset for evaluating the performance of ML algorithms in predicting depression and anxiety among pregnant and postpartum women during the COVID-19 lockdown in five Arab countries. The stratification of participants into different sets of data according to country of residence and the overall prediction for the total participants provide important and interesting information about the effect of the COVID-19 lockdown on pregnant women. A total of 3,569 women (1,939 pregnant and 1,630 postpartum) from five countries (Jordan, Palestine, Lebanon, Saudi Arabia, and Bahrain) participated in the study. Data were collected during the period of lockdown from July to December 2020.
The data set was extracted from a regional study conducted by the authors for assessing the impact of the COVID-19 pandemic on pregnant and postpartum women's physical and mental health. The study collected data from five Arab countries: Lebanon, Palestine, Jordan, Bahrain, and Saudi Arabia. A total of 3569 women (who were currently pregnant or had been pregnant during the COVID-19 pandemic lockdown) were selected in this study. The data set, including socio-demographic variables and risk factors related to depression, anxiety, and physical and mental health among pregnant women, is shown in Table 1.

Questionnaire
A cross-sectional study design was used for collecting the study data during the COVID-19 pandemic from August to November 2020 in the five listed countries in the Arab region (Lebanon, Palestine, Jordan, Bahrain, and Saudi Arabia). The snowball sampling method was used to recruit pregnant women. The initial participants were contacted through the research team's professional network and the obstetric and maternity clinics in the participating countries. Data were collected through a web-based questionnaire, which was previously validated in two published studies 25,26 , and the software was designed by the Palestinian National Nutrition Platform. The questionnaire was disseminated by researchers through their social media networks (Facebook, WhatsApp, and Instagram) and the participating universities' networks. Furthermore, hard copies of the questionnaire were distributed through obstetric and maternity clinics to women in some areas with limited internet access. The survey considered several sociodemographic features of pregnant women, medical history, nutrition patterns, physical activity, smoking, education, residency, economic situation, anxiety indicators, and depression indicators. Questions regarding the pre-pregnancy period were not included in the survey. For the full questionnaire, see Extended data 27 .

Criteria
The following criteria guided the data collection process: (i) pregnancy during the COVID-19 pandemic period; (ii) normal pregnancy (i.e., no complications); (iii) aged over 18; (iv) place of residence (the five study countries); and (v) having answered all questions in the questionnaire. Moreover, the exclusion criteria included conception during the intra-COVID-19 pandemic period, as well as risk factors such as miscarriage and chronic health complications.
Outcome variables. The outcome variables included pregnant women's depression and anxiety levels. Participants were assessed for depression and anxiety using the Patient Health Questionnaire (PHQ-9) and Generalized Anxiety Disorder (GAD-7) scales.
Depression: The depression data were collected using the Patient Health Questionnaire (PHQ-9), a self-reported scale 28 designed to screen for symptoms of depression. Each PHQ item has four answer categories (never = 0; several days = 1; more than half of the days = 2; nearly every day = 3). The total score was calculated by summing the item responses. The PHQ total score was classified into the following groups: Low = 0; Moderate = 1; High = 2.
Anxiety: The GAD-7 scale 28 was used for measuring generalized anxiety disorder. Each GAD-7 item uses the same four answer categories (never = 0; several days = 1; more than half of the days = 2; nearly every day = 3). The total score was calculated by summing the item responses. The GAD-7 total score was classified into the following groups: Low = 0; Moderate = 1; High = 2.
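The scoring procedure for both scales can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the cut-off values that separate the Low, Moderate, and High groups are not stated in the text, so the `ASSUMED_CUTOFFS` constant below is a hypothetical placeholder, as are the function names and the example answers.

```python
from typing import Sequence, Tuple

# Hypothetical cut-offs for illustration only; the paper does not state them.
# total < 10 -> Low, 10 <= total < 15 -> Moderate, total >= 15 -> High
ASSUMED_CUTOFFS: Tuple[int, int] = (10, 15)


def total_score(item_responses: Sequence[int]) -> int:
    """Sum item responses, each coded 0 (never) to 3 (nearly every day)."""
    if any(r not in (0, 1, 2, 3) for r in item_responses):
        raise ValueError("each item response must be 0, 1, 2, or 3")
    return sum(item_responses)


def severity_class(total: int, cutoffs: Tuple[int, int] = ASSUMED_CUTOFFS) -> int:
    """Map a total score to 0 (Low), 1 (Moderate), or 2 (High)."""
    moderate_from, high_from = cutoffs
    if total >= high_from:
        return 2
    if total >= moderate_from:
        return 1
    return 0


# Nine made-up PHQ-9 item answers for one respondent.
phq9_answers = [1, 2, 0, 3, 1, 2, 1, 0, 1]
score = total_score(phq9_answers)  # 11
print(score, severity_class(score))
```

The same two functions apply to the seven GAD-7 items; only the cut-offs passed to `severity_class` would need to change.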
Features. All potential features (predictors), including associated risk factors and socio-demographic variables were considered in the ML models for the assessment of pregnant women before and during the COVID-19 lockdown. The socio-demographic features included women's age, age at marriage, country of residence, education level, work status, family income, and locality.
Associated risk factors included pre- and post-COVID-19 pandemic food consumption patterns, smoking status, body mass index (BMI), healthy food consumption (fruits, vegetables, meat, grains, and dairy products), unhealthy food consumption (sweets, soft drinks, energy drinks, and fast food), physical activity level, technology-related activities, COVID-19 diagnosis, relatives diagnosed with COVID-19, underlying health conditions (diabetes, gestational diabetes, hypertension, gestational hypertension, heart and arterial diseases, liver diseases, high cholesterol, high triglycerides, thyroid disorders, or respiratory problems), cancellation of follow-up appointments due to COVID-19, number of pregnancies, number of abortions, family problems, social problems, psychological stress, and work-related stress.

Data analysis.
General descriptive analysis and ANOVA tests were used for describing the distribution of women based on risk factors, while prediction and classification were performed using ML techniques. The classification accuracy, confusion matrix, precision, sensitivity, and specificity were used for evaluating the ML prediction performance. The ML algorithms were implemented in Python to predict the incidence and severity level of depression and anxiety during COVID-19 among pregnant women. The data set was divided in a 70:20:10 ratio for training, testing, and validation.
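A 70:20:10 partition of this kind can be obtained with two successive splits. The sketch below assumes scikit-learn and a stratified random split. The text does not say which library or sampling scheme was used, so both are assumptions, and the synthetic data merely stands in for the survey responses.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the survey data: 1000 rows, 23 features, 3 classes.
X, y = make_classification(n_samples=1000, n_features=23, n_informative=8,
                           n_classes=3, random_state=0)

n = len(y)
n_test = round(0.20 * n)  # 20% of the rows for testing
n_val = round(0.10 * n)   # 10% of the rows for validation

# Step 1: hold out 30% of the rows; the remaining 70% are used for training.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=n_test + n_val, stratify=y, random_state=0)

# Step 2: divide the held-out rows into the testing and validation sets.
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=n_val, stratify=y_rest, random_state=0)

print(len(X_train), len(X_test), len(X_val))
```

Stratifying on the label keeps the proportion of low, moderate, and high cases roughly equal in the three partitions, which matters when one class dominates.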
To evaluate whether the ML algorithms can predict pregnant women's depression and anxiety, the outcome variables and features were included in the ML models. The performance of ML models was first evaluated for depression and then for anxiety separately. The performance metrics for Gradient Boosting Machines (

Results
Table 2 and Table 3 show the descriptive analysis of the participants' data by anxiety and depression levels. Results indicated that the women's mean age was 28.5 (±5.3) years. Among participants, 11.6% and 8.7% had moderate and high levels of depression, respectively, while 22.4% and 7.7% had moderate and high levels of anxiety, respectively.
The rates of anxiety and depression were found to differ by the country of residence, education level, family income, work stress, social problems, health problems, family problems, financial problems, psychological problems, sleeping hours per day, fear of COVID-19 infection, and unhealthy food consumption. A greater percentage of women with high levels of depression and anxiety symptoms were found in Palestine (19.9%, 22.5%) and Jordan (18.6%, 21.0%), respectively. Furthermore, the results in Table 2 indicated that the highest percentages of depression were among women with self-reported family problems (35.3%), sleep deprivation (<6 hours per night) (31.6%), psychological problems (30.5%), financial problems (29.7%), COVID-19 diagnosis (27.4%), social problems (25.7%), and work stress (23.1%). Furthermore, high levels of anxiety symptoms were found among women with self-reported family problems (33.2%), financial problems (23.6%), social problems (21.6%), and psychological problems (22.5%), as shown in Table 3.
Figure 1 represents the comparison of accuracy rates among the selected machine learning algorithms in predicting women's depression and anxiety symptoms. All tested models except NB reported a high level of accuracy (ranging from 80.0% to 83.3%) for predicting depression among pregnant women. On the other hand, various levels of accuracy were reported for the ML models when predicting anxiety. The GB model presented the highest accuracy rate (82.9%), followed by RF and NB (81.3%). Nonetheless, all the ML models reported an acceptable rate of accuracy for both depression and anxiety symptoms.

ML performance measures
Additional performance measures were used for evaluating the ML prediction performance of depression and anxiety symptoms, including AUC, sensitivity, specificity, F-measure, and MCC. Figure 2 illustrates the different performance measures of the depression prediction models. Accuracy, sensitivity, and F-measure were balanced across the ML models. The AUC varied across models; DT reported the lowest AUC (68.8%), while the other models ranged from 82.6% to 91.9%. The MCC varied considerably across models and was relatively low overall; the NB model reported the highest MCC value (63%). Overall, GB reported the highest AUC, accuracy, sensitivity, and F1 measures among the ML models. Figure 3 shows the different performance measures of the ML models for anxiety prediction. Performance analysis for the anxiety prediction reported quite similar AUC, accuracy, sensitivity, and F1 measures for the RF, NN, KNN, and GB models. The SVM and DT models had the lowest accuracy measures. Sensitivity and accuracy were highest for the GB and NB models. The MCC varied among the ML models; the highest values were found for the NB and GB models (74.3% and 72.8%, respectively), and the SVM had the lowest (52.4%). Overall, GB achieved the best accuracy, sensitivity, and F1-measure, at 82.9%, and a balanced MCC of 72.8%.
The GB and RF receiver operating characteristic (ROC) curves for the moderate and high depression and anxiety classes are presented in Figure 4 (A and B) and Figure 5 (A and B), respectively. Three numerical categories of depression and anxiety were used: low, moderate, and high. The ROC curves lie toward the upper left corner; thus, the gradient boosting algorithm showed a better prediction of positive values than the other studied algorithms (AUC of 91.9% and 93.5% for depression and anxiety, respectively).
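For readers unfamiliar with these measures, the sketch below shows how accuracy, sensitivity, specificity, F-measure, and MCC can be computed for a three-class problem with scikit-learn. The label vectors are made up for illustration, and the averaging choices (macro-averaged sensitivity, weighted F1, one-vs-rest specificity) are assumptions, since the text does not state how the multiclass scores were aggregated.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             matthews_corrcoef, recall_score)


def per_class_specificity(y_true, y_pred, labels):
    """One-vs-rest specificity TN / (TN + FP) for each class."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as the class but actually another
    fn = cm.sum(axis=1) - tp  # actually the class but predicted as another
    tn = cm.sum() - tp - fp - fn
    return tn / (tn + fp)


# Made-up labels: 0 = low, 1 = moderate, 2 = high.
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0, 2, 2, 1]

acc = accuracy_score(y_true, y_pred)
sens = recall_score(y_true, y_pred, average="macro")  # mean per-class recall
spec = per_class_specificity(y_true, y_pred, labels=[0, 1, 2])
f1 = f1_score(y_true, y_pred, average="weighted")
mcc = matthews_corrcoef(y_true, y_pred)

print(f"accuracy={acc:.3f} sensitivity={sens:.3f} F1={f1:.3f} MCC={mcc:.3f}")
print("specificity per class:", np.round(spec, 3))
```

MCC uses every cell of the confusion matrix, which is why it can rank models differently from accuracy when the classes are unbalanced.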
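Per-class ROC curves for a three-class outcome are usually obtained one-vs-rest: each class in turn is treated as the positive class. The following sketch assumes scikit-learn and synthetic data; the one-vs-rest scheme, the model settings, and the data are illustrative assumptions rather than the authors' setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic three-class data standing in for the low/moderate/high outcome.
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(X_tr, y_tr)
proba = model.predict_proba(X_te)  # one probability column per class

# One indicator column per class: 1 if the sample belongs to it, else 0.
y_bin = label_binarize(y_te, classes=[0, 1, 2])

class_auc = {}
for k, name in enumerate(["low", "moderate", "high"]):
    fpr, tpr, _ = roc_curve(y_bin[:, k], proba[:, k])
    class_auc[name] = auc(fpr, tpr)  # area under this class's ROC curve

print(class_auc)
```

Plotting `fpr` against `tpr` for each class gives curves of the kind described for Figures 4 and 5; the closer a curve lies to the upper left corner, the larger its AUC.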

Features' importance
The 23 variables used for predicting depression and anxiety symptoms in the ML models were classified and ranked from 0 to 100%. Only variables with an importance level greater than 60% were considered. The models yielded different levels of variable importance for depression and anxiety. The distribution of the most important variables for depression and anxiety can be found in Figure 6 and Figure 7, respectively. The most significant variables in predicting depression symptoms were stress during lockdown, psychological factors, family problems, and country of residence, while the most significant variables in predicting anxiety were stress during lockdown, financial problems, family problems, social problems, and COVID-19 diagnosis.
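One way to produce a 0 to 100% ranking with a 60% threshold is sketched below. The text does not specify how importance was computed, so this example assumes the impurity-based `feature_importances_` of a scikit-learn gradient boosting model, rescaled so that the top feature scores 100%. The feature names and data are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data with 23 features, matching the number of variables in the text.
X, y = make_classification(n_samples=500, n_features=23, n_informative=5,
                           n_classes=3, random_state=0)
names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

model = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; rescale so the top feature is 100%.
raw = model.feature_importances_
scaled = 100.0 * raw / raw.max()

# Rank in descending order and keep features above the 60% threshold.
ranking = sorted(zip(names, scaled), key=lambda pair: pair[1], reverse=True)
selected = [(name, score) for name, score in ranking if score > 60.0]

for name, score in selected:
    print(f"{name}: {score:.1f}%")
```

Permutation importance on held-out data is a common alternative when impurity-based scores are suspected of favouring features with many distinct values.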

Discussion
In this study, we used machine learning techniques for the prediction of depression and anxiety among pregnant and postpartum women from five Middle Eastern countries (Lebanon, Palestine, Jordan, Bahrain, and Saudi Arabia) during the COVID-19 lockdown. We found that 20.3% of women had moderate to severe maternal depression while 30.1% had moderate to severe anxiety, with the highest rates among Palestinian and Jordanian women. The findings of this study are consistent with other studies that indicated high levels of anxiety and depression symptoms among pregnant women during the COVID-19 lockdown 3 . Women reported significant concern and were found to be at high risk of developing post-traumatic stress disorder, which requires direct intervention from health care providers to care for the mental health of pregnant and postpartum women during the COVID-19 pandemic.
The performance of different ML models in predicting maternal depression and anxiety was evaluated by measuring accuracy, specificity, precision, recall, F-measure, and the Matthews correlation coefficient (MCC). The accuracy of the studied models was similar and did not indicate a significant difference across models. GB and RF reported the best accuracy, sensitivity, and F1 measures for depression prediction. The MCC was measured for the selected models as an alternative performance measure that is not affected by an unbalanced dataset. The MCC showed acceptable scores for both depression and anxiety symptoms. NB had the highest MCC values, followed by GB and RF. Thus, the results in this study are consistent with other studies that assessed the performance of ML classifiers in predicting depression among pregnant and postpartum women, where the RF model showed the highest accuracy and AUC values 16,24 .
The ML prediction models of postpartum depression developed by Shin et al. (2020) achieved an AUC of 0.79. On the other hand, the study in 29 utilized a multilayer perceptron approach using several risk factors for depression prediction among Spanish pregnant and postpartum women. The model achieved an AUC value of 0.82, a sensitivity of 0.84, and a specificity of 0.81. Furthermore, in another study, a Logistic Regression (LR) classifier was used for depression prediction and achieved an accuracy of 83.3% 30 , while employing multiple ML algorithms including KNN, LR, Linear Discriminant Analysis (LDA), and B improved the overall accuracy to 90% 28,29 . Additionally, our study was consistent with the study in 24 , in which ML classifiers were used to predict postpartum depression and which found that depression before pregnancy, stress during pregnancy, and smoking were the most significant risk factors for depression. Contrary to our findings, the study in 31 reported that women's age, marital status, and education were the most significant factors relating to postpartum depression.
Furthermore, our study reported a higher AUC performance measure than other similar ML prediction studies, whose AUC measures were 80% 32 , 79% 5 , and 78% 33 . The results in our study showed an accuracy of 83.3%, which is comparable to the 84% accuracy rate reported in other studies 26,30 . Nevertheless, the study sample used in this research was collected from diverse population groups across countries, thus diverging background and environmental factors were expected to affect the homogeneity of the dataset.
Significant risk factors for pregnant and postpartum depression and anxiety were found, including country of residence, family income, smoking, COVID-19 diagnosis, number of hours of sleep, stress during the COVID-19 lockdown, family support, social support, financial situation, psychological problems, and work stress. Additionally, risk factors particularly significant for anxiety included education level, locality, and work status. We found the rates of anxiety symptoms to be higher than those of depression among pregnant and postpartum women during the pandemic lockdown. The results showed that Jordanian, Palestinian and Lebanese women had higher anxiety and depression than Saudi and Bahraini women. The increased risk for depression and anxiety among women could be explained by low family income, financial problems, and poor healthcare systems available in these countries 34 . The study also reported significant differences in anxiety because of locality and education levels. Women with lower education levels reported higher anxiety; similarly, women living in urban areas presented higher anxiety levels. These findings could be explained by the stricter lockdown in cities and the lack of knowledge about the disease among women with lower education levels.
The machine learning models returned the five highest-ranking features affecting women's depression symptoms: stress during pregnancy, psychological problems, family support, country of residence, and number of hours of sleep, in descending order. The highest-ranking features for anxiety were stress during pregnancy, financial problems, family problems, social problems, and COVID-19 diagnosis. Our findings are consistent with similar studies indicating that stress during pregnancy negatively affects women's mental health and might influence the incidence of postpartum depression 9,14,32 . Furthermore, the results were consistent with other studies indicating that family income and social and psychological problems had a significant impact on maternal mental health 3,6,8 .
The study provides an interesting finding: the performance measures are relatively high and remain stable across the selected ML models, especially AUC, accuracy, and sensitivity, even with a reduced number of variables. This finding is consistent with other studies 17,31,34-36 that indicated a high correlation between anxiety and depression symptoms and other socio-demographic risk factors. Thus, stress, family support, financial situation, psychological problems, and country of residence were among the most important variables associated with depression and anxiety during the pandemic lockdown.
This is important to consider when developing intervention strategies and programs. The stability in performance measures reflects that the self-reported survey methods can be used as a good assessment tool for anxiety and depression. Moreover, pregnant women had more anxiety symptoms than depression during lockdown, which might affect maternal and child health.
Our findings suggest that deploying machine learning techniques for the screening of pregnant and postpartum women will help in identifying those at highest risk of anxiety and depression through clustering and classification, which will in turn aid in the development of effective preventive interventions. Thus, this research not only addresses the integration of innovative technology for the prediction and diagnosis of depression and anxiety among pregnant and postpartum women in low- and middle-income countries, but, given the international dataset used, it also assesses the prediction power of several ML algorithms across diverging population groups with distinct risk factors. Additionally, the study included variables specific to the COVID-19 lockdown period, which differentiates it from similar studies.
Nevertheless, this study has some limitations, including the size of the study sample. A smaller dataset limits the ability to train a robust range of algorithms and limits the number of clusters and classifications produced by the ML predictive models. In addition, the study used an online self-reported assessment, which was not fully completed by all study participants; incomplete and missing data were excluded from our dataset. Finally, a more comprehensive study with a larger and more representative dataset including clinical data is recommended for future research in low- and middle-income countries.

Conclusion
The study assessed the performance measures of machine learning algorithms in predicting depression and anxiety among pregnant and postpartum women in low- and middle-income countries during the COVID-19 pandemic lockdown. Based on the results presented, this research concludes that ML algorithms, particularly (yet not exclusively) Gradient Boosting and Random Forest, are effective predictive models for maternal mental health. These models could be integrated into clinical medical information systems for the automatic prediction of pregnant women's depression and anxiety based on the identified key variables. The deployment of ML models will provide effective clinical applications for the development of prevention and intervention programs. Likewise, by making use of accurate machine learning techniques such as Random Forest, public health professionals, healthcare providers, and decision-makers will be able to predict rising issues and implement relevant intervention programs to enhance maternal and child health in their respective countries.

Iyad Tumar
Birzeit University, Birzeit, Palestinian Territory
The study used machine learning techniques to predict the effect of COVID-19 on women's depression and anxiety. The machine learning models used an original data set collected from Arab countries during the COVID-19 pandemic lockdown. The study sample was composed of 3569 women (1939 pregnant and 1630 postpartum). The study indicated that the gradient boosting algorithm reported the highest performance compared to other algorithms.
The study addressed an important problem in developing countries, and the result of the study is very encouraging, mainly for the deployment of machine learning in the fields of mental health and public health. Furthermore, the study is well-written and organized. However, a few issues need more clarification: the study contains many figures and tables, and it would be much better to move some of them to the Appendix.
based, wrapper-based or embedded-based methods? It seems they relied on an embedded one, but they should be precise.
Why are feature extraction methods such as LDA, ICA and t-SNE not used?

2.
The article contains a lot of figures and tables per evaluation measure. I think these figures and tables can be put in the appendix, with the focus on one evaluation measure such as accuracy.

3.
You may provide a matrix that shows the correlation between each pair of features and between each feature and the different classes.

4.
What is the b CA mentioned under the figures 2 and 3? (Component analysis or classification accuracy).

5.
Authors should show a comparison of their work with other works, even if they were not applied to the same dataset. It is important to see how the efficiency of each machine learning algorithm varies between this work and other works.

6.
The authors state that they used 3569 samples in their study, but Table 2 and Table 3 appear to be missing data. For example, counting all the samples in the Table 2 row for country of residence, regardless of depression level, gives 1601. In contrast, the samples in the row for smoking during the pandemic sum to 1000, and other rows sum to 1600. Are there missing feature values?

7.
There is something missing in Table 2 (and also Table 3). Table 2 shows that the number of samples belonging to class "No depression" is 878 out of 1601, "Moderate depression" is 413/1601, and "High depression" is 310/1601. Is the total number of samples 1601? I thought it was 3569.

8.
On page 5, in the section "Results", you mention "Among participants, 11.6% and 8.7% had moderate and high levels of depression, respectively while 22.4% and 7.7% had moderate and high levels of anxiety, respectively". This means that the class "No depression" constitutes 80% of the dataset and "No anxiety" 70% of the dataset. How was this class imbalance handled during the experiments?

9.
Put the confusion matrix for depression-prediction and the one for anxiety-prediction to see how the misclassified samples are distributed.

10.
The dataset is split into 70:20:10 (training:testing:validation). What type of parameters did you use that need validation? Did you split them randomly into three groups? You should use techniques other than random sampling such as cross-validation.

11.
Authors should show the training error to compare it with the testing error in order to show if there is overfitting.

12.
No need to put the formulas of the evaluation measures (Precision, recall, AUC,…). They are known.

13.