Modelling sociodemographic factors that affect malaria prevalence in Sussundenga, Mozambique: a cross-sectional study.

Background: Malaria is still one of the leading causes of mortality and morbidity in Mozambique, with little progress in malaria control over the past 20 years. Sussundenga is one of the most affected areas. Malaria transmission is strongly associated with environmental and sociodemographic factors. Knowledge of the sociodemographic factors that affect malaria may be used to improve strategic planning for its control. Such studies have not yet been performed in Sussundenga. Thus, the objective of this study is to model the relationship between malaria and sociodemographic factors in Sussundenga, Mozambique. Methods: Houses in the study area were digitized and enumerated using Google Earth Pro version 7.3. In this study, 100 houses were randomly selected to conduct a community survey of Plasmodium falciparum parasite prevalence using a rapid diagnostic test (RDT). During the survey, a questionnaire was administered to assess the sociodemographic factors of the participants. Descriptive statistics were computed, and backward stepwise logistic regression was performed to establish the relationship between positive cases and these factors. The analysis was carried out using the SPSS version 20 package. Results: The overall P. falciparum prevalence was 31.6%. Half of the malaria-positive cases occurred in the 5 to 14 years age group. Previous malaria treatment, population density and age group were significant predictors in the model. The model explained 13.5% of the variance in malaria-positive cases, and the sensitivity of the final model was 73.3%. Conclusion: In this area, the highest burden of P. falciparum infection was among those aged 5–14 years. Malaria infection was related to sociodemographic factors. Targeting malaria control at the community level can combat the disease more effectively than waiting for cases at health centers. These findings can be used to guide more effective interventions in this region.


Background
Malaria is a serious and sometimes fatal disease caused by a Plasmodium spp. parasite that commonly infects Anopheles spp. mosquitoes, which feed on humans. Although malaria can be a deadly disease, infection and death can be prevented. 1 Almost half of the world's population lives in areas at risk of malaria transmission. Six countries account for more than half of all malaria cases worldwide, and Mozambique is among them. 2 In Mozambique, a country in Sub-Saharan Africa with a population of over 30 million, malaria is one of the leading causes of mortality and morbidity. In 2018, Mozambique recorded the third largest number of malaria cases in the world, accounting for 5% of all cases. 3 The country has made little progress in malaria control. The malaria interventions currently in use are indoor residual spraying (IRS), insecticide-treated bed nets (ITNs), and parasitological diagnosis in health facilities using rapid diagnostic tests (RDTs), followed by effective artemisinin combination therapy (ACT). The entire country uses RDTs with ACT as the standard of care in public health facilities, and ITNs are only available at antenatal clinics, indicated for pregnant women and children under five. 4 Manica Province in central Mozambique has the second highest number of malaria cases in the country. In the first quarter of 2020, there were 1,039,283 recorded cases, an incidence of 371 per 1000 inhabitants. 5 Sussundenga village, in Manica Province, is one of the most affected areas, with 31,397 malaria cases reported in 2019.
Malaria risk, disease severity, and clinical outcome depend on environmental, sociodemographic, economic, and behavioral factors. [6][7][8][9][10][11][12] A study in Chimoio, the provincial capital of Manica, close to Sussundenga Village, modelled the influence of climate on malaria occurrence. The study indicated that selected environmental characteristics accounted for 72.5% of malaria incidence, implying that non-environmental factors such as sociodemographic, economic, cultural and behavioral traits would account for the rest. 13 Although Mozambique has one of the highest malaria incidence and prevalence rates in the region, and malaria accounts for nearly half of childhood deaths, little is known about its epidemiology to inform appropriate and effective interventions. This is one of two major barriers to expanding control measures in the country, the other being limited funding.
Malaria transmission in the country occurs all year round, and knowledge of the sociodemographic factors that affect malaria is crucial for implementing the most appropriate and effective malaria interventions to achieve control. No studies in this field are known for Sussundenga. Therefore, the objective of this study was to model the relationship between malaria and sociodemographic factors in Sussundenga's rural municipality.

Study area
The village of Sussundenga is a rural, agrarian community 40 km from the Zimbabwe border and 40 km from the provincial capital of Chimoio (Figure 1).

Data collection
Google Earth Pro™ version 7.3 (Google, Amphitheatre Pkwy, Mountain View, CA, USA) satellite imagery 17 was used to digitize and enumerate all household structures in the village of Sussundenga (Figure 2). This was a pilot study to determine malaria prevalence, risk factors, and health-seeking behaviors. The sample size was determined by feasibility for the study team and by the design of the community-based cross-sectional survey. With the aim of enrolling 100 households, a random sample of 125 households was drawn from the enumerated structures, as a backup for refusals and errors in the digitizing process (non-household structures misclassified as households). Household coordinates were extracted, and a GPS device and maps of the selected households were used to conduct the study visits. The study involved two visits to each selected household. The first was a notification visit, in which the study team introduced themselves to the head of the household and explained the objectives and procedures of the study. It is customary for the head of household to give the study team permission before any activities involving other household members take place. Once the head of household gave permission, the study team conducted a household census with the head of household and began the process of individual written informed consent with the household residents: consent from all adult (18+ years) residents, and parental permission and consent for minors.
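The random draw described above can be made reproducible and auditable with a seeded simple random sample taken without replacement. The sketch below is a minimal illustration and is not the study team's actual procedure. The household IDs, the frame size of 1,000 and the seed are hypothetical. The rule that the first 100 draws are primary targets and the remaining 25 are used in order as replacements is also an assumption.

```python
import random


def sample_households(household_ids, n_sample=125, seed=None):
    """Simple random sample, without replacement, of enumerated household IDs."""
    ids = list(household_ids)
    if n_sample > len(ids):
        raise ValueError("sample size exceeds the sampling frame")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible and auditable
    return rng.sample(ids, n_sample)


# Hypothetical frame of 1,000 digitized structures (not the study's actual count).
frame = [f"HH{i:04d}" for i in range(1, 1001)]
selected = sample_households(frame, n_sample=125, seed=2019)

# Assumed visiting rule: the first 100 draws are the primary targets; the other
# 25 replace, in draw order, refusals or structures that are not households.
primary, backups = selected[:100], selected[100:]
```

Fixing and reporting the seed lets anyone regenerate the same list of selected households from the same enumerated frame.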
After obtaining consent from the household residents, the study team told participants when they would return the following day to conduct the study activities. The only eligibility requirement was that residents lived in the household full time. Data collectors verbally administered a questionnaire to collect basic demographic data. The field study was carried out from December 2019 to January 2020.
The study nurse recorded current malaria-specific symptoms by self-report and took each participant's temperature using a digital thermometer (GP-300, RoHS:ISO 9000). The nurse then collected a finger-prick blood sample to perform a Rapid Diagnostic Test (RDT), RightSign Biotest® (Biotest, Hangzhou Biotest Biotech Co, China, Ref. No: IMPF-C51S). According to the manufacturer, this test captures the HRP2 antigen on the strip and has a sensitivity of >99.0%. The results were recorded, and if a participant was positive for malaria, the study nurse referred them to the Sussundenga rural health center (RHC) for diagnosis confirmation and treatment. The questionnaire was administered on tablet computers using REDCap. Study data were collected and managed using REDCap electronic data capture tools hosted at the University of Minnesota and downloaded to an Excel sheet for analysis. 18 REDCap (Research Electronic Data Capture) is a secure, web-based application designed to support data capture for research studies, providing: 1) an intuitive interface for validated data entry; 2) audit trails for tracking data manipulation and export procedures; 3) automated export procedures for seamless data downloads to common statistical packages; and 4) procedures for importing data from external sources.

Data analysis
This study was a cross-sectional community-based survey. The analyses were conducted on datasets downloaded from REDCap to an Excel spreadsheet. The dependent variable, malaria infection, was represented by a binary variable indicating whether malaria was present (RDT-positive) or absent (RDT-negative).
The explanatory variables analyzed were the following sociodemographic factors: age, whether the person was an adult or a child, age category (0 to 4, 5 to 14, 15 to 24 and >24 years), sex (male or female), history of malaria treatment, whether the person had paid employment, cell phone ownership, education level, population density of the neighborhood, location (neighborhood), household category or type (hut or conventional) and household size.
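The coding of the binary outcome and two of the derived explanatory variables can be sketched as follows. This is a minimal illustration under stated assumptions. The age bounds are taken to be non-overlapping (0 to 4, 5 to 14, 15 to 24, >24 years). The adult/child split is assumed to use the 18-year threshold applied for consent. The function names and the sample record are hypothetical.

```python
def code_outcome(rdt_result):
    """Dependent variable: 1 = RDT positive, 0 = RDT negative."""
    mapping = {"positive": 1, "negative": 0}
    return mapping[rdt_result.strip().lower()]


def age_category(age_years):
    """Assign an age in completed years to one of four non-overlapping groups
    (assumed bounds: 0-4, 5-14, 15-24, >24)."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    if age_years <= 4:
        return "0-4"
    if age_years <= 14:
        return "5-14"
    if age_years <= 24:
        return "15-24"
    return ">24"


def is_adult(age_years, threshold=18):
    """Adult/child indicator, assuming the 18+ threshold used for consent."""
    return age_years >= threshold


# Hypothetical participant record (illustrative only, not study data).
record = {"age": 9, "rdt": "Positive"}
coded = {
    "malaria": code_outcome(record["rdt"]),
    "age_category": age_category(record["age"]),
    "adult": is_adult(record["age"]),
}
```

Inclusive upper bounds (`<= 4`, `<= 14`, `<= 24`) assign each whole-year age to exactly one category.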
Malaria prevalence was calculated by dividing the number of positive malaria cases by the study population tested at the time and multiplying by 100: 19

$$\text{Prevalence}\ (\%) = \frac{\text{Persons having malaria}}{\text{Persons tested during the period}} \times 100$$

Chi-square tests were used to compare proportions by age group and sex. To establish the relationship between malaria prevalence and sociodemographic factors, backward stepwise logistic regression was used with the following model:

$$G(P_i) = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki}$$

where G(P_i) is the link function (logit), P_i is the probability of a positive response (RDT-positive) for the i-th participant, β_0 is the intercept, and β_1, …, β_k are the regression coefficients of the explanatory variables X_1, …, X_k. This method starts with a full model containing all candidate variables, and at each step it eliminates variables that do not contribute, yielding a reduced model that best explains the data. This is useful because reducing the number of predictors can help reduce multicollinearity and overfitting. 20 To test the goodness of fit of the model, the Hosmer-Lemeshow (1989) test was performed. 21 Independent variables with p < 0.05 were retained in the final model. The outputs reported include score statistics, regression coefficients, significance levels of the variable coefficients, and overall classification accuracy.
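The sketch below illustrates backward elimination followed by a Hosmer-Lemeshow check, written in Python with NumPy. It is not the SPSS 20 procedure used in the study. Its assumptions are:

- Predictors are removed by largest Wald p-value. SPSS also offers other removal criteria, and the criterion used in the study is not stated.
- A predictor is retained if p < 0.05.
- The Hosmer-Lemeshow statistic uses ten risk groups. Its p-value comes from the closed-form chi-square tail, which is valid for even degrees of freedom.
- The data are synthetic and the variable names are hypothetical.

```python
import math
import numpy as np


def fit_logit(X, y, n_iter=50, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    X must already include an intercept column. Returns (beta, standard errors)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def wald_p(beta, se):
    """Two-sided Wald p-values: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    return [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)]


def backward_eliminate(X, y, names, alpha=0.05):
    """Start from the full model and repeatedly drop the predictor with the largest
    Wald p-value until every remaining predictor has p < alpha.
    Column 0 is the intercept and is never dropped."""
    keep = list(range(X.shape[1]))
    while True:
        beta, se = fit_logit(X[:, keep], y)
        pvals = wald_p(beta, se)
        candidates = range(1, len(keep))  # positions of non-intercept terms
        worst = max(candidates, key=lambda j: pvals[j], default=None)
        if worst is None or pvals[worst] < alpha:
            return keep, [names[k] for k in keep], beta, pvals
        del keep[worst]


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df % 2:
        raise ValueError("closed form requires an even df")
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p_hat, groups=10):
    """Sort by predicted risk, split into `groups` near-equal bins and compare
    observed with expected positives; df = groups - 2."""
    stat = 0.0
    for idx in np.array_split(np.argsort(p_hat), groups):
        observed, expected, n = y[idx].sum(), p_hat[idx].sum(), len(idx)
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / n))
    df = groups - 2
    return stat, df, chi2_sf_even(stat, df)


# Synthetic illustration only (NOT the study data).
rng = np.random.default_rng(1)
n = 400
age_5_14 = rng.integers(0, 2, n).astype(float)  # hypothetical dummy variable
irrelevant = rng.normal(size=n)                  # hypothetical predictor with no effect
true_logit = -1.2 + 1.5 * age_5_14
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)
X = np.column_stack([np.ones(n), age_5_14, irrelevant])

keep, kept_names, beta, pvals = backward_eliminate(
    X, y, ["intercept", "age_5_14", "irrelevant"])
p_hat = 1.0 / (1.0 + np.exp(-X[:, keep] @ beta))
hl_stat, hl_df, hl_p = hosmer_lemeshow(y, p_hat)
```

In this synthetic example, the predictor with a true effect is expected to survive elimination. Whether the irrelevant predictor is dropped depends on the random draw. A small Hosmer-Lemeshow statistic (large p-value) indicates no evidence of poor fit.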
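The sensitivity of the final model (the conditional probability of a positive classification given that the participant has malaria) measures the proportion of positives that were correctly identified, and was calculated as: 22

$$\text{Sensitivity}\ (\%) = \frac{TP}{TP + FN} \times 100$$

where TP is the number of true positives (RDT-positive participants classified as positive by the model) and FN is the number of false negatives (RDT-positive participants classified as negative by the model).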
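The sketch below shows how sensitivity and related summaries (specificity, positive predictive value, overall accuracy) follow from a logistic model's predicted probabilities at a 0.5 cut-off. It is a minimal plain-Python illustration. Counting a probability exactly equal to the cut-off as positive is an assumption of this sketch, and the toy data are hypothetical.

```python
def classification_metrics(y_true, p_hat, cutoff=0.5):
    """Confusion-matrix summaries for a fitted logistic model.

    A case is classified positive when its predicted probability is >= cutoff
    (the tie rule at exactly the cut-off is an assumption of this sketch).
    """
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, p_hat):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # TP / (TP + FN)
        "specificity": ratio(tn, tn + fp),  # TN / (TN + FP)
        "ppv": ratio(tp, tp + fp),          # TP / (TP + FP)
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical toy data (NOT the study data): 1 = RDT positive.
y_true = [1, 1, 1, 0, 0, 0, 0]
p_hat = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4]
metrics = classification_metrics(y_true, p_hat)
```

These figures are computed on the same data used to fit the model, so they are in-sample estimates. Applying the same function to held-out data would give an out-of-sample check.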

Results
Malaria prevalence, sex, age, age group and education level of participants
Of the 125 selected households, 100 were visited. Figure 3 presents the positive and negative cases per visited site. Of the 358 participants tested and interviewed, 108 (31.6%) tested positive for malaria. The sex distribution of the enrolled participants, 55% female and 45% male, did not differ significantly (Chi-squared = 1.28, P = 0.2578, degrees of freedom (DF) = 1).
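A sex-distribution comparison like the one reported above is a Pearson chi-square goodness-of-fit test with one degree of freedom. The sketch below is a minimal plain-Python version for two categories. The reference proportions are an assumption: here they are the census shares of 53% female and 47% male quoted in the author response. The counts are hypothetical, not the study data. For one degree of freedom, the upper-tail probability has the exact closed form erfc(sqrt(x / 2)).

```python
import math


def chi_square_two_categories(observed, expected_props):
    """Pearson chi-square goodness-of-fit for two categories (df = 1).

    observed: counts, e.g. (females, males)
    expected_props: reference proportions summing to 1, e.g. census shares
    Returns (statistic, df, p-value).
    """
    if len(observed) != 2 or len(expected_props) != 2:
        raise ValueError("this sketch handles exactly two categories")
    if abs(sum(expected_props) - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # exact chi-square tail for df = 1
    return stat, 1, p_value


# Hypothetical counts (NOT the study data): 110 females and 90 males,
# compared with assumed reference shares of 53% female and 47% male.
stat, df, p = chi_square_two_categories((110, 90), (0.53, 0.47))
```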
The age of participants ranged from 1 to 80 years, with a median of 17 years and a mean of 21 years (standard deviation (SD) 16.2 years). Participants' education levels varied: 35.1% had no education or less than primary (5 grades), 47.4% had primary or basic school (grades 5 to 10) and 17.5% had secondary or higher education.
Malaria prevalence by age category
Figure 4 presents the malaria positivity results by age category. Half of the malaria-positive cases occurred in the 5 to 14 years age category, which comprises 32.7% of the Sussundenga population according to the National Institute of Statistics (INE). The over-24 age category accounted for 17.6% of the malaria cases, while comprising 30.4% of the Sussundenga population according to the INE. There was a statistically significant difference in positive malaria cases among groups (Chi-squared = 25.857, P = 0.0022, DF = 9).

Association between malaria infection and sociodemographic factors
Backward stepwise selection of predictors for the binary logistic model produced a series of models. Only the relevant initial models are presented here; the other outputs can be found in Appendix 1. The classification cut-off value is 0.500.
Table 2. Hosmer-Lemeshow test (columns: Step, Chi-squared, DF, Sig.).
[34][35][36][37][38][39] This reduction in odds is likely due to the prophylactic effect of ACT, which provides protection for 2 weeks to 1 month after completion of treatment. After repeated infections, an individual develops a certain degree of immunity. When re-infected, patients tend to present a mild, asymptomatic form of the disease, and natural active immunity is established after ten or more P. falciparum infections, which can be sufficient to suppress symptoms and clinical signs. 40 Different results were reported in Angola, where women who had a previous malaria infection during pregnancy had a higher risk of contracting malaria. 41 This is likely because pregnant women may take sulfadoxine-pyrimethamine rather than ACT.
In this study, population density was found to be a significant predictor of an individual testing positive for malaria. Similar results were reported in Chimoio 24 in 2016, in a study in 14 endemic African countries 42 in 2017 and in Ethiopia 43 in 2015.
The variables age, whether the person was an adult or a child, sex, paid employment, cell phone ownership, education level, location (Bairro) and household size were removed from the model because they were redundant or did not contribute significantly to the model.
Age category is a good proxy for age group, and household size for household category. Paid employment and cell phone ownership were included in this study as rural wealth indicators. They were not found to be significant predictors, contrary to a study in Mozambique which indicated that children from higher-income families (58%) tend to be at lower risk of malaria than children from lower-income families (43%). 44 Another study in sub-Saharan Africa, in 2018, showed that malaria prevalence increases as income decreases. 45 In this study, the ability of the model using social, economic, and demographic variables to predict malaria-positive cases (model accuracy) was 72.3%. A logistic regression model analyzing hematological parameters and age in Ghana reported 77.4%. 30 The sensitivity of the final model in classifying malaria-positive cases was 73.3%, and its positive predictive value (PPV) was 66%. That is, 66% of the participants predicted to be positive actually tested positive, which suggests that sociodemographic characteristics are useful for predicting malaria infection. In Iran, a model predicting malaria re-introduction reported a positive predictive value of 81.8%, 39 and a model analyzing hematological parameters and age in Ghana reported 52.72%. 30
Limitations of the study
Data collection for this study was conducted in December and January, during the rainy season, which is also the peak malaria transmission season. As a result, we likely detected a large number of infections, and the results reflect this season and may not be representative of malaria dynamics in the dry season. The RightSign Biotest® test detects the histidine-rich protein 2 (HRP2) antigen of the P. falciparum parasite, which can persist in the blood for over a month in patients recently treated for malaria.

Conclusion
This study evaluated the sociodemographic factors that affect malaria prevalence in Sussundenga Village, Mozambique. Recent diagnosis and treatment, population density and age category were found to be significant predictors.

Graduate Research and Innovation Program, Centro Universitario FMABC, Santo André, Brazil
Ferrao et al. provide us with a cross-sectional study aimed at estimating P. falciparum malaria prevalence in Mozambique, Africa. The study is well written and has good data, but some parts are not clear enough, as follows: Study sample size and Figure 2. Please specify the randomization process used to select 125 households in the study area. What is the statistical power of the selection of 100 houses? If the study area's landscape is heterogeneous in terms of risk of malaria infection, is random selection the best approach? Should a stratified sample selection approach be used instead?
To estimate sensitivity and specificity, it is essential to have a training dataset and a testing dataset. The training dataset is to build a statistical model and the testing dataset is to evaluate the fitted model. Please specify the training and the testing data.
The built model shows that access to treatment and age are the only important predictors. Please explain the lack of importance of social and economic predictors in the built model. The built model shows a coefficient (1.289) lacking a variable. Please revise.

HRP2-based RDTs have difficulty detecting P. falciparum lacking HRP2 genes. This is an important limitation and indicates that the prevalence may be underestimated.

Conclusions have several issues:
"Recent diagnosis and treatment, population density and age category were found to be significant predictors". Issue: pop density was not a significant predictor.
"The model accuracy was 72.3% implying that the model is robust". Issue: it depends on the approach by which it was calculated.
"This model indicates that 13.5% of malaria cases can be attributed to sociodemographic factors while previous studied indicated that environmental conditions are attributed to approximately 73% of malaria cases". Issue: be specific about which sociodemographic factors; also, environmental conditions were not studied in this study.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Partly

Are the conclusions drawn adequately supported by the results? No
Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Epidemiology of vector-borne diseases
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 29 Apr 2022
Joao Ferrao, UnISCED, Beira, Mozambique. Dear Reviewer, thank you very much for your valuable comments. They were very useful and were used to improve the manuscript.
Ferrao et al. provide us with a cross-sectional study aimed at estimating P. falciparum malaria prevalence in Mozambique, Africa. The study is well written and has good data, but some parts are not clear enough, as follows: Study sample size and Figure 2.
Please specify the randomization process used to select 125 households in the study area. What is the statistical power of the selection of 100 houses? If the study area's landscape is heterogeneous in terms of risk of malaria infection, is random selection the best approach? Should a stratified sample selection approach be used instead? Response: Thank you for raising this important question. Indeed, this was a pilot study to determine malaria prevalence, risk factors, and health-seeking behaviors. The sample size was determined by feasibility for the study team and by the design of the community-based cross-sectional survey.
All households in the study area were digitized and enumerated using Google Earth Pro. A random sample of 125 households was taken, with the aim of enrolling 100 to account for errors in the digitizing process.
The village is relatively small: Sussundenga village lies within an area of 156.9 km², and we added the area of the village to the text.
To estimate sensitivity and specificity, it is essential to have a training dataset and a testing dataset. The training dataset is to build a statistical model and the testing dataset is to evaluate the fitted model. Please specify the training and the testing data. Response: In this case, where accuracy was computed with a cut-off of 0.5, we do not see the need for a training dataset.
After data imputation and feature engineering, the third step was to split the data into training and test sets. This was carried out by the software.
The built model shows that access to treatment and age are the only important predictors. Please explain the lack of importance of social and economic predictors in the built model. HRP2-based RDTs have difficulty detecting P. falciparum lacking HRP2 genes. This is an important limitation and indicates that the prevalence may be underestimated. Response: This is true and is a limitation of current HRP2-based RDTs. There are limited data on HRP2 deletions throughout Mozambique, and specifically in Manica Province. However, in a study published in 2019 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6711899/), the authors found very few cases of HRP2 deletion and little impact on the efficacy of current RDTs. Of the infections not detected by the RDTs, not all were P. falciparum. It is unlikely in our study setting that HRP2 deletions affected the efficacy of the RDT and biased our prevalence estimates.

Conclusions have several issues:
"Recent diagnosis and treatment, population density and age category were found to be significant predictors". Issue: pop density was not a significant predictor. "The model accuracy was 72.3% implying that the model is robust". Issue: it depends on the approach by which it was calculated. "This model indicates that 13.5% of malaria cases can be attributed to sociodemographic factors while previous studied indicated that environmental conditions are attributed to approximately 73% of malaria cases". Issue: be specific about which sociodemographic factors; also, environmental conditions were not studied in this study.
In the manuscript entitled, "Modelling sociodemographic factors that affect malaria prevalence in Sussundenga, Mozambique: a cross-sectional study", the authors present the results of a survey and statistical analysis designed to uncover predictors of positive parasite status by rapid diagnostic test in this area of endemic malaria transmission. Malaria is a significant contributor to disease burden in Mozambique, so this topic is of high importance. In my report below I raise a number of questions for the authors regarding aspects of the analysis that were unclear to me (i.e., perhaps where the clarity of presentation could be improved) or where I felt that further analysis might strengthen the conclusions.

Context and Aims:
The framing of the study in terms of context and aims as it currently stands deserves considerable review. Owing precisely to its status as one of the most malarious countries in the world, Mozambique has attracted a great deal of attention from malaria researchers over many decades, and there have been many studies investigating environmental and socio-demographic factors behind malaria transmission in the country. This wealth of prior research and understanding contrasts strikingly with the authors' proposition that "little is known about the epidemiology to inform appropriate and effective interventions". Importantly, it also raises the question of why information on two well-known factors shaping malaria transmission in Mozambique, namely IRS and ITN use, does not seem to have been captured or included in this study. Another key known factor, household construction, may have been included, but I could see no detail in the manuscript to illuminate the meaning of the "household category" variable used in the model. (iv) The population density variable must have a large dynamic range because it is assigned a slope ("constant"?) of 0.000 in Table 8: could this be rescaled so that we can see its slope within the 3 decimal places?
(v) I was confused why the age categories changed from four in the earlier discussion to three in Table 4? Also, it would help to nominate one age group as the reference group so that the odds ratios can be understood as relative to that group.
(vi) I'm confused by the focus on understanding the predictive accuracy of the model in terms of specificity and sensitivity(*), which are appropriate for a diagnostic tool but may not be particularly relevant to the use of a risk factor model such as this one for field epidemiology. That is, if the end use is to prioritise the delivery of a particular intervention such as seasonal malaria chemoprevention, then identifying that a certain age group has twice the parasite prevalence of another could be of substantial value even where the sensitivity was low because prevalence itself was very low across both strata. This comes back to the context and aims of the study, in the sense that the value of the fitted model (or more precisely the knowledge discovered through it) exists only relative to the way in which it is intended to be used.
(*also: are these defined according to a thresholding of the predicted prevalence above and below 50%?) https://www.researchgate.net/publication/357008627_P_Falciparum_Community_Prevalence_and_Health_Se Another key known factor, household construction, may have been included, but there is no detail that I could see in the manuscript to illuminate the meaning of the "household category" variable used in the model? Response: Household size and household construction are very important variables. In the Methods section, we rephrased the variables and their meanings. We hope it is now clear that household category means type of house or type of construction. In this study, household category (household construction) was found to be a predictor variable.
Regarding the statement "ITNs are only available at antenatal clinics, indicated for pregnant women and children under five": I'm not sure that this is well phrased, since the WHO recommends universal net use in high-transmission areas. While ITN distribution campaigns focus on the highest-risk groups of young children and pregnant women, these are not the only groups who should be advised to use bed nets; likewise, ITNs would generally be available commercially at markets (see e.g. Scott et al. Mal J, 2021). 6 Response: Thank you for the comment. We do agree that young children and pregnant women are not the only groups who should be advised to use bed nets. Indeed, recent studies indicate an age shift in malaria due to this situation.
As for the statement "likewise, ITNs would generally be available commercially at markets": we would agree in a "normal" market-driven country. In the Mozambican case, where most people live below the poverty line, buying a mosquito net for prevention can be a luxury.
For example, in 2021 a mosquito net factory in Chimoio, Manica, closed its doors, and the major reason was a lack of clients to purchase the nets.
We added this useful contribution in our discussion.

Statistical Analysis:
The statistical analysis method used to derive the primary results of this study, namely the identification of key factors behind malaria prevalence in the study area, is a stepwise logistic regression, which is indeed appropriate for this objective. Some minor details require clarification or revision: "To evaluate potential confounders and, effect modifiers between the final model variables, the Hosmer-Lemeshow (1989) test was performed." This doesn't make sense to me: the HL test is for model specification / acceptable fit, rather than for breaking variables down into their roles in the causal hierarchy.
Response: Thank you for a very good observation. We agree; to avoid confusion, we rephrased the sentence.
(iii) I was confused by the chi-square test reports in some places: for the distribution by sex, the chi-squared statistic of 0.081 doesn't sound like the right order of magnitude, and in fact I get 0.081 as the p-value for a binomial exact test on this sample, so perhaps this is a typo? Response: Thank you for the observation.
As stated in the methodology, "Sussundenga has an estimated population of 31,429 inhabitants, 47% males and 53% females". In the present study, 55% of the enrolled participants were female and 45% were male. Using the Biostat 5.3 software, we obtained the following output. The table was corrected.
For the tests by age category, since this is a four-by-two table, I would have thought we're looking at 3 degrees of freedom rather than 6? Response: Thank you for the observation.
We believe it is more appropriate to compare the age categories with both the sample results and the National Institute of Statistics projections for accuracy, which gives a 4 x 4 contingency table. The recalculations are presented below and were corrected in the manuscript.