Modeling the trend of reported malaria cases in Kisumu county, Kenya [version 1; peer review: 1 approved with reservations, 1 not approved]

Background: Although there has been an extensive scale-up of malaria interventions in Kenya, malaria infections persist at unacceptably high levels in some regions. Even with renewed calls to eradicate the disease through increased international donor assistance and country-specific government involvement, malaria is still a cause of worry in endemic regions. The objective of this study was to determine the factors associated with the incidence of malaria in Kisumu County over time.
Methods: The study conducted a secondary analysis of data from a cross-sectional survey of routinely reported malaria cases. The population of interest was patients confirmed to have malaria by laboratory test. A sample of 384 was randomly selected from all laboratory-confirmed malaria cases reported by health facilities in Kisumu County from January 2014 to December 2017. The analysis involved descriptive statistics, trend analysis and time series analysis (ARIMA). A negative binomial regression model was used to measure the effect of each of the selected predictor variables on the incidence of malaria, and the incidence rate ratio was reported. The frequency distribution of each of the categorical variables was calculated.
Results: The overall pattern of the weekly reported malaria cases showed seasonal variation. The best-fitting time series model developed for predicting the number of weekly reported cases of malaria was ARIMA (2, 0, 1). The negative binomial model was found to fit the incidence of malaria better than the Poisson regression model, because the dispersion parameter was reduced from 70.292 under the Poisson model to 1.103.
Conclusion: There is a need to encourage health professionals to regularly review and report cases of malaria in their facilities, because reporting rates, completeness and consistency of reported malaria cases remain extremely low.


Introduction
Malaria is among the major killer diseases within tropical regions. Sub-Saharan Africa bears more of its burden than any other region across the globe. The parasites are spread to humans through bites from infected Anopheles mosquitoes, which introduce the protozoans into the human body. Even with renewed calls to eradicate the disease through increased international donor assistance and country-specific government involvement, malaria is still a cause of worry in the endemic regions.
According to the World Malaria Report, anti-malaria progress has slowed down in many parts of the world, and it is unlikely that the world will achieve the 2020 targets for reducing malaria incidence set by the World Health Organization (WHO) Global Technical Strategy. A year on from this recognition, there is still no significant progress in global malaria control (WHO, 2018). Globally, 219 million cases of malaria were estimated to have occurred in 2017, compared to 239 million cases in 2010. Even though there were 20 million fewer cases worldwide in 2017 than in 2010, there was a slight upward trend from 2015 to 2017, suggesting that progress had generally stalled (World Malaria Report, 2018). The WHO statistics of 2017 reveal that Sub-Saharan Africa is still suffering greatly from malaria, with 200 million cases comprising 92% of the global burden. Malaria has negatively impacted the health and economic development of Sub-Saharan African countries and is considered a major impediment to sustainable development in the world's poorest regions (Gallup & Sachs, 2001).
The 2030 Agenda, adopted by world leaders in September 2015, aims to ensure sustainable development globally: to eradicate poverty, improve health, reduce inequity and address challenges related to climate change by the year 2030. In order to track progress, world leaders created a number of Sustainable Development Goals (SDGs), each having a specific target to be realized in the next 15 years. One of the set targets is to end epidemics of neglected tropical diseases and malaria by 2030. The indicators used to track success on malaria eradication are the malaria incidence in a population and the number of people affected by neglected tropical diseases who are seeking interventions against these diseases (SDG Indicators, 2018). The inclusion of universal health coverage (UHC) reaffirms the position of the SDGs on global health priorities (Barasa et al., 2018). The purpose of UHC is to make sure that all citizens of a state are able to access curative, rehabilitative and preventive health services at minimum cost, thereby eliminating disparities in access to health care. This also includes free consultation, diagnostic services for common ailments and free drugs. The UHC model being rolled out by the current Kenyan government calls for the enhancement of HIV, tuberculosis and malaria treatment in the country, with the aim of eradicating these diseases.
More than 70% of the entire population in Kenya is at risk of malaria, which is the major cause of ill health (KNBS, 2015). In 2017, public health facilities reported 2,783,846 confirmed malaria cases. The diagnosed cases contribute correspondingly to the high number of malaria deaths in the Kenyan population, which increased from 2016 to 2017 as a result of stalled progress in malaria prevention.
With the aim of achieving the largest reduction in morbidity and mortality, the President's Malaria Initiative (PMI) has since 2013 focused on the regions of Kenya experiencing high incidences of malaria. The counties of Kisumu, Vihiga, Migori, Bungoma, Homa Bay, Kakamega, Busia and Siaya together had a projected population of about 9.6 million in 2018 and bear the highest incidence of malaria in Kenya (President's Malaria Initiative, 2018a). The quarterly surveillance bulletin released in December 2016 revealed that 73% of all confirmed malaria cases were reported from these counties (President's Malaria Initiative, 2018b). Execution of malaria control interventions focuses on endemic areas and is usually guided by the epidemiological zones. The interventions include community case management and intermittent preventive treatment in pregnancy (Musuva et al., 2017). It is in light of this that the Kenyan government and other development partners in the health sector have remained dedicated to improving the delivery of health services in various parts of the country, with high priority being placed on prevention and control of malaria in endemic regions, and on its eventual elimination. As one of the objectives to be realized by 2018, they jointly aimed to reduce the malaria burden by two-thirds (US President's Malaria Initiative, 2019).
The malaria situation in the study area, with regard to facility-reported cases and the distribution of malaria, has remained unclear in the recent past, despite the above-mentioned studies in various parts of the country. Therefore, this study aimed at mapping the factors contributing to the incidence of malaria against the trend data on malaria cases across the region.

Study design and setting
A cross-sectional survey using routinely reported national Programme data on malaria cases in the DHIS2 platform, as reported by health facilities in Kisumu County was conducted from January 2014 to December 2017.
Kisumu County is located in the former Nyanza Province and its headquarters are in Kisumu City, which is situated approximately 370 km west of the Kenyan capital, Nairobi. According to the 2009 census, Kisumu County had a population of 968,879 people. It covers a land area of 2085.9 km² and a further 567 km² is covered by water.

Study population and eligibility of participants
The study population included individuals at risk of malaria in Kisumu County.
All patients confirmed to have malaria by laboratory test were recruited in the study and patients diagnosed with ailments other than malaria were excluded.

Sample size determination
The Fisher et al. (1998) formula was used to determine the sample size for the study as follows:
$$n = \frac{Z^2 pq}{d^2}$$
where n is the desired sample size (if the target population is over 10,000); Z = 1.96, which corresponds to the 95% confidence level; p is the proportion of patients confirmed to have malaria by laboratory test; q = 1 - p is the proportion of patients confirmed not to have malaria by laboratory test; and d is the desired level of precision at the 95% confidence limit (0.05). Since the proportion of patients confirmed to have malaria by laboratory test (p) was not known, it was estimated to be 50% (Mugenda & Mugenda, 2013). Therefore, based on the given figures:
$$n = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} \approx 384$$

Data collection and study variables
Any patient who was referred to the laboratory by the clinical officer/nurse after presumptive diagnosis of malaria and fulfilled the criteria for inclusion was enrolled in the study.
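As a quick check of the calculation described under Sample size determination, the following minimal Python sketch evaluates the formula with the stated inputs (Z = 1.96, p = 0.5, d = 0.05). The function name `fisher_sample_size` is hypothetical and chosen only for illustration; the unrounded result is approximately 384.16, which rounds to the 384 cases used in the study.

```python
def fisher_sample_size(z: float, p: float, d: float) -> float:
    """Return the unrounded sample size n = z^2 * p * q / d^2, with q = 1 - p.

    z: standard normal deviate for the chosen confidence level
    p: assumed proportion with the characteristic of interest
    d: desired level of precision
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if d <= 0.0:
        raise ValueError("d must be positive")
    q = 1.0 - p
    return (z ** 2) * p * q / (d ** 2)


# Inputs stated in the text: 95% confidence, p assumed to be 50%, d = 0.05.
n_unrounded = fisher_sample_size(z=1.96, p=0.5, d=0.05)
n_study = round(n_unrounded)  # rounded to the nearest whole case
print(n_study)
```

Choosing p = 0.5 maximizes the product pq, so this assumption gives the largest sample size for a given precision.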
Secondary data on all laboratory-confirmed malaria cases reported by all health facilities were downloaded from the District Health Information System (DHIS2) using the standard application programming interface provided. The data included the weekly number of malaria cases. The number of patients testing positive for malaria was the dependent variable, and the predictors were locality, time in years and the total number of patients tested for malaria. The predictors were assessed as given in Table 1. A conceptual framework depicting the predictor-outcome relationship is displayed in Figure 1.

Ethical considerations
Considering that secondary data was used, there was no interaction with human participants in this study. However, all the personal information generated from this study was treated

Dependent variable
Patients testing positive for malaria: patients confirmed to have malaria by laboratory test.

Minimization of biases
Data were downloaded from the DHIS2 using the standard application programming interface provided. Each facility's data were visually cross-checked and subjected to range and limit tests to identify obvious outliers or errors in transcription.
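The article does not specify the exact range and limit rules that were applied. The sketch below is therefore only an illustration of what such checks on weekly facility counts could look like, under assumed rules: counts must be present and non-negative, positives cannot exceed the number tested, and the number tested should not exceed a plausibility ceiling. The function name, the rules and the threshold are all hypothetical.

```python
def flag_suspect_weeks(positives, tested, max_plausible_tested):
    """Return (week_index, reason) pairs for weekly records that fail basic checks.

    positives / tested: weekly counts for one facility (None marks a missing value).
    max_plausible_tested: assumed upper limit for tests in a single week.
    """
    suspects = []
    for week, (pos, tot) in enumerate(zip(positives, tested)):
        if pos is None or tot is None:
            suspects.append((week, "missing"))
        elif pos < 0 or tot < 0:
            suspects.append((week, "negative count"))
        elif pos > tot:
            suspects.append((week, "positives exceed tested"))
        elif tot > max_plausible_tested:
            suspects.append((week, "above plausible limit"))
    return suspects


# Made-up example values, for illustration only (not study data).
example_positives = [10, 30, None, 5]
example_tested = [20, 25, 40, 9000]
print(flag_suspect_weeks(example_positives, example_tested, max_plausible_tested=1000))
```

Flagged records would still need to be reviewed against the facility registers, since a large value can be a genuine outbreak rather than a transcription error.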

Statistical analysis
Weekly facility-level data on malaria parasitological testing were extracted from the Integrated Disease Surveillance and Response (IDSR) registry through the DHIS2 platform. The data were first checked for completeness, consistency and accuracy. The analysis then included descriptive statistics, trend analysis and time series analysis (ARIMA). An ARIMA model was fitted to the time series of reported malaria cases in Kisumu County.
Based on the Box-Jenkins approach, model building involved ascertaining the order of the AR and MA components, which was guided by plots of the autocorrelation function and the partial autocorrelation function. The Dickey-Fuller and Shapiro-Wilk tests were used to test for stationarity and normality, respectively. Additionally, Pearson's chi-square was used to test the goodness of fit of both the Poisson and the negative binomial regression models.
The influence of time in years, total number of patients tested and locality on the number of malaria cases was evaluated.
The frequency distribution of each variable, and of each category of the categorical variables, was calculated and presented using R statistical software version 3.5.1. A count model, the negative binomial regression model, was used to measure the effect of each of the predictor variables on the number of malaria cases, and the incidence rate ratio (IRR) was reported. Statistical significance was assessed using 95% confidence intervals and P-values. Missing data were addressed using a single imputation method.
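In a log-link count model such as the negative binomial regression, the IRR for a predictor is the exponential of its coefficient, and a Wald-type 95% confidence interval is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below illustrates this conversion. The function name and the example coefficient and standard error are hypothetical and are not taken from the study's tables.

```python
import math


def incidence_rate_ratio(beta: float, std_error: float, z: float = 1.96):
    """Convert a log-link regression coefficient to an IRR with a Wald interval.

    Returns (irr, lower, upper), where lower/upper bound the confidence
    interval implied by z (1.96 corresponds to 95%).
    """
    if std_error < 0:
        raise ValueError("std_error must be non-negative")
    irr = math.exp(beta)
    lower = math.exp(beta - z * std_error)
    upper = math.exp(beta + z * std_error)
    return irr, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
irr, lower, upper = incidence_rate_ratio(beta=0.25, std_error=0.10)
print(f"IRR = {irr:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

An interval that excludes 1 corresponds to a predictor that is statistically significant at the 5% level, which is how the 95% confidence intervals and P-values are expected to agree.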

Descriptive statistics of the study data
The study included 384 reported malaria cases in the final analysis, for the period January 2014 to December 2017. A line graph illustrating reported malaria cases from 2014-2017 is shown in Figure 2. Figure 3 shows the trend in weekly reported malaria cases.

Model identification
The procedure for model building involved ascertaining the order of the AR and MA components. Based on the Box-Jenkins approach, this was guided by plots of the autocorrelation function (ACF) and the partial autocorrelation function (PACF).
The autocorrelation function of reported malaria cases from 2014 to 2017 in Figure 4 spikes significantly, at the 95% confidence level, at lags 1, 2, 3 and 4. After lag 4 it cuts off, implying that a moving average component of order 4 (MA = 4) would be needed to describe this data set.
The partial autocorrelation function in Figure 5 spikes significantly, at the 95% confidence level, at lag 1 and then quickly cuts off, implying that the autoregressive component of the ARIMA model is of order 1 (AR = 1). Taken together, the ACF and PACF plots in Figure 4 and Figure 5 suggest MA = 4 and AR = 1 as the orders of the moving average and autoregressive processes, respectively.
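To make the reading of the ACF plot concrete, the sketch below computes sample autocorrelations in plain Python and lists the lags whose autocorrelation falls outside the approximate 95% bounds of plus or minus 1.96 divided by the square root of n. This is an illustrative sketch only: the function names are hypothetical, the example series is synthetic rather than study data, and the authors' own plots were produced in R.

```python
import math


def sample_acf(series, max_lag):
    """Return sample autocorrelations r_0..r_max_lag of a numeric sequence."""
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(deviations[t] * deviations[t + k] for t in range(n - k)) / denominator
        for k in range(max_lag + 1)
    ]


def significant_lags(series, max_lag, z=1.96):
    """Return lags (>= 1) whose autocorrelation exceeds the +/- z/sqrt(n) bounds."""
    bound = z / math.sqrt(len(series))
    acf_values = sample_acf(series, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf_values[k]) > bound]


# Synthetic alternating series, used only to show the mechanics.
synthetic = [1.0, -1.0] * 50
print(significant_lags(synthetic, max_lag=3))
```

The lag after which the ACF stays inside the bounds is the usual visual cue for a candidate MA order, while the same reading of the PACF suggests a candidate AR order; candidate orders are then compared using a criterion such as the AIC.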

Testing for stationarity
A stationary series is characterized by a constant mean, a constant variance and a constant autocorrelation structure. The augmented Dickey-Fuller test for stationarity, shown in Table 2, indicates that the time series is stationary, with a significant p-value of 0.01.
In comparing the Akaike information criterion (AIC) values of the five likely ARIMA models depicted in Table 3, it was deduced that model ARIMA (2, 0, 1) is the best model for the data since it has the lowest AIC value of 5581.917.
The Shapiro-Wilk normality test results in Table 4 (p-value ≤ 0.0001) were interpreted as indicating that the residuals are normally distributed, although a p-value this small would conventionally lead to rejecting the null hypothesis of normality. On this interpretation, it was inferred that the true mean of the residuals is approximately equal to zero and that there is a constant variance among the residuals of the selected model.
Table 5 shows that the residual deviance for the fitted Poisson regression was 26219 on 373 degrees of freedom. To check the fit of the Poisson model, the ratio of the residual deviance to its degrees of freedom was considered:
$$\frac{26219}{373} = 70.29223$$
A dispersion parameter of 70.29223 indicates that the data are over-dispersed, as the value is far greater than 1. This implies that the Poisson model does not fit well, because the mean and variance of the response variable are not equal. If the mean and variance were equal, the residual deviance would be approximately equal to the degrees of freedom.
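The over-dispersion check for the Poisson model can be reproduced with a few lines of Python, using the residual deviance and degrees of freedom reported in Table 5. The function name `dispersion_ratio` is hypothetical; the numbers are those given in the text.

```python
def dispersion_ratio(residual_deviance: float, residual_df: int) -> float:
    """Return residual deviance divided by residual degrees of freedom.

    Values close to 1 are consistent with equidispersion; values far
    above 1 point to over-dispersion relative to the fitted model.
    """
    if residual_df <= 0:
        raise ValueError("residual_df must be positive")
    return residual_deviance / residual_df


# Poisson fit reported in the text: deviance 26219 on 373 degrees of freedom.
poisson_dispersion = dispersion_ratio(26219, 373)
print(f"{poisson_dispersion:.5f}")
```

Applying the same function to the negative binomial deviance reported in the regression results (411.59 on 373 degrees of freedom) gives a ratio of about 1.103, in line with the value in Table 7.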

Modeling incidence of malaria cases results of regression analyses
In this case, the Poisson assumption that the mean equals the variance was violated, so a negative binomial regression model was deemed suitable and practical, as it caters for over-dispersion. It also allows the likelihood ratio and other standard maximum likelihood tests to be applied. The fitted Poisson model had an AIC value of 29318 and a null deviance of 115777 on 383 degrees of freedom. Table 6 presents the parameter estimates obtained after validating the Poisson regression model with the negative binomial regression model, since the assumptions of Poisson regression were not met.
The AIC of this model is 5345.2, with a deviance of 411.59 on 373 degrees of freedom; the deviance also follows a chi-square distribution, $D \sim \chi^2_{(n-p)}$, with n - p degrees of freedom. The dispersion parameter was found to be 1.10346, as shown in Table 7. The negative binomial was observed to be the best model for the incidence of malaria, as the dispersion parameter was reduced from 70.29223 under the Poisson regression model to 1.10346, an indication that the assumptions of Poisson regression were not met. Table 5 shows the goodness of fit results, which clearly indicate that the negative binomial regression model is a better fit for the incidence of malaria than the Poisson model. To begin with, the ratios of deviance and Pearson chi-square to degrees of freedom, $\chi^2_{(n-p)}$ ...
This increase may be due to improvements in the reporting system by health facilities over time, non-usage of mosquito nets by the residents and the low socioeconomic status of the population (Adenomon, 2014). The fluctuating incidence recorded yearly could also be attributed to people not being careful about the illness, or to inconsistencies in the management of the illness by the health care authorities. The decreasing trend in 2017 could be due to an increase in the use of mosquito nets and a general improvement in malaria awareness among the population of Kisumu, owing to intensive malaria education campaigns by several organizations. Climatic changes could also have contributed to this (Anokye et al., 2018).
The overall pattern of the reported malaria cases in this study showed seasonal variation. The weeks towards midyear recorded the highest number of cases, whereas the first weeks of the year recorded the lowest. This could be a result of the heavy rainfall experienced in April, May, June and July, which is usually the rainy season in Kisumu, and of the temperatures associated with the rainy season.
This confirms the findings of Craig et al. (1999) and Zhou et al. (2004), who revealed that climate is a major factor in explaining the incidence of malaria. According to Zhou et al. (2004), the incidence of malaria is influenced by rainfall, since mosquitoes need stagnant water to complete their life cycle. Similarly, Mabaso et al. (2007) found that rainfall seasonality and minimum temperature are related to the number of Plasmodium falciparum-infective bites received by an individual annually or during a season.
The best-fitting time series model developed for predicting the number of weekly reported cases of malaria was ARIMA (2, 0, 1). This suggests that ARIMA (2, 0, 1) can be used as a forecasting model to predict future values of the series. ARIMA works best when the data exhibit a consistent pattern over time with a minimal number of outliers (Labys, 2006). Researchers can use this model to forecast reported malaria cases (Mabaso et al., 2001). Nonetheless, it should be updated from time to time to incorporate current data.

Significance of predictor variables on malaria incidences
Accordingly, the study investigated the association between time in years, locality and total number of patients tested for malaria in Kisumu using weekly malaria data from 2014 to 2017.
The study results revealed that residing in Kisumu East, Seme, Nyando and Kisumu Central sub-counties was statistically significantly associated with the incidence of malaria, compared to residing in Kisumu West sub-county. Residing in Muhoroni and Nyakach sub-counties was not statistically significantly associated with the incidence of malaria, compared to residing in Kisumu West sub-county. The different localities showed inequalities in malaria incidence, mainly due to variations in access to health services, urbanization and wealth distribution. This is similar to the findings of Galactionova et al. (2017), who identified regional inequalities in the coverage of malaria interventions arising from the unequal distribution of wealth within and across many countries.
Many studies have also established that there is less access to services in rural areas than in urban areas, leading to fewer reported malaria cases from the rural sub-counties. This makes locality a useful determinant of malaria incidence, since some localities are more urban and others more rural.
Generally, malaria incidence fluctuated over the four years studied. The year 2017 was statistically significantly associated with the incidence of malaria compared to the year 2014, whereas 2015 and 2016 were not. Several factors may be responsible for seasonal changes, for example climatic changes, ecological and environmental factors, and social and economic determinants such as changes in health care infrastructure. The availability of health facilities and drug resistance also have an impact on the incidence of malaria. Although different malaria control activities were carried out each year, such as indoor insecticide spraying, distribution of insecticide-treated nets (ITNs) and massive malaria control campaigns, the prevalence has remained constant. This is similar to the findings of Alemu et al. (2012).
There was a significant association between the total number of patients tested for malaria and the incidence of malaria. Though urban areas have normally been at lower risk of malaria, unpredictable and unplanned population growth has been a key factor in making urban or peri-urban transmission an increasing problem. This could explain the high malaria incidence in 2016. This finding is consistent with a study carried out by Knudsen & Slooff (1992).

Trends analysis
The objective of this research was to determine the factors associated with the incidence of malaria in Kisumu County over time, given the locality where the patients sought health facility intervention, time in years and the total number of patients who were tested for malaria. The study used weekly data on reported malaria cases from the IDSR registry for the period January 2014 to December 2017 in Kisumu County. Reported malaria cases confirmed by laboratory test were used in the analysis and modeling.
A seasonal pattern was observed in malaria incidence in Kisumu County. The ARIMA (2, 0, 1) model was found to be the best-fitting statistical model for predicting malaria incidence in Kisumu County. The results of this study offer useful information to help policy makers implement timely and effective malaria prevention and control measures.

Significance of predictor variables on malaria incidences
Goodness of fit criteria were used to select which of the Poisson and negative binomial regression models fit the reported malaria cases better. Based on the results, the negative binomial regression model was found to fit the data better than the Poisson regression model. In modeling the incidence of malaria, the quality of the Poisson model was assessed using the AIC. The AIC value produced was 29,318, the deviance was 26,219 and the dispersion parameter was 70.29, showing over-dispersion in the data and hence a violation of one of the main assumptions of the Poisson model, the equality of the mean and variance. Because of this over-dispersion, it became necessary to validate the Poisson regression model using the negative binomial model.
The results suggested that the Kisumu East, Seme, Nyando and Kisumu Central localities, the year 2017 and the total number of patients who underwent a laboratory confirmation test for malaria were significant factors for the incidence of malaria in Kisumu County over time, whereas the Nyakach and Muhoroni localities and the years 2015 and 2016 were not. The findings provide better insight into environmental and socio-economic effects on malaria and provide important information for malaria prediction. Nonetheless, additional studies are needed to consider which factors other than environmental and socio-economic ones influence the incidence of malaria. Likewise, health professionals practicing in Kisumu County should be encouraged to regularly review and report cases of malaria in their facilities. This is because reporting rates, completeness and consistency of reported malaria cases remain extremely low and, importantly, were poorer in localities where malaria is endemic.
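The AIC-based part of this comparison can be summarized in a short Python sketch. The AIC values are those reported in this article for the Poisson model (29318) and, in the regression results, for the negative binomial model (5345.2); the dictionary layout and function name are hypothetical and used only for illustration.

```python
# AIC values as reported in the article for the two count models.
reported_aic = {
    "Poisson": 29318.0,
    "Negative binomial": 5345.2,
}


def preferred_by_aic(aic_by_model):
    """Return (best_model, differences), where differences maps each model
    to its AIC minus the smallest AIC (0.0 for the preferred model)."""
    best_model = min(aic_by_model, key=aic_by_model.get)
    best_aic = aic_by_model[best_model]
    differences = {name: aic - best_aic for name, aic in aic_by_model.items()}
    return best_model, differences


best_model, aic_differences = preferred_by_aic(reported_aic)
print(best_model)
print(aic_differences)
```

A lower AIC indicates a better trade-off between fit and complexity when the models are fitted to the same data, so the large gap here points the same way as the dispersion statistics.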

Source data
The data can be obtained from the Integrated Disease Surveillance and Response (IDSR) registry through the DHIS2 platform, using the standard application programming interface provided: https://hiskenya.org/dhis-web-commons/security/login.action. Weekly facility-level data on malaria parasitological testing were extracted from Kenya DHIS2 for the period January 2014 to December 2017.
Access to the raw dataset is possible upon placing a formal request to the Ministry of Health, Kenya (emailing pshealthke@gmail.com), since the platform is kept under restricted access.

This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

João L Ferrão
Instituto Superior de Ciências de Educação, Beira, Mozambique
This topic is very important and contributes to malaria eradication efforts. The effort to combat malaria is multidisciplinary, and mathematical modelling can contribute greatly. I see this study as a good attempt. However, major revisions should be addressed for it to fulfill its intention.
Results:
○ Figure 1. Are you presenting incidence or malaria cases? Can you clarify?
○ Figure 3. Are you presenting the trend in incidence or in malaria cases? Can you clarify?

Discussion:
Results are for the ARIMA, negative binomial and Poisson models. I don't see any discussion of these models in comparison with other authors.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.