Dynamics of Ebola epidemics in West Africa 2014

This paper investigates the dynamics of Ebola virus transmission in West Africa during 2014. The reproduction numbers for the total period of epidemic and for different consequent time intervals are estimated based on a newly suggested linear model. It contains one major variable - the average time of infectiousness (time from onset to hospitalization) that is considered as a parameter for controlling the future dynamics of epidemics. Numerical implementations are carried out on data collected from three countries Guinea, Sierra Leone and Liberia as well as the total data collected worldwide. Predictions are provided by considering different scenarios involving the average times of infectiousness for the next few months and the end of the current epidemic is estimated according to each scenario.


Introduction
The outbreak of the 2014 Ebola virus epidemic in West Africa, started in late 2013, does not seem to be under control and accurate predictions appear to be extremely difficult. The major reason for this might be due to unstable treatment conditions that provide different reproduction numbers at different periods. However, there are also other challenges related to the mathematical modeling of this epidemic. To address these challenges, several new models have been suggested that show quite different results, we note a few of them published recently [1][2][3][4][5][6][7][8] .
In this article we introduce a new model to study the dynamics of the current outbreak by considering the average infectious period as a time-dependent parameter. It is derived from the well studied SIR (Susceptible-Infectious-Recovery) model with time delay (e.g. 9,11), where the decrease in the number of susceptible population in compartment S is the major force stopping epidemics. The susceptible population S is often considered as a whole population. A major drawback of this model, in terms of the current epidemic, is that the population infected constitutes a very small proportion of the total population, a very small decrease in S has almost no effect on compartment I.
We discuss how this drawback could be tackled and introduce a new model that uses only compartment I. This leads to a linear model having some similarities to those models based only on transmission rates from infectious population at different generations (e.g. 6). Our main goal is to fit data by estimating fewer and most influential parameters without considering many other issues like the infectiousness in hospitals and death ceremonies. This in addition, allows us to have a more robust model with easily interpreted parameters that can be used for more accurate predictions. The main parameter in this model is the average infectious period τ 2 (time from onset to hospitalization) that defines the dynamics of infectious population. This parameter can also be considered as a control parameter in the development of control models dealing with the spread of infection.
We calculate the basic reproduction numbers R 0 for each country (Guinea, Sierra Leone and Liberia) as well as the total Ebola data worldwide. We also provide predictions corresponding to different scenarios by considering different values for τ 2 for future time periods.

Methods
We use the notation I a (t) for the number of "active" infectious population at time t; it mainly represents the total number of infectious population that are not yet hospitalized. C(t) and D(t) are the cumulative number of infected cases and deaths, respectively. The population density of a country is denoted by D. This is used in the definition of the infection force of the disease with coefficient β. Moreover, μ stands for the natural death rate of the population, α for the death rate due to disease, and τ 1 for the average latent period (in days) that infected individuals become infectious and τ 2 for the average infectious period (in days).
The main equations of our model are as follows (see Appendix for details): Here ω is a gamma (cumulative) distribution function (with p.d.f -ω p ) for deaths due to disease 8 ; for the values of the parameters see Appendix. We note that there are only three parameters that need to be estimated to fit data for cumulative number of infected and death cases. These parameters are: • αthe death rate due to disease; • β -the coefficient of the force of infection; • τ 2 -the average infectious period.
Here α and β are continuous variables, τ 2 is a discrete variable with integer values (days).
Basic Reproduction Number -R 0 . We calculate the basic reproduction number by considering the stationary states in (1) as follows:

Amendments from Version 1
This second version has two new references 5 and 10. Moreover, in Section "Methods" the second paragraph in 'The effective reproduction numbers' subsection has been replaced by two new paragraphs.

REVISED
Since the natural death rate μ is close to zero; that is, 1μ ≈ 1, from (2) we have , the reproduction number R 0 ≈ βDτ 2 . This means that the reproduction number depends almost linearly on τ 2 .
The effective reproduction numbers -R k , k ≥ 1. The effective reproduction numbers R k are considered on several consecutive time intervals Δ k = [t k , t k+1 ], k = 1, 2, …, with corresponding values of τ 2 . They are calculated by the same formula as R 0 .
Here we make a reasonable assumption that the transmission rate βD describes the interaction of population (that is, in some sense, related to the local conditions and the life style) and should become relatively stable in the long term for a particular country. Then, the efforts in preventing the spread of infection are mainly observed in the change (decrease) in the value of τ 2 .
The infection rate is the key factor defining the dynamics of infectious population. The study 5 shows that the infection rate is a linearly decreasing function of the total case reported. In 10 the 1995 Ebola outbreak in Congo is considered where the transmission coefficient β is assumed to decrease exponentially due to control interventions. In our model the transmission of infection depends on two parameters β and τ 2 . Here β (having slightly a different meaning to what was used in 10) is constant and the dynamics of infectious populations depends on the change in τ 2 .
Therefore, to calculate the effective reproduction numbers, we fit data and find the optimal values for α and β, that are constant over the whole period, and optimal values τ k 2 on each interval Δ k . Then R k is calculated by formula (2) setting τ 2 = τ k 2 .
The sequence of optimal values τ 1 2 , τ 2 2 , …, is considered as a method to describe the effectiveness of measures applied for preventing the spread of infection. This sequence very much defines the reproduction numbers on each consecutive time interval and therefore the dynamics of the infected population. It also allows us to consider future scenarios in terms of possible average infectious periods (i.e. times from onset to hospitalization).

Results and discussion
Data were retrieved from the WHO website (http://www.who. int/csr/disease/ebola/situationreports/en/) for the cumulative numbers of clinical cases (confirmed, probable and suspected) collected till 11 November 2014. In all numerical experiments, the second half of the available data for each country is used for First we consider the whole period of infection in each country and find the best fit in terms of three variables α, β and τ 2 (Problem (DF 1 ) in Appendix). The results are presented in Table 1. Although from Figure 1 it can be observed that the best fit for Guinea is not as good as for the other cases, these results provide some estimate for the reproduction number R 0 for a whole period of infection till 11-Nov-2014. In all cases (except Guinea), R 0 is around 1.20 and for Guinea -1.09. We note that the dynamics of infected population is much more complicated (especially in Guinea) which suggests that the reproduction number has been changing since the start of Ebola-2014 in almost all countries. This fact has been studied in 8 in terms of the instantaneous reproduction number over a 4-week sliding windows for each country (see also the next section for different values for τ 2 ).
The effective reproduction numbers According to (2), the basic reproduction number is mainly determined by β and τ 2 . Since in our model parameter τ 2 takes discrete values (days) it would be interesting to study the change of this parameter over time while keeping β the same for the whole period. This approach makes it possible to consider different scenarios for future developments regarding the change in this parameter and to provide corresponding predictions.
We consider three consequent time intervals Δ k = [t k , t k+1 ] (k = 1, 2, 3) for each country and find optimal values α, β and τ k 2 (k = 1, 2, 3) (Problem (DF 2 ) in Appendix). The results are presented in Table 2.  Table 2. Results of best fits: the (effective) reproduction numbers R k and average infectious period τ k 2 (in days) for different intervals Δ k , k = 1, 2, 3. The optimal values for α and β are also provided; they are constant for a whole period.

Liberia
In all cases the effective reproduction number is still greater than 1.
In Liberia it shows a decrease from 1.45 to 0.99 and this can be seen in quite a noticeable decrease in the number of cumulative infected cases ( Figure 2).

Future scenarios
We consider only the cumulative number of infected population worldwide. From Table 2 it can be observed that the number τ 2 has changed as 4, 5 and 4 from 22-March to 11-Nov. We keep this initial best fit (the optimal values of parameters (World) are in Table 2) and consider different scenarios for possible changes of this parameter in the future while keeping the values of α and β unchanged.
The future time intervals are designed as follows: the first interval Δ 1 is 12/Nov/2014-31/Dec/2014, followed by each next month Δ 2 -Δ 4 , and the last interval Δ 5 starts from 1-Apr-2015. The results are presented in Table 3. The reproduction numbers are 0.778 (for τ 2 = 3), 1.035 (for τ 2 = 4) and 1.284 (for τ 2 = 5).  Table 2). The lines represent the best fits, red and black circles represent the data. In the best scenario in Table 3 it is assumed that the current trend stays stable (τ 2 = 4) and a 25 percent decrease in the hospitalization time starts from 1-Jan-2015, then the epidemic may continue till Apr-2015 with the total number of infected cases reaching 31,000.

Model
The idea behind the model introduced in this paper is related to the SIR model with time delay. Since we are going to implement it on available daily data, a discrete version of this model is considered with the step-size one day. Moreover, since the "recovered" population is not our focus, we will only consider equations related to susceptible (S) and infectious (I) individuals. The most commonly used SIR model 1 in the literature is provided below (see, for example, 8 ): Here λ is the recruitment of the population; μ is the natural death rate of the population; α is the death rate due to disease; γ is the recovery rate; and τ 1 is the latent period that infected individual becomes infectious.
The fraction (1-μ) τ 1 represents the survival rate of population over the period of [0, τ 1 ] (in continuous-time case it is equivalent to e -μτ 1 ).
Below we examine this model in detail and develop an improved model.

Susceptible individuals. Equation (3) describes the dynamics of susceptible individuals S(t).
This equation "keeps" the number of infectious individuals I(t) bounded. For example, when the basic reproduction number is greater than 1, there exists 8 an endemic equilibrium (S*, I*) and S(t) → S*, I(t) → I* as t → ∞. In the case when the birth rate is zero (λ = 0) the relation I(t) → 0 suggests that S(t) → 0.
Thus according to this model the epidemic ends because the number of susceptible individuals S(t) decreases over time and the effective reproduction number (as a function of time) becomes less than 1 at some stage; that is, the number of newly infected population F(S(t), I(t)) decreases thanks to the "enough" decrease in the number of susceptible individuals (while I(t) still increases). This might be applicable to epidemics in early 1900s but it is definitely not applicable to recent ones.
This issue significantly restricts the application of the SIR model for the study of the current Ebola virus epidemic. Below we consider 3 possibilities to overcome this difficulty.

1.
The simplest way would be to use a "relatively small" number S(0) for a possible number of susceptible individuals that may become infected. This approach has been implemented in 1 where the total population size in each country (Guinea, Sierra Leone and Liberia) was assumed to be 10 6 individuals.
2. An interesting (and most reliable in our opinion) approach would be considering "relatively small" number of population S(0) as a variable that needs to be estimated. We have implemented this approach and the results show that currently available curve/data is not "long" enough to uniquely determine S(0); that is, almost the same quality of data fit can be achieved for different numbers S(0) (we have tried 50,000, 100,000 and 200,000) leading to different numbers of "stabilized" cumulative infected cases and infection periods. Taking this factor into account, we do not consider this approach, however we note that it might be quite possible soon with the availability of more data points.

3.
In this paper we adopt another approach by neglecting the compartment S completely and leaving just the compartment I. The force of infection F(S, I) in this case is the main factor to be determined. We take this function in the form where D is the population density of a particular country. In a more general setting, one would involve functions nonlinear in I (like F(S, I) = βDI ξ with ξ ≤ 1). However, since the infectious population I is a very small portion of the total population, function F can be assumed linear at least in early stages of epidemics. In this case equation (4) can be represented in the form The major drawback of this model is that I may growth infinitely if the reproduction number is greater than 1; in this model there is no variable/parameter (like S(t) in SIR) that could force I to decrease. On the other hand we believe that it can better describe the

Author contributions
All authors contributed equally to the writing and subsequent refinement of the manuscript.

Competing interests
No competing interests were disclosed.

Grant information
This work was funded by the Collaborative Research Network (CRN) of the Federation University of Australia and the University of Melbourne.

Data fitting: Optimization Problems
Main parameters. We have formulated the dynamical system (7),(8), (9). Given the observed cumulative number of infected cases -C 0 (t) and cumulative number of death cases -D 0 (t), the parameters of the systems can be estimated by the best fit. Before formulating this problem we discuss the parameters to be estimated.
The density D and the natural death rate of the populationμ is available for each country. We set D = 41, 80, 36 and 50 for Guinea, Sierra Leone, Liberia and the worldwide data, respectively. The natural death rate is around 10 deaths for 1000 population per year (1 percent yearly) for all the three countries. Thus in all numerical implementations, the daily rate μ is set to be 0.01/365 = 0.0000274. It is reasonable to have the same average latency period -τ 1 for infected individual to become infectious. The previous studies (e.g. 7) suggest that it is between 2-21 days with the mean of 11.4 days. Our numerical experiments show that the values between 6-8 provide better results; we set τ 1 = 6 in all cases.
Parameters of the gamma distribution can be taken from 7 . We set with mean value 7.5. Note that the choice of values a and b within reasonable intervals, by keeping the mean value the same, has almost no effect on the quality of data fitting. Taking into account this fact, the parameters of the gamma distribution are chosen as in (10). In all the calculations, we set n=35 (days) in (9) (note that for large i function values ω(i) are almost zero).
Initial values I a (t), t ≤ 1, for the equation are chosen in the form I a (t) = ξC 0 (1), for all t ≤ 1.
where C 0 (1) is the actual cumulative infectious. Numerical experiments show that the choice of ξ in the interval 0.4-0.7 has very little impact on the quality of data fitting. We set ξ = 0.4 in all cases except Liberia for which the value 0.7 was better. Accordingly, we do not consider ξ as a variable and set the above mentioned values for each country/data. Therefore, the main parameters that define the dynamics of Ebola epidemics in different countries are α -the death rate due to disease, β -the coefficient of the force of infection and τ 2 -the average infectious period.
Data fitting. We consider data collected till 11 November 2014 for the cumulative number of infectious (confirmed, probable and suspected) and death individuals; they will be denoted by C 0 (t) and D 0 (t), behavior of an infected population in "small" time intervals and provide more accurate reproduction numbers.
Active infectious population. Now we discuss the infectious population and equation (6) in more detail. We call "active infectious populations" at time t the infected population that are infectious at that time but are not hospitalized yet. Denote by I a (t) the number of active infectious populations at time t. We will rewrite equation (6) in terms of I a .
Denote by τ 2 the average infectious period; that is, time from onset (τ 1 ) to hospitalization. Then, an infected person is assumed to be active infectious during the period [τ 1 , τ 1 + τ 2 ]. Since τ 2 is relatively small, we can assume that none is recovering during that period. This means that the rate of recovery γ in (6) is no longer needed for I a (t).
Cumulative number of infected cases. The first term (1-μ) τ 1β DI a (t-τ 1 ) in (7) describes the number of new cases at time t. The cumulative number of infectious cases at (t+1) will be denoted by C(t+1). It can be calculated as Cumulative number of deaths. To calculate the cumulative number of deaths at time t, we consider all infectious cases (hospitalized or not) in the interval [t-τ 1 , t-n] where n is a sufficiently large number. In particular we assume that death may occur after the onset. As mentioned above, the distribution of death is described by a gamma distribution function ω with its p.d.f -ω p . Then, the cumulative number of deaths due to disease can be calculated as According to this formula, we fit the second half of given data in order to decrease the choice of initial values I a (t), t ≤ 1, defined by (11).
Basic reproduction number R 0 To calculate the basic reproduction number, the above model is considered on the whole interval. The corresponding data fitting problem is: Problem (DF 1 ): Given data C 0 (t i ) and D 0 (t i ), i ≥ 1, and time interval [1, T 2 ]: