Tracking and forecasting milepost moments of the epidemic in the early-outbreak: framework and applications to the COVID-19

Background: The outbreak of the 2019 novel coronavirus disease (COVID-19) has attracted global attention. In the early stage of an outbreak, the most important questions concern a few meaningful milepost moments: the time when the number of daily confirmed cases starts to decrease, the time when the number of daily confirmed cases becomes smaller than the number of daily removed cases (recoveries and deaths), and the times when the number of daily confirmed cases and the number of patients treated in hospital (the “active cases”) become zero. Unfortunately, accurate and precise prediction is extremely difficult because little data is available at the early stage of an outbreak. To address this, we propose a flexible framework that incorporates the effectiveness of government control to forecast the whole process of a new, unknown infectious disease during its early outbreak.
Methods: We first establish iconic indicators to characterize the extent of epidemic spread. We then develop a tracking and forecasting procedure under mild and reasonable assumptions. Finally, we apply it to analyze and evaluate the COVID-19 outbreak using publicly available data for mainland China beyond Hubei Province from the Chinese Center for Disease Control and Prevention (China CDC) for the period from Jan 29th, 2020 to Feb 29th, 2020, which shows the effectiveness of the proposed procedure.
Results: The forecasts indicate that the number of newly confirmed cases will become zero in early to mid-March, and that the number of patients treated in hospital will become zero between mid-March and mid-April, in mainland China beyond Hubei Province.
Conclusions: The proposed framework can help people gain a general understanding of epidemic trends in countries where COVID-19 is raging, as well as in future outbreaks of new and unknown infectious diseases.


Introduction
The atypical pneumonia caused by the 2019 novel coronavirus, a highly infectious human disease, was first reported on Dec 31st, 2019 in Wuhan, the capital of Hubei Province in China (WHO et al., 2020). To mitigate the spread of the epidemic across China and to other countries, Wuhan was temporarily shut down from Jan 23rd, 2020, which has proved effective in promptly slowing the spread of the coronavirus (Chinazzi et al., 2020). However, due to the "Spring Festival travel rush", the number of confirmed cases in China kept rising over the following two months, which put great strain on medical resources (Li et al., 2020).
The questions of greatest concern are how COVID-19 will spread and when it will end. People keep asking when the number of daily confirmed cases will start to fall, and when the number of daily confirmed cases will become smaller than the number of daily removed cases (recoveries and deaths). These questions are highly important not only for the general public but also for the government, which plays an important role in bringing the disease under control as quickly as possible. Since declines in the number of newly confirmed cases and in the number of active cases signal the alleviation of the epidemic, the emergence of these turning points conveys useful information for decisions on medical resource allocation and isolation policies in the later stage of the epidemic.
Meanwhile, it is also important to predict when the number of daily confirmed cases will become "zero", and when the number of active cases will become "zero". The latter indicates the end of the epidemic. These two "zero points" can also help the government decide when to loosen restrictions on population movement in cities. Additionally, economic authorities can use the forecasts to assess the impact of the epidemic on the economy in advance, and to plan for the restoration of normal production and daily life.
There have been various publications on COVID-19 from different perspectives, e.g., the origin of COVID-19, its clinical features, and its epidemic transmission characteristics. Specifically, for the origin of the virus, Fan et al. (2019) and Luk et al. (2019) pointed out that COVID-19 is an infectious disease caused by a virus closely related to SARS-CoV, while others believed that the COVID-19 virus was originally derived from wild animals (Benvenuto et al., 2020; Huang et al., 2020). For the epidemic transmission characteristics, Holshue et al. (2020) and Hui et al. (2020) found that the virus can be transmitted from person to person and that it has a high interpersonal transmission rate. Zhao et al. (2020) gave a preliminary estimate of the basic reproduction number $R_0$, which ranged from 2.24 (95% CI: 1.96-2.55) to 3.58 (95% CI: 2.89-4.39) in the early outbreak, while Prasse et al. (2020) estimated it at around 2.2. Tang et al. (2020) applied likelihood-based and model-based methods to early reported cases, and their results showed that $R_0$ could be as high as 6.47. Zhou et al. (2020) used the SEIR model and reported that $R_0$ for COVID-19 lies in the range 2.8-3.3, indicating that the early transmission capacity of COVID-19 is close to or slightly higher than that of SARS. Other studies related to $R_0$ include Anastassopoulou et al. (2020), Zhang et al. (2020), and references therein. Unfortunately, different models yield different estimates of $R_0$, which may make any prediction based on $R_0$ unstable.
However, forecasting methods based on epidemic models have some obvious shortcomings for outbreak prediction. For example, the SEIR model is a mathematical method that relies on assumed epidemiological parameters for disease progression, which are not available for a novel pathogen. Many key parameters need to be set, for instance the basic reproduction number $R_0$, the daily recovery rate, the characteristics of the disease itself (such as the infection rate and the rate of conversion from latent to infected), the daily exposure rates of the latent and the infected, and the initial status of the population (total population, and the initial numbers of infected, latent, susceptible, and recovered individuals). For infectious diseases that have appeared in the past, or for which a large amount of data is available, these parameters are not difficult to obtain. For unknown, sudden infectious diseases in their early stage, however, obtaining them is very difficult, which leads to great uncertainty and limitations in predicting the epidemic situation with the SEIR model.
Moreover, predicting a new epidemic such as COVID-19 poses many challenges. First, there is little prior knowledge that can be referred to or used as an analogy for a brand new epidemic. Second, government management makes the development of the epidemic completely different from its development without intervention, so one needs to consider how to incorporate the influence of government measures into the parameter fitting and build a statistical model accordingly. Third, in the early outbreak the initial data often fluctuate violently and the data quality is low, so many commonly used parameter estimation methods are no longer applicable. Furthermore, the amount of data in the early stage is too small to rely directly on the inertia of the data for forward prediction. In summary, how to use small, low-quality data sets in the early stage of a brand new epidemic to make basic and relatively accurate forecasts for the entire process of the epidemic is a long-standing difficulty.
To cope with these challenges, we propose a simple and effective framework that incorporates the effectiveness of government control to forecast the whole process of a new, unknown infectious disease during its early outbreak, with an emphasis on predicting the meaningful milepost moments. Specifically, we first propose a series of iconic indicators to characterize the extent of epidemic spread, and describe four periods of the whole process corresponding to the four meaningful milepost moments: two turning points and two "zero" points. We then develop the proposed procedure under mild and reasonable assumptions, and specifically without relying on assumed epidemiological parameters for disease progression. Finally, we apply it to analyze and evaluate COVID-19 using publicly available data for mainland China beyond Hubei Province from the China CDC for the period from Jan 29th, 2020 to Feb 29th, 2020, which shows the effectiveness of the proposed procedure.
The empirical study suggests that the proposed method offers a flexible framework and perspective for the early prediction of a sudden, unknown infectious disease under effective government control. Specifically, in the early stage of the epidemic, once some regular patterns begin to appear, the proposed method can be used to predict the course of the epidemic and to judge which stage it is in, when the peak will be reached, and when the turning points will appear. Moreover, by continuously accumulating data and updating the model as the epidemic develops, we can also predict when the epidemic will essentially end. Finally, the proposed method generalizes readily: it can be used to understand the epidemiological trend of COVID-19 in other countries, which provides useful guidance for fighting it.
The remainder of this paper is organized as follows. Section 2 presents the main methodology: Section 2.1 defines the iconic indicators that characterize the extent of epidemic spread, Section 2.2 describes the four periods of the whole process corresponding to the four meaningful milepost moments (two turning points and two "zero" points), and Section 2.3 presents the proposed procedure with mild and reasonable assumptions. Section 3 applies the proposed method to COVID-19 using publicly available data for mainland China beyond Hubei Province from the China CDC for the period from Jan 29th, 2020 to Feb 29th, 2020, and describes the trend of the COVID-19 spread in detail. Conclusions and discussion are given in Section 4.

Methods
The data we used are provided by the China CDC via public data sources. They include the cumulative confirmed cases up to a given day t, the daily confirmed cases on day t, and the daily recovered cases and daily deaths on day t. All data analysis is done with R; version 3.6.0 or higher is recommended. The main code implementing the proposed procedure, as well as the data and its full description, are available from Github (see data availability for more detail; YuanchenZhu2020, 2020).
In order to assess and predict the epidemic, we first define a set of necessary indicators that reflect the status of disease contagion. We then divide the cycle of the epidemic into four stages, separated by the turning points of the proposed indicators. Finally, we propose a computational framework to predict the turning points.

The iconic indicators to characterize an epidemic
It is obvious that the contagion process of an unknown virus in different regions would be diverse with respect to the number of patients and the growth pattern of the epidemic, because of population density, population mobility, public health conditions, as well as disease prevention and control measures. Therefore, we first constructed a set of indicators to monitor the essential laws of the development of the disease.
There are several requirements for the monitoring indicators. Firstly, as the number of patients can vary greatly across regions, the indicators should be free of the scale of the data so that the analysis methods and results are comparable. Secondly, they should reflect the general laws and characteristics of the epidemic process and describe the entire process accurately and coherently from beginning to end. In particular, they should be able to answer the question of when the turning points of the epidemic will appear. Thirdly, they should be as simple and convenient as possible so that they can be computed from publicly available data. Last but not least, the indicators should have a clear meaning and be easy to interpret.
Following the above, we first adopt three basic indicators that are published daily by the provincial and municipal governments of China: for time $t$, the daily confirmed cases $E_t$, the daily recovered cases $O_t$, and the daily deaths $D_t$. We then define a few monitoring indicators to characterize the epidemic stages: the number of active cases $N_t$, the daily infection rate $K_t$, and the daily removed rate $I_t$ (removed cases being the sum of recoveries and deaths), defined as follows.
• The number of active cases $N_t$ is defined as the cumulative confirmed cases up to $t$ with recoveries and deaths removed, that is
$$N_t = \sum_{i \le t} (E_i - O_i - D_i) = N_{t-1} + E_t - O_t - D_t.$$
Note that $N_t$ is essential for epidemic investigation, since it reflects the number of local patients and the pressure on the medical system.
• The daily infection rate $K_t$ is defined as the ratio of the daily confirmed cases at time $t$ to the number of active cases at time $t-1$, i.e.
$$K_t = \frac{E_t}{N_{t-1}}.$$
$K_t$ reflects the rate at which patients enter the treatment system. It is influenced by many factors, including the properties of the infectious disease, the average immune capacity of the population, population density, climate, public health conditions, public health awareness, awareness of self-protection against disease, and the efforts of epidemic prevention and control.
• Similarly, the daily removed rate $I_t$ is defined as the ratio of the daily removed cases at time $t$ to the number of active cases at time $t-1$, i.e.
$$I_t = \frac{O_t + D_t}{N_{t-1}},$$
where $I_t$ reflects the rate at which patients leave the medical system, that is, the rate at which the pressure on medical resources is eased.
Using the above indicators, we further define the outbreak status $R_t$ on day $t$ as $R_t = N_t / N_{t-1} = 1 + K_t - I_t$, so that
$$N_t = N_0 \prod_{i=1}^{t} R_i, \tag{1}$$
where $N_0$ denotes the initial number of active cases at the beginning of the outbreak. In particular, when the daily infection rate and removed rate are relatively stable, denoted by $K$ and $I$ respectively, we have the constant epidemic status index $R = 1 + K - I$. Then (1) can be written as
$$N_t = N_0 R^t, \tag{2}$$
which shows that the epidemic follows an exponential curve. The epidemic status indicator $R$ thus reflects the rate at which the population with infectious capacity expands ($R > 1$) or shrinks ($R < 1$).
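The indicators above can be computed directly from the three published daily series. The following is a minimal Python sketch, not the authors' R implementation; the function name `epidemic_indicators`, its argument names, and the input numbers are hypothetical and chosen only for illustration. It assumes the recursion $N_t = N_{t-1} + E_t - O_t - D_t$ with a known starting value $N_0$.

```python
from typing import Dict, List, Optional


def epidemic_indicators(
    confirmed: List[float],
    recovered: List[float],
    deaths: List[float],
    n0: float,
) -> Dict[str, List[Optional[float]]]:
    """Compute N_t, K_t, I_t and R_t for days 1..T from daily counts.

    n0 is the number of active cases on day 0 (N_0 in the text).
    Rates are None on days where N_{t-1} is zero, since they are undefined.
    """
    out: Dict[str, List[Optional[float]]] = {"N": [], "K": [], "I": [], "R": []}
    previous_active = n0
    for e_t, o_t, d_t in zip(confirmed, recovered, deaths):
        # N_t = N_{t-1} + E_t - O_t - D_t
        active = previous_active + e_t - o_t - d_t
        if previous_active > 0:
            k_t = e_t / previous_active          # K_t = E_t / N_{t-1}
            i_t = (o_t + d_t) / previous_active  # I_t = (O_t + D_t) / N_{t-1}
            r_t = 1.0 + k_t - i_t                # R_t = 1 + K_t - I_t
        else:
            k_t = i_t = r_t = None
        out["N"].append(active)
        out["K"].append(k_t)
        out["I"].append(i_t)
        out["R"].append(r_t)
        previous_active = active
    return out


# Made-up counts for illustration only (not the China CDC data).
result = epidemic_indicators([20, 30], [5, 10], [5, 0], n0=100)
print(result["N"], result["R"])
```

Because $R_t = N_t / N_{t-1}$, multiplying $N_0$ by the successive values of `R` reproduces the `N` series, which is the content of Equation (1).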

Four stages of an epidemic
In this section, we describe the whole process of an epidemic under the assumption that the government has implemented effective control measures. The process can be divided into four successive stages: the "outbreak period", the "controlled period", the "mitigation period" and the "convergence period". We quantify the iconic feature that ends each stage; these features correspond to the two turning points and the two "zero" points, respectively.

Stage 1: Outbreak Period
In the initial stage of an epidemic outbreak, the social response is delayed because knowledge of the epidemic is limited, and contagion prevention and control is inevitably insufficient. Thus the daily infection rate $K_t$ is high. At the same time, recovery takes relatively long in the initial stage and the number of severe patients is small, so the daily removed rate $I_t$ is close to "zero". Therefore, the outbreak status indicator $R_t$ during this period is usually much larger than 1, that is:
$$R_t = 1 + K_t - I_t \gg 1.$$
During the outbreak period, the number of newly diagnosed patients therefore increases sharply and the number of active cases increases dramatically in turn, which places a great burden on medical institutions, especially hospitals.
As the epidemic worsens, if the government intervenes through a series of emergency measures and quickly establishes a disease prevention and control system, the daily infection rate $K_t$ will decrease significantly. Usually, the number of new daily confirmed cases will begin to decline as well. Once the situation improves during the prevention and control process, we will see the emergence of the first turning point, denoted by $T_1$. After the date $T_1$, the number of newly diagnosed patients $E_t$ changes from the rapid rise of the outbreak period to a descending channel ($E_t < E_{t-1}$). In summary, the emergence of the first turning point $T_1$ indicates that the disease control measures have begun to work, which marks the end of the "outbreak period".

Stage 2: Controlled Period
The emergence of the first turning point is a very positive signal, indicating that the public health management measures have clearly taken effect and the epidemic has entered the "controlled period". However, because the removed rate $I_t$ is still relatively low at this stage, the number of active cases continues to increase. The controlled period continues until the second turning point $T_2$ appears, that is, until the number of active cases $N_t$ reaches its peak and starts to decline. This happens because, after the long period of treatment in the previous stage, the removed rate has increased so much that $K_t = I_t$ is reached. When the removed rate $I_t$ surpasses the infection rate $K_t$, the number of patients treated in hospital begins to decline from its peak.

Stage 3: Mitigation Period
The sign of the end of the controlled period is $K_t = I_t$. Thereafter, $K_t$ continues to fall while $I_t$ rises, which gives
$$K_t < I_t, \quad \text{i.e.,} \quad R_t = 1 + K_t - I_t < 1.$$
This indicates that the daily removed rate $I_t$ becomes greater than the daily infection rate $K_t$, that is, the outbreak status indicator $R_t$ becomes less than 1. The population with infectious capacity shrinks and the pressure on medical resources is significantly relieved, marking the beginning of the "mitigation period". The mitigation period continues until zero newly confirmed cases are reported, that is, $E_t = 0$, which we call the first "zero" point $Z_1$. After the first "zero" point is reached, the intensity of prevention and control can be relaxed across society except in hospitals; that is, the "mitigation period" ends and the "convergence period" starts.

Stage 4: Convergence Period
The "convergence period" will end at the second "zero" point Z 2 , which means that the number of people treated in the hospital is equal to or close to "zero". After reaching the second "zero" point, the epidemic is completely over.
For clarity, we summarize the iconic features and the corresponding milepost moments of each stage in the whole process of the epidemic in Table 1.
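The four stages can be illustrated with a short Python sketch that locates the milepost moments in an observed series and labels the stage of any given day. This is a simplified illustration, not the authors' implementation: the function names and the toy series are hypothetical, each milepost is taken as the first day on which its condition from this section holds, and a milepost day is assumed to belong to the stage it opens.

```python
from typing import Dict, List, Optional


def observed_mileposts(
    confirmed: List[float], active: List[float]
) -> Dict[str, Optional[int]]:
    """Return the day index of T1, T2, Z1 and Z2, or None if not yet reached."""
    found: Dict[str, Optional[int]] = {"T1": None, "T2": None, "Z1": None, "Z2": None}
    for t in range(1, len(confirmed)):
        # T1: daily confirmed cases start to fall (E_t < E_{t-1}).
        if found["T1"] is None and confirmed[t] < confirmed[t - 1]:
            found["T1"] = t
        # T2: active cases start to fall, which is equivalent to K_t < I_t.
        if found["T1"] is not None and found["T2"] is None and active[t] < active[t - 1]:
            found["T2"] = t
        # Z1: no newly confirmed cases (E_t = 0).
        if found["T2"] is not None and found["Z1"] is None and confirmed[t] == 0:
            found["Z1"] = t
        # Z2: no active cases left (N_t = 0).
        if found["Z1"] is not None and found["Z2"] is None and active[t] == 0:
            found["Z2"] = t
    return found


def stage_on_day(t: int, found: Dict[str, Optional[int]]) -> str:
    """Name the stage of day t given the mileposts located so far."""
    stages = ["outbreak", "controlled", "mitigation", "convergence"]
    for stage, key in zip(stages, ["T1", "T2", "Z1", "Z2"]):
        if found[key] is None or t < found[key]:
            return stage
    return "ended"


# Toy series for illustration only (not the China CDC data).
toy_confirmed = [10, 20, 30, 25, 15, 5, 0, 0, 0]
toy_active = [10, 30, 60, 80, 85, 70, 40, 10, 0]
mileposts = observed_mileposts(toy_confirmed, toy_active)
print(mileposts, stage_on_day(4, mileposts))
```

On real data the strict conditions are sensitive to day-to-day noise, so in practice some smoothing or a persistence requirement would likely be needed before declaring a milepost.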

Implementation: the proposed model
According to Section 2.2, the modeling and prediction of the epidemic need to be divided into two parts. The first part corresponds to the outbreak period, when intervention and treatment are not yet effective enough. The infection rate $K_t$ increases rapidly and the removed rate $I_t$ is small. Thus, the number of newly diagnosed patients $E_t$ increases rapidly and the number of active cases $N_t$ increases, so medical resources will soon be overwhelmed. According to Equation (2), $N_t$ then follows an exponential growth trend without forming a convex curve, and neither the two turning points nor the two "zero" points will appear.
The second part, which is the focus of this article, begins when $K_t$ starts to decrease and $I_t$ starts to increase owing to effective intervention and improved recovery of individual patients. Only in this situation will the turning points and "zero" points $T_1$, $T_2$, $Z_1$, $Z_2$ appear successively, so that the epidemic can end. Therefore, we model the development of the epidemic under the assumption of effective intervention, and obtain early predictions of the two turning points and two "zero" points from the predictive modeling of $E_t$ and $N_t$.
Suppose that the infection rate $K_t$ and the removed rate $I_t$ change gently, with a stable unitary rate of change, within a time window of length $m$ before time $t_0$. Given $m$ and $t_0$, denote by $V_{K|(t_0,m)}$ and $V_{I|(t_0,m)}$ the unitary rates of change of $K_t$ and $I_t$, respectively (Equation (3)). For any $t > t_0$, the predicted infection rate $\hat{K}_t$ and removed rate $\hat{I}_t$ are obtained by extrapolating $K_{t_0}$ and $I_{t_0}$ forward with these rates. The outbreak status, the number of patients in hospital, and the number of newly diagnosed patients then follow from the definitions in Section 2.1:
$$\hat{R}_t = 1 + \hat{K}_t - \hat{I}_t, \qquad \hat{N}_t = \hat{N}_{t-1}\hat{R}_t, \qquad \hat{E}_t = \hat{K}_t \hat{N}_{t-1}.$$
It can be seen that the prediction results depend mainly on $V_{K|(t_0,m)}$ and $V_{I|(t_0,m)}$, whose values are determined by the choice of the time window $m$ and the starting point $t_0$. The choice of $m$ and $t_0$ is not arbitrary, however, and should follow Assumption 1.
The assumption ensures that the trend of the outbreak has already emerged and is stable, which means that the outbreak is already under control. It is a mild requirement: when some basic conditions are satisfied, for example when the epidemic prevention policy is effective and steady, the unitary rate of change is relatively stable. Our method relies entirely on this assumption, so when any of its constraints is not satisfied, our algorithm is inapplicable.
Table 1. The milepost moment that ends each stage of the epidemic.
• Stage 1 (outbreak period) ends when the number of newly diagnosed cases reaches its peak (the first turning point $T_1$).
• Stage 2 (controlled period) ends when the number of active cases reaches its peak (the second turning point $T_2$).
• Stage 3 (mitigation period) ends when the number of newly diagnosed cases equals 0 (the first "zero" point $Z_1$).
• Stage 4 (convergence period) ends when the number of active cases equals 0 (the second "zero" point $Z_2$).
In summary, here we describe details of the proposed procedure in Algorithm 1.

Algorithm 1. Main Prediction Procedure
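1: Initialization: choose $m$ and $t_0$ satisfying Assumption 1.
2: Compute $V_K$ and $V_I$ according to (3); set $t = t_0 + 1$.
3: Prediction: update the predicted results at time $t$ with the $l$-step-ahead forecasts, $l = t - t_0$. Extrapolate $\hat{K}_t$ and $\hat{I}_t$ from $K_{t_0}$ and $I_{t_0}$ using $V_K$ and $V_I$, then compute
$$\hat{R}_t = 1 + \hat{K}_t - \hat{I}_t, \qquad \hat{N}_t = \hat{N}_{t-1}\hat{R}_t, \qquad \hat{E}_t = \hat{K}_t \hat{N}_{t-1}.$$
4: Prediction of the milepost moments, following the definitions in Section 2.2:
If $\hat{E}_t < \hat{E}_{t-1}$ for the first time, record $t$ as the first turning point $T_1$;
If $\hat{K}_t < \hat{I}_t$ for the first time, record $t$ as the second turning point $T_2$;
If $\hat{E}_t = 0$ for the first time, record $t$ as the first "zero" point $Z_1$;
If $\hat{N}_t = 0$ for the first time, record $t$ as the second "zero" point $Z_2$.
If none of the above is satisfied, turn to the next step.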
It is also worth noting that in practice many special cases need to be taken into consideration. We therefore created a relatively complete computing framework, which has been implemented as an R package and is available from Github (see data availability for more detail; YuanchenZhu2020, 2020).
The more data we accumulate, the clearer the underlying law of the epidemic becomes. Therefore, we can also continuously revise the iterative prediction model according to the actual data, so that the predictions of the next stage and of the long-term situation become more accurate.
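The prediction loop of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R implementation. In particular, the form of the unitary rate of change is assumed here to be the geometric-mean daily ratio over the window, $(K_{t_0}/K_{t_0-m})^{1/m}$, with geometric extrapolation $\hat{K}_{t_0+l} = K_{t_0} V_K^{\,l}$; the source's Equation (3) may differ. Because a geometric decay never reaches exactly zero, the "zero" points are declared when the forecasts fall below the hypothetical thresholds `zero_e` and `zero_n`. All function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional


def unitary_rate(series: List[float], t0: int, m: int) -> float:
    """Assumed form: geometric-mean daily ratio of the series over [t0 - m, t0]."""
    return (series[t0] / series[t0 - m]) ** (1.0 / m)


def forecast_mileposts(
    k: List[float],          # observed daily infection rates K_0..K_t0
    i: List[float],          # observed daily removed rates I_0..I_t0
    n_t0: float,             # active cases observed on day t0
    e_t0: float,             # daily confirmed cases observed on day t0
    t0: int,
    m: int,
    horizon: int = 365,      # assumed cap on the number of forecast steps
    zero_e: float = 1.0,     # assumed threshold for "zero" new cases
    zero_n: float = 1.0,     # assumed threshold for "zero" active cases
) -> Dict[str, Optional[int]]:
    v_k = unitary_rate(k, t0, m)
    v_i = unitary_rate(i, t0, m)
    found: Dict[str, Optional[int]] = {"T1": None, "T2": None, "Z1": None, "Z2": None}
    n_prev, e_prev = n_t0, e_t0
    for step in range(1, horizon + 1):
        t = t0 + step
        k_hat = k[t0] * v_k ** step           # l-step-ahead infection rate
        i_hat = i[t0] * v_i ** step           # l-step-ahead removed rate
        r_hat = 1.0 + k_hat - i_hat           # outbreak status
        e_hat = k_hat * n_prev                # E_t = K_t * N_{t-1}
        n_hat = max(n_prev * r_hat, 0.0)      # N_t = N_{t-1} * R_t, floored at 0
        if found["T1"] is None and e_hat < e_prev:
            found["T1"] = t
        if found["T2"] is None and k_hat < i_hat:
            found["T2"] = t
        if found["Z1"] is None and e_hat < zero_e:
            found["Z1"] = t
        if found["Z2"] is None and n_hat < zero_n:
            found["Z2"] = t
            break
        n_prev, e_prev = n_hat, e_hat
    return found


# Synthetic rates for illustration only: K falls 10% per day, I rises 20% per day.
k_obs = [0.30 * 0.9 ** j for j in range(6)]
i_obs = [0.02 * 1.2 ** j for j in range(6)]
print(forecast_mileposts(k_obs, i_obs, n_t0=1000.0, e_t0=170.0, t0=5, m=5))
```

Rerunning the function with a later `t0` as new observations arrive gives the rolling predictions described in the text; a milepost left as `None` means it was not reached within the assumed horizon.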

Application: Analysis of the COVID-19 in mainland China beyond Hubei Province
We apply our model to analyze and evaluate COVID-19 using publicly available data for mainland China beyond Hubei Province from the China CDC for the period from Jan 29th, 2020 to Feb 29th, 2020. We first show the actual trend of COVID-19 and then compare it with the trend predicted by the proposed method. Finally, we show the effect of $m$ on the predicted results. All results are obtained with R.

The turning points and "zero" points observed
After the shutdown of most parts of Hubei Province on Jan 23rd, other parts of China also immediately launched prevention and control strategies, including regional isolation, admission of all confirmed patients, and isolation of all suspected patients. The effective implementation of these intervention policies quickly brought the rapid spread of the epidemic in these areas under control. As can be seen in Figure 1, the infection rate $K_t$, which reflects the intensity of the spread of the epidemic, has shown a significant downward trend since Jan 27th, after severe fluctuations from Jan 22nd to 26th. Figure 1 also shows that the number of daily confirmed cases peaked on Jan 30th, 2020, at 761 confirmed cases, and then declined for two consecutive days.
However, the migration of people returning to work after the Chinese New Year on Feb 3rd interrupted the continuous decline of $E_t$. From Feb 2nd, the number of daily confirmed patients in mainland China beyond Hubei Province increased for two consecutive days, and $E_t$ on Feb 3rd was 23% higher than on Feb 2nd. We attribute these fluctuations to the resumption of social activities; $E_t$ has declined continuously since Feb 4th. In many publications and media reports, Feb 3rd is used as the time point when the number of newly confirmed patients started to decline. However, considering that the epidemic was already under control, we still regard Jan 30th as the first turning point.
After that, the second turning point $T_2$, the time point when the number of active cases $N_t$ starts to decline, is also observed. Figure 2 shows the observed curves of the daily infection rate $K_t$, the daily removed rate $I_t$, and $N_t$, calculated from the actual data for mainland China beyond Hubei from Jan 22nd, 2020 to Mar 13th, 2020. It can be seen that the second turning point $T_2$ appeared on Feb 11th, when $K_t < I_t$ first occurred, and the number of patients in hospital has decreased continuously since then.
As for the first "zero" point $Z_1$, it is defined as the time when the number of daily confirmed cases equals "zero", which is too strict for real situations. Thus, in this article, we take the criteria for cancelling travel warnings developed by the WHO during SARS as a reference and adjust the definition: $Z_1$ is redefined as the first day on which the number of daily confirmed cases $E_t$ has been below 5 for 3 consecutive days. If we exclude confirmed cases that originated from abroad, the number of daily confirmed cases in mainland China beyond Hubei Province has been below 5 since Mar 3rd; thus, according to our revised definition, $Z_1$ is Mar 5th. However, there were still 1,089 active cases on that day, so it will take some extra time to reach the second "zero" point $Z_2$.
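The revised definition of the first "zero" point can be written as a small Python function. This is a minimal sketch; the function name and the sample counts are hypothetical, while the default threshold (5 cases) and run length (3 days) follow the revised definition above.

```python
from typing import List, Optional


def first_zero_point(
    daily_confirmed: List[int], threshold: int = 5, run_length: int = 3
) -> Optional[int]:
    """Return the index of the first day that completes `run_length`
    consecutive days with fewer than `threshold` confirmed cases."""
    run = 0
    for day, cases in enumerate(daily_confirmed):
        # Extend the current run of low-count days, or reset it.
        run = run + 1 if cases < threshold else 0
        if run >= run_length:
            return day
    return None  # the criterion has not been met yet


# Made-up counts for illustration only.
print(first_zero_point([12, 7, 4, 6, 3, 2, 1, 0]))
```

With these counts the run that starts on day 4 is completed on day 6, mirroring how a run starting on Mar 3rd yields Mar 5th as $Z_1$ in the text.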

Prediction results
Starting from Jan 29th, we use the proposed forecasting method to make real-time predictions of the two turning points $T_1$ and $T_2$ and the two "zero" points $Z_1$ and $Z_2$ with window size $m = 5$. To clarify, the data before January 26th fluctuate violently, so Assumption 1 is not satisfied. The data become stable only after January 27th; we therefore waited 2 days to make sure the trend had emerged and began our prediction on January 29th. The specific prediction results are as follows.
We first ran the proposed prediction model on Jan 29th, which indicated that the first turning point $T_1$ would arrive on Jan 31st, i.e., $E_t < E_{t-1}$. In reality, the first turning point arrived on Jan 30th, only one day away from our prediction.
As for the second turning point, since the true $T_2$ occurred on Feb 11th, we summarize in Figure 3(a) the frequency of the prediction results obtained with $t_0$ varying from Jan 29th to Feb 10th, 2020 and $m = 5$. The predictions of the second turning point are mainly concentrated in the range from Feb 9th to Feb 11th, which is consistent with the second turning point observed in reality. It is worth mentioning that we obtained this information about $T_2$ at a very early stage: on Feb 2nd we predicted that the second turning point $T_2$ would arrive on Feb 11th, which is exactly the date observed in reality. Since then, we have continuously tracked the rolling predictions, which have not changed much.
Similarly, Figure 3(b) and Figure 3(c) show the frequency of the prediction results for the two "zero" points obtained with $t_0$ varying from Jan 29th to Feb 29th, 2020 and $m = 5$. Specifically, for the predicted first "zero" point $Z_1$ in Figure 3(b), we divide the prediction results into 5 intervals. The predictions of $Z_1$ are mainly concentrated on Mar 1st to 5th, which is consistent with the actual result. There is also a "pessimistic" prediction, caused by the sudden fluctuation of the data on Feb 3rd, that the first "zero" point would arrive on Mar 17th. For the predicted second "zero" point $Z_2$ in Figure 3(c), it can be seen that the second "zero" point will be reached between early March and late March. However, one prediction places $Z_2$ on May 11th, far from the other results. The reason for this outlier is that the starting point of this forecast is Jan 29th, when the epidemic in mainland China beyond Hubei was still in the outbreak period, with $E_t$ still rising and $I_t$ very small, so the prediction of the end of the epidemic may not be accurate.
Furthermore, Figure 4 presents the forecasts of the four milepost moments together with the trends of the predicted number of active cases $\hat{N}_t$ and the cumulative number of infections, with the prediction starting point $t_0$ fixed at Jan 29th, Jan 31st, Feb 12th and Feb 26th, 2020, respectively. As can be seen from Figure 4(a), on Jan 29th, at the very early stage of the epidemic, we predicted that the first turning point would appear on Jan 31st, only one day behind the actual observation. Additionally, the second turning point predicted on that day was Feb 14th, only 3 days away from reality. The forecasts of the first and second "zero" points are Mar 7th and May 11th, respectively.
Figure 4(b) shows the prediction results after the first turning point had already appeared. The prediction for $T_2$ made on Jan 31st is accurate, placing the second turning point on Feb 11th. Meanwhile, the first and second "zero" points are predicted to appear around Mar 4th and Mar 23rd, respectively.
Similarly, after the arrival of the second turning point, Figure 4(c) shows the forecasts of the first and second "zero" points made on Feb 12th, which place $Z_1$ and $Z_2$ on Mar 9th and Mar 25th, respectively. The fitting results show that our predictions of the number of active cases $N_t$ and of the total number of confirmed patients are very close to the actual situation, so our prediction results are likely to be reliable. Finally, we also give a very recent (Feb 26th) forecast in Figure 4(d), which is similar to the results mentioned above.

Results with different window sizes m
Note that the window size $m$ plays an important role in the proposed procedure, and all the results discussed in Section 3.2 are obtained with fixed $m = 5$. In this section, we illustrate the impact of different choices of $m$ on the results and give an empirical recommendation for real data analysis. Parallel to Section 3.2, we obtain the results for the second turning point and both "zero" points by running the proposed procedure with $m = 3$, 4, and 6, respectively. These results are summarized in Figure 5.
From Figure 5, we can see that the highest frequency of predictions for the second turning point occurs in the period from Feb 9th to 11th for all choices of $m$, which means that the second turning point is most likely to occur during this period. Similar results hold for the forecast of the first "zero" point, which is most likely to appear in early March. Both results show the limited influence of $m$. From Figure 5(c), although the forecast frequency distributions for the second "zero" point with different $m$ are not as concentrated as those for the second turning point and the first "zero" point, they vary only slightly, with the second "zero" point occurring between mid-March and mid-April. Overall, the choice of $m$ does not appear to be critical for the forecasting results, and we recommend choosing it empirically between 3 and 6.
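The comparison across window sizes amounts to tabulating, for each $m$, how often each date is predicted over the rolling starting points $t_0$. A minimal Python sketch of that summary is given below; the function names and the predicted day numbers are hypothetical and do not reproduce the values behind Figure 5.

```python
from collections import Counter
from typing import Dict, List, Tuple


def modal_prediction(predicted_days: List[int]) -> Tuple[int, int]:
    """Return (most frequent predicted day, its count); ties go to the earliest day."""
    counts = Counter(predicted_days)
    best_day = min(counts, key=lambda day: (-counts[day], day))
    return best_day, counts[best_day]


def compare_windows(predictions_by_m: Dict[int, List[int]]) -> Dict[int, Tuple[int, int]]:
    """Summarise the rolling predictions separately for each window size m."""
    return {m: modal_prediction(days) for m, days in sorted(predictions_by_m.items())}


# Hypothetical predicted day numbers, one list per window size m.
rolling_predictions = {
    3: [40, 41, 41, 42],
    5: [41, 41, 40],
    6: [42, 41, 41, 41],
}
print(compare_windows(rolling_predictions))
```

If the modal day is the same, or nearly the same, for every $m$, the forecast is insensitive to the window size in the sense discussed above.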

Discussion and conclusion
Focusing on the four meaningful mileposts, we put forward a simple and effective framework that incorporates the effectiveness of government control to forecast the whole process of a new, unknown infectious disease early in its outbreak. Specifically, we first propose a series of iconic indicators to characterize the extent of epidemic spread and describe the four periods of the whole process, which correspond to the four meaningful milepost moments: two turning points and two "zero" points. We then develop the proposed procedure under mild and reasonable assumptions and, in particular, without relying on assumed epidemiological parameters for disease progression.
We examine our model with COVID-19 data for mainland China beyond Hubei Province and find that it can capture the overall course of the epidemic early in the outbreak. Specifically, in the first prediction task, conducted on Jan 29th, the predicted date on which the number of newly confirmed patients E t would fall for the first time is only one day later than the date actually observed. On Feb 2nd, our model predicted that the number of patients in hospital N t would peak on Feb 11th, which is consistent with what was observed. Later forecasts fluctuated but were stable overall and close to the observations. Meanwhile, we predict that the first "zero" point Z 1 will arrive between the end of February and the beginning of March, and that the second "zero" point Z 2 will arrive between mid-March and mid-April. We also checked the robustness of our model under different time windows and found that the choice of time window has little effect on the prediction of turning points. As a tool for early warning of a new epidemic, our prediction model proved to be quite effective on these data.
At present, many countries around the world are overwhelmed by the COVID-19 epidemic, which calls for global efforts. Because our method is able to depict and predict the trend of an epidemic at a very early stage, it can be applied to the current COVID-19 epidemic in other countries, or to any other new, unknown, explosive epidemic in the future. We believe that the predictions of this method can provide decision support for epidemic control and intervention. It is worth noting that, because our method relies on short-term dependence, it may perform poorly on wildly fluctuating data. Thus, further data preprocessing methods, such as data smoothing, need to be developed within our framework to allow wider use of the method.
This project contains the following underlying data:
• Data of China Mainland Beyond Hubei.csv (A csv file with data collected from China CDC and four variables: the cumulative confirmed cases up to the given day t, the daily confirmed cases at day t, the daily recovered ones and the daily deaths at day t, with t from Jan 29th to Feb 29th, 2020)

The procedure depends on the starting point t 0 , and on the size of the time window used, m. It is assumed that K t and I t change gently within time window m before t 0 , so that the average change rate of both K t and I t within that period are reliable values. Under those assumptions, recurrence formulas allow predicting the required time points.
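The recurrence summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the formulas are not reproduced in this part of the text, so the sketch assumes K t = E t / N t-1 and I t = (daily removed) / N t-1, a geometric per-step change rate over the window m, and the update N t = N t-1 (1 + K t - I t). The function names, the cap on the removed rate, the `horizon` parameter and the input series are all hypothetical; the series are synthetic placeholders, not China CDC data.

```python
def geometric_rate(series, t0, m):
    """Per-step multiplicative change of `series` over the m days ending at t0.

    Assumed form of the "average change rate" within the window.
    """
    return (series[t0] / series[t0 - m]) ** (1.0 / m)


def forecast_mileposts(K, I, N_t0, t0, m, horizon=365):
    """Roll the assumed recurrence forward from t0 and record three mileposts.

    T2: first day the infection rate K falls below the removed rate I.
    Z1: first day the forecast number of new cases drops below one.
    Z2: first day the forecast number of active cases drops below one.
    """
    v_k = geometric_rate(K, t0, m)
    v_i = geometric_rate(I, t0, m)
    k, i, n = K[t0], I[t0], N_t0
    found = {"T2": None, "Z1": None, "Z2": None}
    for step in range(1, horizon + 1):
        k *= v_k
        i = min(i * v_i, 1.0)      # assumption: removed rate cannot exceed 1
        new_cases = k * n          # E_t = K_t * N_{t-1}
        n *= 1.0 + k - i           # N_t = N_{t-1} * (1 + K_t - I_t)
        t = t0 + step
        if found["T2"] is None and k < i:
            found["T2"] = t
        if found["Z1"] is None and new_cases < 1.0:
            found["Z1"] = t
        if found["Z2"] is None and n < 1.0:
            found["Z2"] = t
        if all(value is not None for value in found.values()):
            break
    return found


# Synthetic placeholder rates for days 0..5, NOT the China CDC data.
K_toy = [0.30 * 0.8 ** t for t in range(6)]   # infection rate, decaying
I_toy = [0.02 * 1.2 ** t for t in range(6)]   # removed rate, growing
result = forecast_mileposts(K_toy, I_toy, N_t0=1000.0, t0=5, m=5)
```

Because each forecast uses only the last m days before t 0, repeating the call for successive values of t 0 would give a sequence of forecasts whose stability can then be inspected, which is the kind of check the window-size comparison relies on.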
The results reported from the application of the proposed methodology to the China Covid-19 data are quite good, predicting well the observed time points, and hence show that the method is well founded and useful.
The manuscript is quite well written, and is easy to follow.
The pertinence and topicality of the subject go without saying. The method appears sound to me; it is not extremely complicated, nor does it rely on extraordinary assumptions.
diagnosed cases and of the daily removed cases.
The paper is clear and smoothly written. I agree to its indexing once the following remarks have been taken into consideration:
Section 2. Methods
Equation (1)
1) N_l in the brackets (1+K_l -N_l) should be replaced with I_l.
Section 2.3
2) I suggest referring explicitly to the exponential model of the infection rate K_t and of the removed rate I_t (in the window m);
3) the term "average" for V_{K|(t0,m)} (and V_{I|(t0,m)}) is confusing; it would be better to refer to it as the "unitary rate of change" (and "unitary rate of removed"). I think it is more properly the "rate of change" (and the "rate of removed") associated with a unitary time step.
Algorithm 1
6) The proposed strategy has been verified on data from a very early stage of the epidemic, and it exhibits short-term dependence. Given the worldwide spread of the epidemic, it could perhaps be tested on data from the outbreak of the pandemic in other countries.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Statistical Data Analysis; Data Stream Analysis; Functional Data Analysis; Symbolic Data Analysis; Clustering
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.