SARS-CoV-2 epidemic in India: epidemiological features and in

Background: After SARS-CoV-2 set foot in India, the Government took a number of steps to limit the spread of the virus in the country. These included restricted testing, isolation, contact tracing and quarantine, and enforcement of a nation-wide lockdown starting 25 March 2020. The objectives of this study were to i) describe the age and gender distribution and mortality among COVID-19 patients identified till 14 April 2020 and predict the range of contact rate; and ii) predict the number of active COVID-19 patients after 40 days of lockdown.
Methods: We used a cross-sectional descriptive design for the first objective and a susceptible-infected-removed model for in silico predictions. We collected data from government-controlled and crowdsourced websites.
Results: In the analysis of age and gender parameters of 1161 Indian COVID-19 patients, the median age was 38 years (IQR, 27-52), with 20-39 year-old males being the most affected group. Of the affected patients, 854 (73.6%) were men and 307 (26.4%) were women. If the current contact rate continues (0.25-27), India may have 110460 to 220575 infected persons at the end of the 40-day lockdown.
Conclusion: The disease is mainly affecting a younger age group in India. Interventions have been helpful in preventing the worst-case scenario in India, but will be unable to prevent the spike in the number of cases.


Introduction
Since December 2019, SARS-CoV-2, a novel virus of the Coronaviridae family of RNA viruses, has caused a widespread outbreak of the disease now known as COVID-19, which was declared a pandemic by the World Health Organization (WHO) on 11 March 2020 [1][2][3]. Human-to-human transmission occurs primarily through close contact with an infected person, through fomites in the person's immediate surroundings and via droplets of respiratory secretions, although there is limited evidence pointing to the possibility of airborne and faeco-oral transmission as well 4-7. According to a few case studies, transmission may also occur via viral shedding in "pre-symptomatic" individuals during the incubation period 8,9.
The incubation period for COVID-19 is thought to be within 14 days of exposure, with a median incubation period of 4-5 days 4,10,11. Globally, the median age of patients affected by COVID-19 is 47 years, with the most common clinical findings being fever and cough 4,12. About 18% of patients develop shortness of breath 4. Severe disease (including dyspnea defined as a respiratory rate of ≥30/min, blood oxygen saturation of ≤93%, a partial pressure of arterial oxygen to fraction of inspired oxygen ratio <300, and/or lung infiltrates >50% within 24 to 48 hours) has been reported to occur in 14% of elderly patients with pre-existing chronic diseases 13.
Since the beginning of the outbreak in India, a number of interventions have been made by both state-level and central-level governments (Figure 1). These included restricting the inflow of international passengers, self-quarantine measures, directives on testing and management strategies, and in-country travel restrictions. During this period, the testing strategy also changed from an initial focus on foreign travel and contact history to including all individuals with severe acute respiratory illness and symptomatic health care providers. Social distancing measures gradually increased in March and were followed by state-wise lockdowns, ultimately culminating in a nation-wide 21-day lockdown from 25 March 2020. As of 14 April, the lockdown had been extended until 3 May 2020 (40 days).
In addition to the epidemiological parameters of COVID-19 in India, the other important question in the current scenario concerns the mathematical parameters of the initial spread of COVID-19 in India and the epidemiological aspects that can predict this spread. We acknowledge that precise calculations are difficult owing to the rapidly changing dynamics of the epidemic in its early stages, the limited availability of data in the public domain, and limited testing capacity. Nonetheless, mathematical models with reasonable assumptions based on available information can help in analysing the currently available data and provide important insights for guiding public health interventions 18. The most basic of these models is the susceptible-infected-removed (SIR) model 19,20, which we have used in the current Indian scenario to determine the range in which the contact rate β lies, calculate the range of the current reproduction number, Rt, and predict the number of COVID-19 infections at the end of the 40-day lockdown period.

Epidemiological descriptive analysis of patients
This was a cross-sectional descriptive analysis of laboratory-confirmed COVID-19 patient-wise data collected from a crowdsourced database (https://www.covid19india.org; includes data from state government and central government agencies). Data were included for cases confirmed up to 7:20 PM Indian Standard Time on 14 April 2020.
Data analysis was done in regard to the age distribution, status of patients and gender distribution using Microsoft Office Excel 2007 (Microsoft, Redmond, WA, USA). Fatality rate in any category was found by dividing the number of deaths in the category by the number of affected individuals of that category.
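The fatality rate calculation described above can be sketched in a few lines. This is a minimal illustration, not the spreadsheet workflow used in the study: the function name `fatality_rates`, the outcome label "deceased" and the records themselves are hypothetical and invented for the example.

```python
from collections import Counter


def fatality_rates(records):
    """Return {category: deaths / affected} for (category, outcome) records."""
    affected = Counter(category for category, _ in records)
    deaths = Counter(
        category for category, outcome in records if outcome == "deceased"
    )
    # Counter returns 0 for categories with no deaths, so every category gets a rate.
    return {category: deaths[category] / affected[category] for category in affected}


# Hypothetical patient-wise records, invented for illustration only.
records = [
    ("20-39", "hospitalized"),
    ("20-39", "recovered"),
    ("60-79", "deceased"),
    ("60-79", "hospitalized"),
    ("60-79", "recovered"),
    ("60-79", "deceased"),
]

rates = fatality_rates(records)
```

The same function applies to any grouping (age band, gender, or both combined), as long as each record carries the category label and the patient's status.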

Mathematical modelling
The SIR model 19,20 divides the (fixed) population of N individuals into three "compartments", which vary as a function of time (for the purpose of this study, we have not included vital dynamics such as birth and death rates):
• S(t): those who are susceptible but not yet infected with the disease (in a novel disease like COVID-19, the entire population is assumed to be susceptible as there is no pre-existing immunity);
• I(t): the number of infectious individuals;
• R(t): those individuals who have been removed from the infected population (includes those who have recovered from the disease and also deaths).
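The SIR model describes the change in the population within each of these compartments in terms of two parameters, β and γ (Figure 2). β describes the effective contact rate of the disease, and γ describes the rate at which infected individuals are removed. β and γ enter the SIR model through the following differential equations:
$$\frac{dS}{dt} = -\frac{\beta S I}{N} \quad (1)$$
$$\frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I \quad (2)$$
$$\frac{dR}{dt} = \gamma I \quad (3)$$
In the early phase of the epidemic almost the entire population is susceptible ($S \approx N$), so equation 2 reduces to $dI/dt \approx (\beta - \gamma) I$, which gives
$$I(t) = I(0)\, e^{(\beta - \gamma) t} \quad (4)$$
Comparing this equation from the SIR model with the general equation of exponential growth, $y = a e^{rt}$, where r is the growth rate of the exponential curve, gives $r = \beta - \gamma$.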
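A minimal sketch of how such a model can be run numerically, assuming the standard SIR form dS/dt = −βSI/N, dI/dt = βSI/N − γI, dR/dt = γI and a simple forward-Euler integration. The function name `simulate_sir`, the population size, the initial number of infections and the step size are illustrative assumptions, not values from this study; γ = 0.103 is the mean removal rate used later in this section.

```python
def simulate_sir(beta, gamma, population, initial_infected, days, steps_per_day=100):
    """Integrate the SIR equations with forward Euler; return daily (S, I, R) tuples."""
    dt = 1.0 / steps_per_day
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    trajectory = [(s, i, r)]
    for _day in range(days):
        for _step in range(steps_per_day):
            new_infections = beta * s * i / population * dt  # flow S -> I
            new_removals = gamma * i * dt                     # flow I -> R
            s -= new_infections
            i += new_infections - new_removals
            r += new_removals
        trajectory.append((s, i, r))
    return trajectory


# gamma is the mean removal rate reported in this section; population size,
# initial infections and the 40-day horizon are illustrative placeholders.
GAMMA = 0.103
curves = {
    beta: simulate_sir(beta, GAMMA, population=1_000_000, initial_infected=100, days=40)
    for beta in (0.24, 0.25, 0.28)
}
```

Each entry of `curves` holds one trajectory per value of β, which is the form needed to overlay model curves on observed active-case counts.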
We used the daily data on new COVID-19 cases available from 24 March to 13 April 2020 to estimate the two parameters, β and Rt (the time-varying reproductive number), with the help of the SIR model, assuming a lag period of 11 days (www.statista.com was used for extraction of variables). We assumed that the recovery rate γ would remain constant for the population. The removal rate followed a normal distribution, and its mean was calculated from the data available from 1 March to 4 April 2020 21. Since the effect of interventions would be reflected in the contact rate β, we held γ constant at this mean (0.103) and ran the SIR model multiple times with different values of β, comparing the resulting trends with the real data.
We plotted the trend line for the real data using Microsoft Office Excel 2007, and used the equation of the curve to find the trend of β in India in the present-day scenario by comparing it with equation 4. We then used the present trends of β to estimate the expected number of infections at the end of the 40-day lockdown.
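The trend-line step can be illustrated as follows. This is a sketch under stated assumptions rather than the spreadsheet procedure itself: a log-linear least-squares fit stands in for the spreadsheet trend line, the early-growth approximation I(t) ≈ I(0)e^{(β−γ)t} is assumed so that the fitted growth rate r corresponds to β − γ, and the case counts are synthetic values generated from a known growth rate, not real data.

```python
import math


def fit_exponential(days, cases):
    """Least-squares fit of ln(cases) = ln(a) + r * day; returns (a, r)."""
    logs = [math.log(c) for c in cases]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    r = sxy / sxx
    a = math.exp(mean_y - r * mean_x)
    return a, r


GAMMA = 0.103  # mean removal rate reported in the text

# Synthetic active-case counts generated from a known growth rate (not real data).
days = list(range(21))
cases = [500 * math.exp(0.169 * d) for d in days]

a, r = fit_exponential(days, cases)
beta = r + GAMMA  # assumes r = beta - gamma from the early-growth approximation
```

Because the synthetic series is exactly exponential, the fit recovers the growth rate it was generated with; real case counts would scatter around the fitted line.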

Ethics
Anonymized data available in the public domain were used for analysis. Ethical approval was not required.

Mathematical modelling
After running multiple simulations using the SIR model, each assuming a different value of β, we found that the value of β in the current Indian scenario, calculated from the trendline of the real data, lies around 0.272. This is also visible in the graph, where the real-time active cases lie between the β=0.25 and β=0.28 curves (Figure 3). The graph also shows the real data line shifting from β=0.24 to β=0.28 from day 8. As Rt = β/γ, the value of the reproductive number is found to vary between 2.5 and 2.8 in the Indian scenario until 13 April 2020. Considering the present trend of β, the number of infected persons after the 40-day lockdown period may vary from 110,460 to 220,575.
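The relation Rt = β/γ can be checked with a short calculation. This is only an arithmetic sketch: the helper name `reproduction_number` is hypothetical, and γ = 0.103 is the mean removal rate from the Methods. With that value the ratios for β between 0.25 and 0.28 come out at roughly 2.4 to 2.7, whereas rounding γ to 0.1 would give 2.5 to 2.8; whether such rounding was applied is an inference, not something stated in the text.

```python
GAMMA = 0.103  # mean removal rate reported in the Methods


def reproduction_number(beta, gamma=GAMMA):
    """Rt = beta / gamma for the SIR model."""
    return beta / gamma


# Contact rates discussed in the Results: the two bounding curves and the trendline value.
rt_by_beta = {
    beta: round(reproduction_number(beta), 2) for beta in (0.25, 0.272, 0.28)
}

# The same bounds with gamma rounded to 0.1 (an assumption, see the note above).
rt_rounded_gamma = {
    beta: round(reproduction_number(beta, 0.1), 2) for beta in (0.25, 0.28)
}
```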

Discussion
To the best of our knowledge, this is the first study to predict the number of COVID-19 infected persons in India after the 40 days of lockdown. We used crowdsourced data because official data were not available. We could not include all cases in the epidemiological analysis because demographic details were not available for all patients. Limitations notwithstanding, there were three key findings.
First, the median age of affected individuals is lower than that reported in other countries. In study cohorts from Wuhan, the median age of affected patients ranged from 49 to 56 years 4,22,23. Previous studies on COVID-19 have established that risk increases with age and comorbidities 13,22,23. The broad-based nature of India's population pyramid means there are more people in the younger age groups and very few people in the ≥80 years age group. Another reason is that there is limited evidence of community spread in the epidemic in India so far; it has been reported to be driven by imported cases, who mostly belong to the younger age groups. In 2018, Indian residents between 35 and 49 years of age took the most holidays outside the nation 24. On normalizing the percentage of patients in each age group by the corresponding percentage representation in the population, we observed that the highest number of patients is in the 60-79 years age group. Interestingly, according to this analysis, men in the 60-79 years age group are affected more than those in the ≥80 years age group. This has not been reported until now and needs to be studied further; it remains to be seen whether it changes as the number of cases in India grows.
Second, we found a case fatality rate of 2.5%, which is lower than that of countries such as Italy 25. The reason for this could be that Italy is in the later phases of the epidemic, with widespread community transmission and high mortality due to overburdening of the health system. This has also led to testing being focused on severe patients and possibly to suspected patients with mild symptoms being advised to stay at home. In China, the case fatality rate was 2.3% overall, 14.8% in the above 80 years population, and 8.0% in the 70-79 years population 13. Our estimate gives a slightly higher value, which may be due to a smaller sample size or because mild cases of COVID-19 have so far been missed owing to the restricted testing strategy during the initial stage.
Third, from the SIR model, the values of β and the trend for Rt show that the interventions put in place by the Indian government from mid-March were partially effective, preventing a scenario in which Rt could exceed four 26. The increase in β from day eight is probably due to the identification of a cluster in Delhi 27. Additionally, the Indian Council of Medical Research recently updated the testing strategy, which might have increased the detection of positive cases significantly. According to a study in the early days of the epidemic, Wuhan city and Hubei province reported Rt between 1.85 and 4.46, which aligns with our study findings 26. Across China, Rt varied from 1.23 to 5.77. South Korea, which has a high population density like India, had a decreasing trend of Rt from 9.72 on 20 February to 1.50 on 7 March. This indicates that the interventions have been helpful in preventing the worst-case scenario in India but have been unable to prevent the spike in the number of cases 27,28.
At the end of the 40-day lockdown, India is likely to have a significant number of infected persons (110,460 to 220,575), which would include both symptomatic and asymptomatic individuals. The epidemic may also enter the exponential growth phase, at which stage it would become very difficult to contain. The situation can still be controlled if Rt can be brought down close to one. This indicates the need for more effective strategies and for ensuring optimum testing to avoid underestimating the danger.
Future research is needed to support the current finding that, unlike in other countries, COVID-19 in India is mostly affecting the 20-39 years age group, since our data analysis was restricted to individuals for whom data were available. The SIR model used does not account for age structure or comorbidities. Hence, in our future models of the current pandemic, we aim to capture more realistic dynamics by including multiple compartments with robust mathematical assumptions. Comprehensive studies based on the clinical manifestations and laboratory parameters, including genomic sequencing to detect mutations of the virus, also need to be done for the Indian population to see whether the clinical features and the viral strains differ from those in other populations.
To conclude, the COVID-19 epidemic in India is affecting younger age groups more than in other countries. Our mathematical model predicts that India will have a significant number of infected persons after the 40-day lockdown. Hence, strict social distancing and optimum testing, followed by isolation and quarantine, are vital elements in controlling the COVID-19 epidemic.

Is the study design appropriate and is the work technically sound? Yes
Are sufficient details of methods and analysis provided to allow replication by others? Yes

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Public health and Epidemiology of infectious diseases
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.