NaijaCovidAPI: an application programming interface for retrieval of COVID19 data from the Nigerian Center for Disease Control web platform [version 1; peer review: awaiting peer review]

Background: In this work, a COVID19 Application Programming Interface (API) was built using the Representational State Transfer (REST) architecture. It is designed to fetch data daily from the Nigerian Center for Disease Control (NCDC) website. Methods: The API was developed with the ASP.NET Core Web API framework in the C# programming language, using Visual Studio 2019 as the Integrated Development Environment (IDE). The application was deployed to Microsoft Azure as the cloud hosting platform. New data are retrieved from the NCDC website using Hangfire, with a job scheduled to run daily at 12:30 pm (GMT + 1) and load the fetched data into our database. Various API endpoints are defined to interact with the system and get data as needed: data can be fetched for a single state by name, for all states on a particular day, over a range of days, etc. Results: The data showed that Lagos and Abuja FCT were the hardest-hit states in Nigeria in terms of total confirmed cases, while Lagos and Edo states had the highest death tolls, with 465 and 186 deaths respectively as of August 2020. This analysis and many more can easily be made using the API we have created, which warehouses all COVID19 data presented by the NCDC since the first contracted case on February 29, 2020. The system was tested on the BlazeMeter platform, where it averaged 11 hits/s with a response time of 2905 milliseconds. Conclusions: The extension of NaijaCovidAPI over existing COVID19 APIs for Nigeria is the access and retrieval of previous data.
F1000Research 2021, 10:1227. Last updated: 02 DEC 2021


Introduction
The Coronavirus disease 2019, known as COVID-19, is caused by the virus that the International Committee on Taxonomy of Viruses named SARS-CoV-2. 1 The first case occurred in December 2019 in Wuhan, in the Hubei province of China. 2,3 Like a wildfire, it spread to 233 countries, with millions of confirmed cases and deaths. 4 Nigeria recorded its first case on February 28, 2020, when an Italian man working in Nigeria returned from Italy to Lagos. This was confirmed by the Lagos University Teaching Hospital. 5 Since the index case, the numbers of deaths and of confirmed, active and discharged cases have been increasing in Nigeria and the rest of the world. 6 Data have become increasingly important in research for producing discoveries since the introduction of computers, which brought with them computing, simulation, and modelling. Furthermore, because most of the research on COVID-19 is funded by national governments, the data and outcomes are publicly available. [7][8][9] The World Health Organization (WHO) has designed a dashboard that gives real-time data on the global statistics of the COVID-19 outbreak. In addition, Coronatracker was developed to provide real-time news and statistics and to curb the spread of fake information about the disease. 4,10 In Nigeria, for instance, the Presidential Task Force (PTF) on COVID-19 gave daily updates for about 8 months from the onset of cases. The NCDC has developed a platform that provides information on daily incidence on its website and on social media platforms. 6 These platforms were developed to provide real-time data for public health and research purposes. For analyzing research datasets, Python has continued to gain popularity among researchers in scientific domains where MATLAB and SPSS previously dominated. Most often, data are offered for download from online platforms as Comma Separated Values (CSV) files, which Python's pandas library can read. 11 Another alternative is to provide an Application Programming Interface (API) that gives users access to data from web portals.
APIs have existed since the advent of the Internet for the exchange of data between two or more programs. They have recently gained such wide acceptance that "pundits" say we are in the API economy. This is likely because people are now more interconnected through different types of applications than ever before, and individuals are demanding online services at a higher rate than ever. Notably, APIs have made it possible for businesses to connect for enhanced capacity and profitability far beyond what was known historically. 12 Big corporations such as Uber, Airbnb, PayPal, and a host of others depend on several APIs and developer services, with Uber paying Google the sum of $58 million for its map services. Nigeria is not left behind in this race, as several start-ups have joined the API economy, among them Paystack, which developed APIs to fast-track payments using cards (credit and debit) and direct bank transfers. 13,14 In this paper, we report the development and deployment of a RESTful API that serves COVID-19 data scraped from the NCDC website and exposes its responses in JavaScript Object Notation (JSON) format. We built a web scraper that collects the daily updates from the NCDC website and archives them in a cloud database. Furthermore, we developed a Web API that allows users and developers to consume the data in JSON format.
The rest of this paper contains a literature review, the methods, the results, and the conclusions.

Literature review
An Application Programming Interface (API) is the intermediary between software components; it specifies how they interact as well as the data formats they exchange. 14 This section reviews previous work by scholars across different application domains.
A RESTful API developed using JavaScript scrapes data from the NCDC website and exposes its response in JSON. The application gives real-time data as it appears on the NCDC website, with details for each state such as deaths, confirmed cases, cases on admission and discharged cases, together with the national total for each category. 6,15
Breast cancer is the second leading cause of mortality among women worldwide; nevertheless, a tiny proportion of men are also at risk. Early detection of this condition is critical since therapy can then be started, enhancing the chance of survival. The authors created an application that can assist health professionals in predicting breast cancer and complement its detection. The prediction is accomplished by training a model using the Google Prediction API. The API integrates with various programming languages such as Java, Go, JavaScript, Python, PHP and Ruby. 16
Researchers, physicians, and the general public face, more than ever, a massive challenge in keeping up with the rapid pace of findings and advancements from the influx of clinical trials. To address this issue, researchers combed the ClinicalTrials.gov database for COVID-19-related clinical trials, created unique reports to summarize results, and made the metadata accessible through APIs. 17
Research was also carried out on characterizing illicit COVID-19 product sales. This involved collecting COVID-19-related data from Twitter and Instagram posts using a combination of site scraping on Instagram and filtering the public streaming Twitter API for keywords correlated with suspect COVID-19 marketing and sales. The data were analyzed using Natural Language Processing (NLP) and deep learning to classify possible buyers, who were then manually annotated for desired characteristics. The researchers used a personalized data dashboard to visualize illegal trading posts and provide public health information. 18
Also, a reusable dataset on the COVID-19 outbreak in Kerala was compiled from publicly accessible, crowd-verified data in government bulletins and media outlets as part of a citizen science initiative. It was delivered through a front-end Web application and a JSON repository that acts as an API for the front end, and was visualized as a dashboard. 19
Traditional APIs were previously built as Simple Object Access Protocol (SOAP) web services; SOAP is an XML (Extensible Markup Language) based protocol for accessing web services over HTTP. It comes with some disadvantages that have been addressed in a newer trend of APIs. APIs are now migrating towards REST interfaces because of their simplicity and their support for more straightforward programmatic access, returning either XML or JSON. 20 REST is becoming the norm across web services because of its better performance, scalability and flexibility compared to SOAP services. Another advantage of REST over SOAP is that it consumes fewer resources; in addition, its operation does not involve multiple standards and heterogeneous procedures, which makes services easier to decompose and compose than with SOAP.
There are significant differences between SOAP web services and REST web services. These differences were illustrated by TogoWS, an integrated web service that gives uniform access to database resources, parsers for database entries and converters for major data formats. 21 Web services can be categorized into data-retrieval services and analysis services. Both types can be exposed using either the REST or the SOAP architecture. In some cases, REST is better suited to data-retrieval services because it maps easily onto resource URIs, whereas SOAP is more suitable for analysis services, which usually require long execution times or complex parameters. 22 Furthermore, SOAP returns output in XML, while REST returns output in JSON in most cases.
From our extensive review, the REST architecture is the most widely used in the literature and was adopted for this study. The gaps identified in the existing COVID API are that previous days' data are unavailable, so one cannot go back in time to assess the curve of the virus; that data are retrieved in bulk; and that one cannot select a particular state and inspect the data for that state only. The present work overcomes these gaps by adding features that make useful data available to researchers.

Methods
To realize the objective of this work, the following tools and frameworks were used to develop the web scraper, which fetches data from the NCDC website; the data are then warehoused on MS SQL Server. Furthermore, a Web API with endpoints through which the data can be consumed in JSON was developed. The application block diagram is shown in Figure 1. The procedures carried out to develop the application are described below.

Web scraper
To scrape a web page, one has to understand its Hypertext Mark-up Language (HTML) structure and then use the XPath of the element(s) of interest. The NCDC daily update has details for every state concerning their cumulative COVID data, together with the national summary of cases. Figure 2 is a screenshot of the NCDC website showing the data that can be obtained from the webpage. On studying the source code of the NCDC website, we found that the HTML elements containing the confirmed cases, active cases, deaths, and discharges have an XPath of "//h2[@class = 'text-right text-white']", which can be used to fetch all nodes with that structure. First, the scraping package was downloaded using NuGet Packages and added to the project automatically by Visual Studio; it was then imported into the class that handles the scraping of data.
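The XPath lookup described above can be illustrated with a minimal Python sketch. It is not the authors' C# scraper: the markup in SAMPLE_PAGE is a hypothetical, well-formed stand-in for the NCDC page, the numbers are placeholders, and the name extract_counts is assumed for illustration. Python's standard library supports only a subset of XPath, so the expression is written relative to the root element.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical stand-in for the NCDC summary markup; the figures are placeholders.
SAMPLE_PAGE = """
<div>
  <h2 class="text-right text-white">1,234</h2>
  <h2 class="text-right text-white">56</h2>
  <h2 class="other">ignored</h2>
</div>
"""

# Same predicate as the XPath quoted in the text, in ElementTree's XPath subset.
XPATH = ".//h2[@class='text-right text-white']"


def extract_counts(markup: str) -> List[int]:
    """Return the integer value of every node matched by XPATH."""
    root = ET.fromstring(markup.strip())
    # Remove thousands separators before converting the text to an integer.
    return [int(node.text.strip().replace(",", "")) for node in root.findall(XPATH)]


counts = extract_counts(SAMPLE_PAGE)
```

A real page is rarely well-formed XML, so a production scraper would normally use an HTML-tolerant parser; the sketch only shows how the class-based predicate selects the summary nodes.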
After the data were scraped, a validation step ensured that the date of the data was yesterday's date (this is because the real-time data on the NCDC website are the previous day's data). If the validation failed, the scraping was rescheduled for two hours later; if it passed, the data were stored in the database. Figure 3 shows the Unified Modelling Language (UML) Class Diagram for this system. The HomeController is the class that handles all HTTP requests, and the various endpoints are represented as methods. Annotations were used to state the HTTP method each endpoint responds to, which is GET for all the endpoints.
The DatabaseContext class is the intermediary between the database and the system. The created tables are defined as properties with a return type of DbSet. The DatabaseContext class inherits from the DbContext superclass, which has a method, OnModelCreating, that is used to state the properties (such as Primary Key, Foreign Key, etc.) defined in the tables.
The DataService class implements the IDataService interface. It is good programming practice to use interfaces to model multiple inheritance, a feature supported by some object-oriented languages that allows a class to have more than one superclass. The DataService class uses Language Integrated Query (LINQ) statements to execute queries on the database.
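The kind of filtering such a service performs (one state by name, or all records within a range of days) can be sketched in Python over in-memory records. This is an illustration only, not the authors' LINQ code: the Record fields, the function names and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Record:
    # Hypothetical fields; the real DataModel is defined in the authors' code.
    state: str
    report_date: date
    confirmed: int


def by_state(records: List[Record], state: str) -> List[Record]:
    """All records for one state, matched case-insensitively (an assumption)."""
    wanted = state.strip().lower()
    return [r for r in records if r.state.lower() == wanted]


def by_date_range(records: List[Record], start: date, end: date) -> List[Record]:
    """All records whose date lies in the inclusive range [start, end]."""
    return [r for r in records if start <= r.report_date <= end]


# Placeholder data, not real NCDC figures.
records = [
    Record("Lagos", date(2020, 8, 1), 10),
    Record("Akwa Ibom", date(2020, 8, 1), 3),
    Record("Lagos", date(2020, 8, 2), 12),
]
lagos = by_state(records, "lagos")
first_day = by_date_range(records, date(2020, 8, 1), date(2020, 8, 1))
```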
The DataModel is the model of data fetched from the NCDC website and the data model to be saved in the database. The DataModel class has the properties that capture the data needed from the NCDC website. Other details such as Sampled Cases, Confirmed Cases, Active Cases, Discharged, and Deaths at a national level are captured by the ScrapeData() method.
After the data have been scraped, a test is performed to ensure that the data fetched are the latest from the website. To achieve that, the date of the scraped data is compared with the previous day's date (since the data on the NCDC website reflect the previous day) and checked to ensure that it has not already been captured. If the data do not pass this test, they are not processed further and the routine is rescheduled to run after two hours, because failure means the website has not yet been updated. If the data pass the test, the SaveData() method is called to store them in the database. Entity Framework was adopted to handle database transactions rather than writing Structured Query Language (SQL) scripts directly into the code.
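The date test and the two-hour retry can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' C# routine: the function names, the set of already stored dates and the returned labels "save" and "retry" are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import Optional, Set, Tuple

RETRY_DELAY = timedelta(hours=2)  # rescheduling interval stated in the text


def passes_date_test(scraped: date, today: date, stored: Set[date]) -> bool:
    """True when the scraped data are dated yesterday and not yet captured."""
    is_yesterday = scraped == today - timedelta(days=1)
    return is_yesterday and scraped not in stored


def next_step(scraped: date, now: datetime,
              stored: Set[date]) -> Tuple[str, Optional[datetime]]:
    """Decide whether to save the data or to retry two hours later."""
    if passes_date_test(scraped, now.date(), stored):
        return ("save", None)
    return ("retry", now + RETRY_DELAY)


now = datetime(2020, 8, 2, 12, 30)
fresh = next_step(date(2020, 8, 1), now, set())
stale = next_step(date(2020, 7, 31), now, set())
```

Here `fresh` asks for a save, while `stale` asks for a retry at 14:30 on the same day.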

Automation
Since data need to be fetched from the NCDC website daily, the scraping process was automated. Hangfire was adopted for this automation; it is an open-source library that supports background jobs, which perform fire-and-forget, delayed and recurring, long-running, short-running, CPU-intensive or I/O-intensive tasks inside ASP.NET applications, and no Windows Service or Task Scheduler is required.
We utilized the delayed and recurring job options. Under the recurring option, the job was scheduled to run at 12:30 pm (GMT +1). This decision was made because the NCDC website is not normally updated in the early hours of the day. The delayed option applies when data have been fetched but fail the date test (i.e., the website has not yet been updated); the job is then rescheduled to run after two hours.
The Hangfire packages (Hangfire.AspNetCore and Hangfire.SqlServer) have to be installed from NuGet before Hangfire can be used in a project. After installation, Hangfire was registered in the ConfigureServices method of the Startup class to use the SQL database through the already created connection. In the Configure method, the Hangfire Dashboard and Server were then registered; these are additional services for monitoring the background jobs and their status.
The scraper is scheduled to run at 12:30 pm (GMT + 1) using what is known as a cron expression. A cron expression has five (5) fields; an asterisk (*) in a field means "every", so five asterisks mean every minute of every hour of every day of every month and every day of the week, as shown in Table 1. If for any reason the run fails the date test, it is rescheduled to run two (2) hours later.
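How a five-field cron expression selects run times can be shown with a small Python matcher. It is a sketch that handles only asterisks and plain integers, not the scheduler the authors used; the expression "30 12 * * *" for a daily 12:30 run and the name cron_matches are assumptions, and the time zone in which the real scheduler evaluates the expression is not stated in the text.

```python
from datetime import datetime

DAILY_1230 = "30 12 * * *"   # minute hour day-of-month month day-of-week
EVERY_MINUTE = "* * * * *"


def cron_matches(expr: str, when: datetime) -> bool:
    """True when `when` satisfies a cron expression of '*' and integers only."""
    parts = expr.split()
    if len(parts) != 5:
        raise ValueError("a cron expression needs exactly five fields")
    # Cron numbers Sunday as 0, whereas datetime.weekday() numbers Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    actual = (when.minute, when.hour, when.day, when.month, cron_weekday)
    return all(p == "*" or int(p) == a for p, a in zip(parts, actual))


runs_at_1230 = cron_matches(DAILY_1230, datetime(2020, 8, 1, 12, 30))
runs_at_1231 = cron_matches(DAILY_1230, datetime(2020, 8, 1, 12, 31))
```

Replacing a number with an asterisk widens the match: with five asterisks every minute qualifies.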
Once jobs have been scheduled, the Hangfire Dashboard gives a graphical overview of each job and its status (whether it executed successfully or not), the scheduled jobs, the recurring jobs and much more. Figure 4 shows the recurring job on the Hangfire Dashboard.
API endpoints
API endpoints are the channels through which other applications can communicate with or consume an API. Table 2 lists the various endpoints that this API provides. Endpoints are the points of entry, i.e., the Uniform Resource Locators (URLs). The state parameter expects a valid name of a state in Nigeria; states with names of more than one word, such as Akwa Ibom, are also accepted. The date parameters take a date in the DD-MM-YYYY format only. Figure 7 shows a Graphical User Interface powered by Swagger that makes the API usable even by non-technical persons, whereas software such as Postman, Eggplant, etc. requires a level of technical understanding to consume the API.
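A client has to prepare the two parameters described above before calling the API. The Python sketch below shows one way to do so; it is illustrative only. The base address is a placeholder, the helper names are hypothetical, lower-casing the state name is an assumption, and the state path follows the api/data/s/lagos example reported in the results.

```python
from datetime import date
from urllib.parse import quote

BASE_URL = "https://example.invalid"  # placeholder, not the deployed address


def format_api_date(day: date) -> str:
    """Render a date in the DD-MM-YYYY form that the date parameters expect."""
    return day.strftime("%d-%m-%Y")


def state_url(base: str, state: str) -> str:
    """Build the per-state URL, percent-encoding spaces in multi-word names."""
    return f"{base}/api/data/s/{quote(state.strip().lower())}"


api_date = format_api_date(date(2020, 8, 1))
akwa_ibom_url = state_url(BASE_URL, "Akwa Ibom")
```

Percent-encoding matters for multi-word names such as Akwa Ibom, whose space cannot appear literally in a URL path.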

Operation
The API was designed on a system with the following properties

Results & discussion
This section briefly presents the results of the various tests conducted on the NigeriaCovid19API. Since the system is designed to fetch COVID19 data as showcased on the NCDC website and give a JSON response to API calls, the data are only warehoused and no manipulations were done on them. The tests were carried out to ascertain the functionality of the various API endpoints and to ensure that data had been fetched successfully. This section presents the tests we carried out with Postman, Swagger and MS SQL Server on a localhost computer.

Test with MS SQL Server
After the data had been fetched, they were warehoused on an MS SQL Server. To confirm that the data had been stored successfully, we ran a READ operation (i.e., a SELECT instruction to fetch data from the database), as shown in Figure 5.
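A read-back check of this kind can be sketched with Python's built-in sqlite3 module standing in for MS SQL Server. The table name, the columns and the rows are hypothetical placeholders and do not reflect the authors' schema or real NCDC figures.

```python
import sqlite3


def latest_rows(conn: sqlite3.Connection) -> list:
    """Select the rows that carry the most recent report date."""
    cursor = conn.execute(
        "SELECT state, confirmed, report_date FROM state_data "
        "WHERE report_date = (SELECT MAX(report_date) FROM state_data) "
        "ORDER BY state"
    )
    return cursor.fetchall()


# In-memory database with a hypothetical table; ISO dates sort correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE state_data (state TEXT, confirmed INTEGER, report_date TEXT)"
)
conn.executemany(
    "INSERT INTO state_data VALUES (?, ?, ?)",
    [
        ("StateA", 10, "2020-08-01"),
        ("StateB", 5, "2020-08-01"),
        ("StateA", 12, "2020-08-02"),
    ],
)
rows = latest_rows(conn)
```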

Test with Postman
Postman was developed to help with monitoring and testing APIs. We used it to inspect the API endpoints and their responses. Figure 6 shows the JSON response for the /api/data endpoint, which gives the latest data in the database.

Test with Swagger
Swagger provides a User Interface (UI) that gives a graphical means of interacting with a REST API. It allows a user to interact with and test an API visually. It uses XML tags and annotations to generate its output, alongside additional features for testing the API. Figure 7 shows the general UI for Swagger and Figure 8 shows the JSON response for the /api/data endpoint, which gives the latest data in the database. Figure 9 shows the JSON response for api/data/s/lagos, which gives all the data for Lagos state.

Test with BlazeMeter
BlazeMeter is an online, open-source-based, enterprise-ready platform that unifies the functionality needed to carry out various tests such as functional testing, performance testing, API testing and much more. API monitoring makes it possible to assess the impact of application performance enhancements, since performance before and after an adjustment can be compared using historical response timing data.
To create a test, log on to the BlazeMeter website, select Create Test, select Performance Test, select Enter URL and type in the request URL for the API, then select the number of users and the duration of the test. On the free package the default location has to be used; using a different location requires an account upgrade. Attach the necessary header variables where needed.
The API was tested on this platform: Figure 10 shows the summary, Figure 11 shows the response time and Figure 12 shows the hits. Throughput is the number of requests completed in a time interval, and the system had an average of 11 hits/s with a response time of 2905 ms.
The volume of transactions created over time during a test is referred to as throughput. It can also be described as the maximum capacity of a website or application. The number of concurrent users is the number of users who are actively using the app or website at any given moment; we used 20 for the test. The number of hits per second generated by those concurrent users is determined solely by their interactions with the app.
In Figure 11, Response Time is the time taken to perform the request and receive a full response, while Latency is the time from sending the request, through its processing on the server side, to the moment the client receives the first byte. The time it takes for data or a request to travel from the source to the destination is known as network latency. Network latency is measured in milliseconds.
In Figure 12, the number of HTTP requests sent by the user(s) to the Web server in a second is referred to as hits per second. The overall load put by concurrent virtual users on the server, regardless of whether they are executed successfully or not on the server-side, is measured in hits per second.
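The three quantities defined above (hits per second, latency and response time) can be computed from per-request timestamps, as in the Python sketch below. The Sample fields, the summarize function and the timings are hypothetical illustrations; they are not BlazeMeter's data model and not the measurements reported in this paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    sent_ms: int        # when the request was sent
    first_byte_ms: int  # when the first byte of the response arrived
    done_ms: int        # when the full response had been received


def summarize(samples: List[Sample], window_ms: int) -> Tuple[float, float, float]:
    """Return (hits per second, mean latency in ms, mean response time in ms)."""
    count = len(samples)
    hits_per_second = count / (window_ms / 1000)
    mean_latency = sum(s.first_byte_ms - s.sent_ms for s in samples) / count
    mean_response = sum(s.done_ms - s.sent_ms for s in samples) / count
    return hits_per_second, mean_latency, mean_response


# Two made-up requests observed within a one-second window.
samples = [Sample(0, 100, 300), Sample(500, 700, 900)]
hits, latency, response = summarize(samples, window_ms=1000)
```

Latency can never exceed response time for the same request, because the first byte arrives no later than the complete response.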
Plots from extracted data
Upon successful deployment 24 and testing of the API, data were extracted and charts were plotted to show the rate of infection, the death and recovery rates, and the states with the highest numbers, as shown in Figures 13, 14 and 15 respectively. Figures 13 and 14 have two vertical lines, red and blue. The red line indicates the end of the first wave and the commencement of the second wave, while the blue line signifies the end of the second wave and the start of the third wave. As seen in Figure 13, the spread in Nigeria was initially slow, with single-digit case counts after the first case was recorded, mainly in Lagos and Abuja. As the days went by, the number of active cases grew to a maximum of 12,915 (as of August 2020) in Lagos state, and the nationwide total of confirmed cases kept rising to around 182,000 as of August 2021.
The first death in Nigeria from COVID-19 was recorded on the 23rd of March 2020; the person had returned from medical treatment in the United Kingdom. With the rising number of cases, the Federal Government of Nigeria, on the advice of the Presidential Task Force on the Control of COVID-19, placed a ban on all international flights in and out of the country except for emergency and essential flights. Deaths continued to increase, spreading from Ogun state to other states and reaching a peak of 465 deaths in Lagos state; however, many patients recovered and were discharged from the centres, a total of 166,826 persons across the nation, as seen in Figure 14. The line graph of total deaths in the figure looks like a straight line because, in comparison with the total number discharged, total deaths are small, with a peak value of 2219 as of August 2021. Figure 15 shows the most affected states, those with total confirmed cases above 5,000 as of August 2021. The high level of cases in Abuja and Lagos can be attributed to active community case search and testing. This public health response has led to the detection of more asymptomatic cases at the community level, as stated by Ref. 2.
Nigeria shares land borders with the Republic of Benin in the west, Chad and Cameroon in the east, and Niger in the north. Figure 16 compares confirmed cases in Nigeria and its border countries: Nigeria had the highest number of confirmed cases, followed by Cameroon with above 75,000 cases as of August 2021, while Benin, Niger and Chad had below 25,000 cases. As shown in Figure 17, which compares Nigeria with the five (5) hardest-hit countries in Africa, Nigeria ranked tenth on the continent as of August 2021 in terms of confirmed cases, with 180,000 cases. South Africa was the most affected, with over 2.5 million cases, followed by Morocco with over 750,000 cases, then Tunisia in fourth place with about 600,000 cases and Ethiopia in fifth place with around 300,000 cases.

Conclusion
Knowing the importance of data to research, we have designed and developed an API that gives a JSON response to HTTP requests. It was built using the REST architecture and is designed to fetch data from the NCDC website daily, with the data warehoused on MS SQL Server. Postman and Swagger were used to test the data endpoints and ensure that the appropriate data are returned as defined by each endpoint; SQL SELECT queries were used to ensure that the database is properly populated with the right data; and BlazeMeter was used to test the performance of the whole system, which achieved 11 hits/s with a response time of 2905 ms. Its advantage over existing COVID19 APIs for Nigeria is access to previous data, which means that researchers and data enthusiasts can use the API to obtain data as required by their various research needs with a simple URL, rather than having to comb through the archives of the NCDC website. Simply put, this system makes Nigeria's COVID19 data easily accessible, even to non-technical individuals, through our user interface powered by Swagger. The source code and the dataset used in this paper are available as open source on GitHub. 25