Identification of Sus scrofa and Mus musculus as potential hosts of SARS-CoV-2 via phylogenetic and homologous recombination analysis [version 2; peer review: awaiting peer review]

Background: Most previous studies have focused on the wild animals sold in the Wuhan Huanan seafood wholesale market, neglecting the possibility that livestock raised elsewhere could also be the original hosts. Methods: First, relative synonymous codon usage was used to analyze the potential hosts of SARS-CoV-2. Then SARS-CoV-2 and related coronaviruses were clustered using a phylogenetic tree. Next, we used the Recombination Detection Program to identify possible recombination regions and verified them with SimPlot. Results: Related coronaviruses from porcine or murine sources may facilitate the evolution and recombination of SARS-CoV-2. Conclusions: To our knowledge, this is the first paper to suggest that swine and mice could be probable reservoirs for SARS-CoV-2.


Introduction
A novel coronavirus, SARS-CoV-2, was recently reported in the city of Wuhan, Hubei province, China, causing severe respiratory disease and an epidemic throughout China. The first patient was hospitalized on the 12th of December 2019 1 , and the inflection point of confirmed cases did not occur until February 2020. On 29th February, 573 new confirmed cases of novel coronavirus infection were reported on the Chinese mainland, bringing the total to 78,630. In addition, 35 new fatalities were reported, bringing the cumulative number of fatalities to 2761. Meanwhile, outside the Chinese mainland, more than 5000 cases have been confirmed in Asia (in places such as Japan, Singapore, Thailand and South Korea), in Europe (in places such as Germany and France), and in the Americas. A consensus has been reached that SARS-CoV-2 originated from bats 2 . However, intermediate hosts are thought to have mediated human infection by allowing the virus to adapt gradually to the transcription and translation machinery of the human body.
The exact putative parent of SARS-CoV-2 remains uncertain. Frequent contact between humans and swine could lead to a higher risk of cross-species transmission or virus recombination. To identify the intermediate host, relative synonymous codon usage (RSCU) analysis was applied to evaluate the range of species that could act as reservoirs. Phylogenetic and homologous recombination analyses were also used to identify the related coronaviruses harbored by the most likely hosts.

Sequence data collection
Porcine and murine coronavirus sequences were selected and downloaded from the NCBI Virus database (Table S1, Extended data B) 3 . Here, mitochondrial genes were used to represent the whole genome of each potential host (Table S2, Extended data B) 3 . ClustalX 1.83 was used to align the sequences.

Phylogenetic analysis
MEGA X (v1.0.3) was used to construct the phylogenetic trees using the Neighbor-Joining method 4 (the input GenBank accessions are shown in Table S1, Extended data B) 3 . The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches 5 . The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. All genome sequences were aligned before the phylogenetic analysis was performed.
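As a rough illustration of the inputs to a Neighbor-Joining tree with bootstrap support, the sketch below computes pairwise p-distances from an alignment and resamples alignment columns with replacement. It is not the MEGA X implementation. The distance model (a simple p-distance), the function names and the toy sequences are assumptions made for illustration only, since the text does not state which substitution model was used.

```python
import random


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(alignment):
    """Return {(name_i, name_j): p-distance} for every ordered pair of taxa."""
    names = list(alignment)
    return {(p, q): p_distance(alignment[p], alignment[q])
            for p in names for q in names}


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


# Hypothetical toy alignment; these are not sequences from the study.
toy_alignment = {
    "virus_A": "ATGCCGTA",
    "virus_B": "ATGACGTT",
    "virus_C": "AT-ACGTT",
}
toy_matrix = distance_matrix(toy_alignment)
toy_replicate = bootstrap_alignment(toy_alignment, random.Random(0))
```

A tree-building method would be run once on the original matrix and once per replicate. The bootstrap percentage for a clade is then the share of replicate trees in which that clade appears.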

Synonymous codon usage analysis
To identify the relative synonymous codon usage (RSCU) bias of SARS-CoV-2 and its potential hosts, RSCU values for the coding sequences of the suspected species were calculated using CodonW 1.4.2 (Table S4, Extended data C) 6 . Whole coding sequences of the genomes, downloaded from NCBI, were used to identify the host among different viruses 7 (the input GenBank accessions are shown in Table S1, Extended data B) 3 . Heat map clustering of the RSCU values was performed with MeV 4.9.0. Homology was assessed by analyzing the pairwise distances of mitochondrial genes from the potential host animals. The pairwise distances were computed with MEGA X using the bootstrap test (1000 replicates) to evaluate potential hosts 8 .
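For readers unfamiliar with the metric, the RSCU of a codon is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that calculation, not the CodonW implementation. It assumes the standard genetic code and skips stop codons and the single-codon amino acids Met and Trp, leaving 59 codons. The function name and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence.

    Stop codons and single-codon amino acids (Met, Trp) are skipped.
    Codons containing ambiguous bases are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:
            continue
        total = sum(counts[c] for c in codons)
        for c in codons:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values


# Hypothetical toy sequence: three GCT codons and one GCC codon (alanine).
toy_values = rscu("GCTGCTGCTGCC")
```

An RSCU above 1 means the codon is used more often than expected under equal usage, and a value below 1 means it is used less often.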

Results
The intermediate animal reservoirs of the SARS-CoV-2 point to swine and mice
RSCU has been widely used to analyze the association between a virus and its potential host 7 . The RSCU heat map, based on Euclidean distance, indicated that both Sus scrofa and Mus musculus have a synonymous codon usage bias similar to that of SARS-CoV-2 (Figure 1A). The Euclidean distance between Sus scrofa and SARS-CoV-2 was the smallest, hinting that SARS-CoV-2 could use the porcine translation machinery more efficiently than that of other animals and suggesting that the epidemic SARS-CoV-2 might originate from swine. The pairwise distance between the SARS-CoV-2 complete genome and the potential host mitochondrial genome also supports this judgment. Both swine and mice showed shorter pairwise distances to SARS-CoV-2 than snakes, marmots, mink, bats and humans (Figure 1B). The pairwise distance between SARS-CoV-2 and Sus scrofa is 14.89, and that between SARS-CoV-2 and Mus musculus is 14.94. This result indicates that SARS-CoV-2 may originate from both swine and mice.
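The comparison underlying the heat map can be sketched as a Euclidean distance between RSCU profiles, with candidate hosts ranked by their distance to the virus. This is an illustration only and does not reproduce the MeV clustering. The function names and all numeric values below are made up and are not data from the study.

```python
import math


def euclidean_distance(profile_a, profile_b):
    """Euclidean distance between two RSCU profiles keyed by codon."""
    if set(profile_a) != set(profile_b):
        raise ValueError("profiles must cover the same codons")
    return math.sqrt(sum((profile_a[c] - profile_b[c]) ** 2 for c in profile_a))


def rank_hosts(virus_profile, host_profiles):
    """Return (distance, host) pairs sorted from most to least similar."""
    return sorted(
        (euclidean_distance(virus_profile, profile), host)
        for host, profile in host_profiles.items()
    )


# Hypothetical two-codon profiles, for illustration only.
toy_virus = {"GCT": 1.6, "GCC": 0.4}
toy_hosts = {
    "host_X": {"GCT": 1.5, "GCC": 0.5},
    "host_Y": {"GCT": 0.8, "GCC": 1.2},
}
toy_ranking = rank_hosts(toy_virus, toy_hosts)
```

A smaller distance means a more similar codon usage profile. Such similarity is compatible with a host relationship but does not demonstrate one.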

Amendments from Version 1
To avoid the risk of stigmatization associated with SARS-CoV-2, we no longer use the geographical heat map to depict the distribution of PEDV in China. (Table S4), Pairwise Distance was applied to evaluate the homology as compared with SARS-CoV-2 via MEGA-X.

bootstrap support (Figure 2). RSCU revealed that the related coronaviruses have a synonymous codon usage bias similar to that of SARS-CoV-2 (Figure 3). The other PEDV strains obtained from Hubei also showed a close phylogenetic relationship with SARS-CoV-2. Overall, the close phylogenetic relationship to Sus scrofa provides evidence for a bat-swine axis as one of the origins of SARS-CoV-2.
Meanwhile, the RSCU results also showed that SARS-CoV-2 and the related coronaviruses tend to have similar synonymous codon usage bias (Figure 3 and Table S5, Extended data C) 5 . Therefore, further investigation of the origin of SARS-CoV-2 could focus on coronaviruses isolated from swine and mice. In particular, the relationship with PEDV H11-SD2017 and PEDV YN15 needs further study.
To our knowledge, this is the first study to report that porcine and murine coronaviruses may have participated in the recombination of SARS-CoV-2. RDP4 estimated the possible recombination regions for SARS-CoV-2 (Table 1, Figure 4A). Furthermore, SimPlot analysis of sequence similarity supported homologous recombination between SARS-CoV-2 and coronaviruses from the potential hosts. The potential recombination breakpoints (16205-16358 nt) are shown as red dashed lines (Figure 4B), indicating recombination between PEDV YN15 and the murine hepatitis virus JHM when SARS-CoV-2 was queried. In addition, the region between 20923 and 21181 nt also indicated a recombination event between PEDV H11-SD2017 and Human coronavirus HKU1 when SARS-CoV-2 was queried (Figure 4C).
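A similarity plot of this kind slides a fixed-width window along an alignment and records the identity between the query and each reference in every window. The sketch below shows that idea with plain percent identity. It is not the SimPlot or RDP4 algorithm, and the function name, the window and step sizes, and the toy sequences are assumptions made for illustration.

```python
def window_identity(query, reference, window=200, step=20):
    """Return (window_centre, identity) pairs along two aligned sequences.

    Identity is the fraction of matching sites in each window, skipping
    columns where either sequence has a gap ('-'). A window with no
    comparable sites is reported as None.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    query = query.upper()
    reference = reference.upper()
    results = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [
            (x, y)
            for x, y in zip(query[start:start + window],
                            reference[start:start + window])
            if x != "-" and y != "-"
        ]
        identity = sum(x == y for x, y in pairs) / len(pairs) if pairs else None
        results.append((start + window // 2, identity))
    return results


# Hypothetical toy alignment: identical first half, different second half.
toy_profile = window_identity("AAAAAAAA", "AAAATTTT", window=4, step=4)
```

When the reference that is most similar to the query changes from one region to the next, the crossover point marks a candidate recombination breakpoint, which a dedicated method would then need to test statistically.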

Discussion
In China, the cumulative number of patients diagnosed as infected with SARS-CoV-2 is thought to be over 100,000, with more than 2000 deaths. It is the most severe public health emergency since the outbreak of SARS 17 years ago 9 . Local residents were anxious and confused during the outbreak, although the Chinese government has taken substantial action to prevent and control the spread of the virus. However, conflicting messages could distract medical workers and researchers. Current research on the origin of SARS-CoV-2 mostly focuses on wildlife, since the first case was thought to be closely associated with wild animals in the Wuhan seafood market.
The origin of SARS-CoV-2 has caused great public concern. Natural variation is the dominant view held by most scholars, who believe that bats and other wild animals provide reservoirs for the virus. However, coronaviruses from bats are unlikely to infect humans directly; one or two intermediate hosts may facilitate homologous recombination, enabling the coronavirus to adjust gradually to human codon usage and then to survive and replicate successfully.
From our perspective, natural variation is the more reasonable explanation for the origin of SARS-CoV-2. However, the wild animals in the Wuhan seafood market should not bear all the blame, since there is evidence that patients with early infection did not have any contact with the market 9 . Some researchers 10 have also pointed out that the Wuhan seafood market is not the only area of origin, indicating that an animal other than those sold at the market may act as an intermediate host. In our study, the captive animals Sus scrofa (swine) and Mus musculus (mice) were suspected to be critical hosts of SARS-CoV-2.
Our finding supports the theory of natural variation, which holds that people become infected because they eat or come into contact with intermediate hosts. SARS-CoV-2 was found to be 96% identical at the whole-genome level to the bat coronavirus 2 . Other reports state that snakes, mink, and pangolins could be potential hosts for SARS-CoV-2 11 .
Either wild animals or reared livestock could serve as hosts.
It has been reported that the Wuhan seafood market may not be the only source of the novel virus now spreading globally, because the earliest patient became ill on 1 December 2019 and had no epidemiological link to the seafood market or to later cases. The official details of the first 41 hospitalized patients showed that 13 of them had no link to the marketplace at all 9 .
One possible hypothesis is that cross-species transmission had occurred elsewhere before the outbreak at the Wuhan Huanan seafood market.
A previous study of fatal swine acute diarrhea syndrome (SADS) revealed that a SADS-related coronavirus was responsible for a large-scale outbreak of fatal disease in pigs in China 12 . Here we found that porcine epidemic diarrhea virus (PEDV) has broken out periodically in China 13 . One of the strains, H11-SD2017, showed a close affiliation with SARS-CoV-2 in the relative synonymous codon usage (RSCU) and phylogenetic analyses. Swine-to-human cross-species transmission may explain why many patients with coronavirus disease-19 (COVID-19) suffer not only from severe respiratory disease but also from diarrhea 9 .
In the past 10 years, PEDV has spread into most provinces with a swine industry in China. Hubei was the first region to be affected, with PEDV strain CH/HBQX/10 in 2010; since then, more and more provinces have reported the presence of PEDV 13 . The spread of PEDV has gone beyond its initial geographical limits. More than 300 PEDV strains clustered as pandemic strains, indicating that significant natural variation took place as PEDV spread across different regions. Further analysis of 21 PEDV strains and coronaviruses from various species indicates that swine and mice could be other hosts of SARS-CoV-2 (Figure 1A, Figure 2 and Figure 3).
RDP4 and SimPlot analysis helped us better understand the homologous recombination of SARS-CoV-2 (Table 1, Figure 4). The analysis indicated that not only porcine coronavirus but also murine coronavirus had undergone recombination events. Therefore, we speculate that SARS-CoV-2 may have originated in bats and then undergone a series of recombination events, with swine and mice playing a critical role in mediating cross-species transmission.
Pairwise analysis of distance also indicated that Mus musculus could be a possible host of SARS-CoV-2 (Figure 1B).
A separate comment is that the authors' use of "natural variation" as the proposed mechanism for transmission is not a very precise description of what they mean, which is that the virus "naturally" transmits among people and animals frequently encountered in everyday life and does not require unusual encounters with "wild" animals in meat markets. There must be a better term for this mechanism. Perhaps "routine contact"? Epidemiology must have a term for this, but that is not my area of expertise.
Competing Interests: No competing interests were disclosed.