Prediction of the SARS-CoV-2 (2019-nCoV) 3C-like protease (3CL pro) structure: virtual screening reveals velpatasvir, ledipasvir, and other drug repurposing candidates

We prepared the three-dimensional model of the SARS-CoV-2 (aka 2019-nCoV) 3C-like protease (3CL pro) using the crystal structure of the highly similar (96% identity) ortholog from the SARS-CoV. All residues involved in the catalysis, substrate binding and dimerisation are 100% conserved. Comparison of the polyprotein PP1AB sequences showed 86% identity. The 3C-like cleavage sites on the coronaviral polyproteins are highly conserved. Based on the near-identical substrate specificities and high sequence identities, we are of the opinion that some of the previous progress in developing specific inhibitors for the SARS-CoV enzyme can be transferred to its SARS-CoV-2 counterpart. With the 3CL pro molecular model, we performed virtual screening for purchasable drugs and proposed 16 candidates for consideration. Among these, the antivirals ledipasvir and velpatasvir are particularly attractive as therapeutics to combat the new coronavirus, with minimal side effects (commonly fatigue and headache). The drugs Epclusa (velpatasvir/sofosbuvir) and Harvoni (ledipasvir/sofosbuvir) could be very effective owing to their dual inhibitory actions on two viral enzymes.


Introduction
On 7 January 2020, a new coronavirus, 2019-nCoV (now officially named SARS-CoV-2), was implicated in an alarming outbreak of a pneumonia-like illness, COVID-19, originating from Wuhan City, Hubei, China. Human-to-human transmission was first confirmed in Guangdong, China 1 . The World Health Organisation has declared this a global public health emergency: as of 15 February 2020, more than 65,000 confirmed cases had been reported and the death toll was over 1500. At the height of the crisis, this virus is spreading at a rate and scale far worse than previous coronaviral epidemics. By the time we finished revising this article (1 April 2020), it had become a pandemic with more than 850,000 infected and more than 42,000 deaths, affecting more than 180 countries/regions.
It was immediately evident from its genome that the coronavirus is evolutionarily related (80% identity) to the beta-coronavirus implicated in the severe acute respiratory syndrome (SARS), which originated in bats and was causative of a global outbreak in 2003. The momentum of research on developing antiviral agents against the SARS-CoV carried on after the epidemic subsided. Despite this, no SARS treatment has yet come to fruition; however, knowledge acquired from the extensive research and development efforts may be of use to inform the current therapeutic options.
The viral genome encodes more than 20 proteins, among which are two proteases (PL pro and 3CL pro ) that are vital to virus replication; they cleave the two translated polyproteins (PP1A and PP1AB) into individual functional components. The 3-chymotrypsin-like protease (3CL pro , aka main protease, M pro ) is considered to be a promising drug target. Tremendous effort has been spent on studying this protein in order to identify therapeutics against the SARS-CoV in particular and other pathogenic coronaviruses (e.g. MERS-CoV, the Middle East respiratory syndrome coronavirus) in general because they share similar active sites and enzymatic mechanisms. The purpose of this study is to build a molecular model of the 3CL pro of the SARS-CoV-2 and to carry out virtual screening to identify readily usable therapeutics. It was not our intention, however, to comment on other structure-based drug design research as these will not be timely for the current epidemic.

Analysis of protein sequences
The translated polyprotein (PP1AB) sequence was obtained from the annotation of the GenBank entry of the SARS-CoV-2 genome (accession number MN908947). By comparing this sequence with the SARS-CoV PP1AB sequence (accession number ABI96956), the protease cleavage sites and all mature protein sequences were obtained. Sequence comparison and alignment were performed with BLASTp.
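As an illustration of what a percent-identity figure means, the sketch below counts identical columns in a pairwise alignment and divides by the alignment length, which is how BLAST-style "identities" are usually reported. It is not the BLASTp algorithm itself; the function name and the toy sequences are hypothetical and are not taken from the PP1AB alignment.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs are rows of one pairwise alignment (gaps written as '-').
    Columns containing a gap count towards the length but never as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * identical / len(aligned_a)


# Toy alignment (hypothetical sequences, for illustration only).
toy_a = "ACDEFGHIKL"
toy_b = "ACDEFGHIKM"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Applied to two ungapped 306-residue sequences that differ at 12 positions, the same function gives 294/306, which rounds to 96%.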

Preparation of structural model
The high-resolution apo-enzyme structure of SARS-CoV 3CL pro (PDBID: 2DUC) 2 was employed as the template. The functional SARS-CoV 3CL pro is a dimer; therefore, the SARS-CoV-2 enzyme was also constructed as a dimeric model, preserving all intermolecular interactions. The variant residues (Table 1) were "mutated" in silico by SCWRL4 3 , followed by manual adjustment to ensure that the best side-chain rotamer was employed (Table 2). The rebuilt model was subjected to steepest descent energy minimisation by Gromacs 2018.4 using the Gromos 54A7 forcefield, with a restraint force constant of 1000 kJ mol -1 nm -2 applied on all backbone atoms and all atoms of the vital residues (Table 1). Accessible surface areas of residues were calculated with areaimol of the CCP4 suite v7.0.
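To give a feel for the size of the restraint quoted above, the following sketch evaluates a harmonic position restraint, V = 0.5 k d^2, which is the functional form GROMACS uses for position restraints when the same force constant is applied in all three dimensions. The function name and the example displacements are illustrative assumptions and are not part of the original protocol.

```python
def restraint_energy(displacement_nm: float, k: float = 1000.0) -> float:
    """Harmonic position-restraint energy in kJ/mol for a single atom.

    k is the force constant in kJ mol^-1 nm^-2 (1000 in the text);
    displacement_nm is the distance from the reference position in nm.
    """
    if displacement_nm < 0:
        raise ValueError("displacement must be non-negative")
    return 0.5 * k * displacement_nm ** 2


# Hypothetical displacements, chosen only for illustration.
for d in (0.01, 0.05, 0.1):
    print(f"{d:.2f} nm -> {restraint_energy(d):.3f} kJ/mol")
```

With this force constant, moving a restrained atom by 0.1 nm (1 Å) costs 5 kJ/mol, so restrained atoms stay close to the template while the rebuilt side chains relax.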

Virtual screening
MTiOpenScreen web service 4 was used for screening against its library of 7173 purchasable drugs (Drugs-lib), with 4574 unique compounds and their stereoisomers. Each library entry is identified with the name of the compound as well as a ZINC15 ID. The target binding site grid centre was specified by the active-site residues. At the MTiOpenScreen interface, the 'Mode' was set to 'List of residues' and these residues were specified: H41, M49, G143, S144, C145, H163, H164, M165, E166, L167, D187, R188, Q189, T190, A191 and Q192. The active sites on chain A and chain B, each derived from the catalytically-active dimeric model, were screened independently with AutoDock Vina 5 .
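One simple way to picture a grid centre defined by a list of residues is the centroid of their Cα atoms. The sketch below computes that centroid from PDB-format ATOM records for one chain. This is an assumption made for illustration: how MTiOpenScreen derives its grid centre internally is not described here, and the function name and the toy coordinates are hypothetical.

```python
# Active-site residue numbers listed in the text (H41 ... Q192).
ACTIVE_SITE = [41, 49, 143, 144, 145, 163, 164, 165, 166, 167,
               187, 188, 189, 190, 191, 192]


def ca_centroid(pdb_lines, chain, residues):
    """Centroid (x, y, z) of the C-alpha atoms of the given residues on one chain.

    Relies on the fixed-column layout of PDB ATOM records.
    """
    wanted = set(residues)
    coords = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        if int(line[22:26]) not in wanted:
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError("no matching C-alpha atoms found")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def _atom(serial, name, resname, chain, resseq, x, y, z):
    """Format a minimal PDB ATOM record (toy helper for the example)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Toy records with made-up coordinates, not taken from any real structure.
toy_pdb = [
    _atom(1, "CA", "HIS", "A", 41, 1.0, 2.0, 3.0),
    _atom(2, "CA", "VAL", "A", 42, 9.0, 9.0, 9.0),   # not in the list, ignored
    _atom(3, "CA", "CYS", "A", 145, 3.0, 4.0, 5.0),
    _atom(4, "CA", "CYS", "B", 145, 7.0, 7.0, 7.0),  # other chain, ignored
]
centre_a = ca_centroid(toy_pdb, "A", ACTIVE_SITE)
print(centre_a)
```

Running the function once per chain mirrors the independent treatment of the chain A and chain B active sites.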

Amendments from Version 1
The manuscript was revised according to the reviewers' comments, as follows:
1. Methods: Preparation of structural model. The details of the starting dimeric model were included.
2. Table 1 now includes a caption to make it clearer. Single-letter amino-acid codes were added. The original residue ID in the SARS-CoV enzyme was included for comparison. The list of dimerisation residues was revised.
3. Methods: Virtual screening. More details were given to the Drugs-lib and its content. The options of defining the grid centre with active-site residues (of each chain from the dimeric model) were included.
4. Methods: Virtual screening. Now includes a description of how the top list was assembled from individual screening results, with multiple stereoisomers of a compound merged.
5. Results: Virtual screening. The full range of binding energies of all screening results, and the mean scores are given for comparison.
6. Table 3 and Table S2. A column was added to indicate the compounds' molecular weights. The 'Hits' column was revised to show the number of occurrences of a compound (different stereoisomers, each has a unique ZINC15 ID) found in the top-scoring positions, out of the total number of stereoisomers of that compound. At the bottom of the tables, a 'Reference' section was added indicating the mean binding energies of each screen, as well as the binding energies of lopinavir and ritonavir.
7. Results: Assessment. More discussion and additional reference were made on hesperidin.
8. Discussion. The discussion of lopinavir/ritonavir now included their scores and the comparison with the top scorers. The results of the latest clinical trial were included, with reference.
9. Data: the DOI to the extended data was updated (Table S2 was updated).
10. Minor changes: updated with the latest statistics and additional references.

When the crystal structure was released, it was stripped of its inhibitor and subjected to a screening.
The results returned from MTiOpenScreen are a list of 4500 target:ligand docking combinations (1500 ligands, each with 3 binding modes) ranked by binding energies. We listed the top 10 scorers of each chain as results. Stereoisomers of a compound (with the same drug name but unique ZINC15 IDs) that appear in the top list are collected together and presented as hits. The top-ranking candidates for chains A and B were examined visually in PyMOL (version 1.7.X) 14 .
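The assembly of the top list can be sketched as follows: rank all docking records by binding energy, keep the top-scoring positions, and merge records that share a drug name so that each distinct ZINC15 ID counts as one hit. This is a minimal sketch under stated assumptions, not the authors' script; the record layout, the function name, and the toy names, IDs and energies are all hypothetical.

```python
def merge_top_scorers(results, top_n=10):
    """Group the top_n best-scoring records by drug name.

    results: iterable of (zinc_id, drug_name, binding_energy_kcal_per_mol).
    Returns a dict keyed by drug name, in order of best score, holding the
    best energy and the distinct ZINC IDs (stereoisomers) seen in the top list.
    """
    ranked = sorted(results, key=lambda r: r[2])  # more negative = better
    merged = {}
    for zinc_id, name, energy in ranked[:top_n]:
        entry = merged.setdefault(name, {"best_energy": energy, "zinc_ids": []})
        if zinc_id not in entry["zinc_ids"]:  # several binding modes per ligand
            entry["zinc_ids"].append(zinc_id)
    return merged


# Toy records with made-up names, IDs and energies, for illustration only.
toy_results = [
    ("ZINC_toy_1", "drug_A", -9.5),
    ("ZINC_toy_2", "drug_A", -9.1),
    ("ZINC_toy_3", "drug_B", -9.3),
    ("ZINC_toy_1", "drug_A", -9.0),  # second binding mode of the same isomer
    ("ZINC_toy_4", "drug_C", -7.0),
]
top = merge_top_scorers(toy_results, top_n=4)
for name, entry in top.items():
    print(name, entry["best_energy"], "hits:", len(entry["zinc_ids"]))
```

Counting distinct IDs rather than records keeps multiple binding modes of one stereoisomer from inflating the hit count.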

Results
High sequence homology with SARS-CoV
The first available genome was GenBank MN908947, now NCBI Reference Sequence NC_045512. From it, the PP1AB sequence of SARS-CoV-2 was extracted and aligned with that of SARS-CoV. The overall amino-acid sequence identity is very high (86%). The conservation is noticeable at the polyprotein cleavage sites. All 11 3CL pro sites 2 are highly conserved or identical (Extended data 15 , Table S1), implying that their respective proteases have very similar specificities. The 3CL pro sequence of SARS-CoV-2 differs from that of SARS-CoV at only 12 out of 306 residues (identity = 96%).

3D model of the SARS-CoV-2 3CL pro
The amino acids that are known to be important for the enzyme's functions are listed in Table 1. Not unexpectedly, none of the 12 variant positions are involved in major roles. Therefore, we were confident in preparing a structural model of the SARS-CoV-2 3CL pro by molecular modelling (Extended data 15 , Figure S1), which will be immediately useful for in silico development of targeted treatment. After we submitted the first draft of this study, the crystal structure of SARS-CoV-2 3CL pro was solved and released (PDB ID 6LU7) 16 , which confirms that the predicted model is good within experimental errors (Extended data 15 , Figure S2).

Virtual screening for readily available drugs
The list of 1500 results has AutoDock Vina binding energies ranging from -10.1 to -7.6 (mean = -8.2) kcal mol -1 for the chain A active site, and -8.7 to -6.5 (mean = -7.1) kcal mol -1 for that of chain B. When examined in molecular graphics 14 , all solutions were found to fit into their respective active sites convincingly. The binding energies of chain A complexes were generally higher in magnitude than those of chain B by approximately 1.4 kcal mol -1 among the top scorers (Table 3). This presumably demonstrates the intrinsic conformational variability between the A- and B-chain active sites in the crystal structure (the average root-mean-square deviation (rmsd) in Cα atomic positions of active-site residues is 0.83 Å). In each screen, the differences in binding energies are small, suggesting that the ranking is not discriminatory, and all top scorers should be examined. We combined the two screens, merged stereoisomers, and found 16 candidates which give promising binding models (etoposide and its phosphate counted as one) (Table 3). One drug (dirlotapide) which is not intended for human use was excluded. All possible isomers of compounds with multiple stereoisomers are found in the full screening results of 1500, in particular: 38 of hesperidin, 34 of teniposide, 32 of etoposide and 21 of etoposide-phosphate.

Table 1. Important residues of 3CL pro from SARS-CoV (conserved) and the SARS-CoV-2 variant residues. The residues that play functional roles in SARS-CoV 3CL pro are listed on the top three rows. These are absolutely conserved in the SARS-CoV-2 protein. The variant residues found in the SARS-CoV-2 protein are listed in the bottom row, with the SARS-CoV residues in brackets.
SARS-CoV
Catalytic: H41, C145 (ref. 6)
Substrate binding: H41, M49, G143, S144, 163-167, 187-192 (refs 2,7)
Dimerisation: R4, M6, S10, G11, E14, N28, S139, F140, S147, E166, E290, R298 (refs 8-12)
Variant residues (SARS-CoV-2): [row not preserved in this text] (This work)

Table 2. In silico mutagenesis to make the SARS-CoV-2 3CL pro . The 12 variant residues with reference to the SARS-CoV enzyme are shown with the respective treatment of rotamer. "A" and "B" refer to the individual chains of the dimeric model. Both chains are in the crystal asymmetric unit and are not identical. The rotamer symbol (bracketed) is defined according to the conventions of Richardson 13 , followed by its respective rank of popularity. 'ASA': accessible surface area (average of A and B chains) of the residue in the SARS-CoV 3CL pro structure, in Å 2 and in % relative to the ASA of a residue X in the Gly-X-Gly conformation.
Residue    Rotamer    ASA, Å 2 (%)
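The Cα rmsd quoted in this section for the A- and B-chain active sites is a root-mean-square deviation over paired atoms. The sketch below shows the calculation for two coordinate lists that are assumed to be already superposed and listed in the same residue order; the superposition protocol is not stated in the text, and the function name and toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z).

    Assumes the two sets are already superposed and paired atom by atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in Å (made up): every atom is shifted by 1 Å along z.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(f"rmsd = {rmsd(chain_a, chain_b):.2f} Å")
```

Because each squared distance enters the sum, a few strongly displaced atoms raise the value more than many slightly displaced ones.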

Assessment of the candidate drugs
We checked the actions, targets and side effects of the 16 candidates. Among these, we first noticed velpatasvir (Figure 1A, Figure 1D) and ledipasvir, which are inhibitors of the NS5A protein of the hepatitis C virus (HCV). Both are marketed as approved drugs in combination with sofosbuvir, which is a prodrug nucleotide analogue inhibitor of RNA-dependent RNA polymerase (RdRp, or NS5B). Interestingly, sofosbuvir has recently been proposed as an antiviral for the SARS-CoV-2 based on the similarity between the replication mechanisms of the HCV and the coronaviruses 17 . Our results further support the idea that these dual-component HCV drugs, Epclusa (velpatasvir/sofosbuvir) and Harvoni (ledipasvir/sofosbuvir), may be attractive candidates to repurpose because they may inhibit two coronaviral enzymes. A drug that can target two viral proteins substantially reduces the ability of the virus to develop resistance. These direct-acting antiviral drugs are also associated with minimal side effects and are conveniently orally administered (Table 4). These computational results provide a rationale for experimental validation of inhibiting the SARS-CoV-2 with velpatasvir and ledipasvir, which is underway.
The flavonoid glycosides diosmin (Figure 1B) and hesperidin (Figure 1E), obtained from citrus fruits, fit very well into and block the substrate binding site. Yet, these compounds cause mild adverse reactions (Table 4). Hesperidin has 38 stereoisomeric forms and several of these showed up among the top scorers (Table 3; Figure 1E). It has been reported to be a good inhibitor of the SARS-CoV 3CL pro with an IC 50 of 8.3 µM in a cell-based assay 18 .
Teniposide and etoposide (and its phosphate) are chemically related and exhibited good binding models (Figure 1F). However, these chemotherapy drugs have strong side effects and need intravenous administration (Table 4). The approved drug venetoclax (Figure 1C) and investigational drugs MK-3207 and R428 scored well in both screens. Venetoclax is another chemotherapy drug that is burdened by side effects including upper respiratory tract infection (Table 4). Not much has been disclosed about MK-3207 and R428.
We subjected the crystal structure to the same virtual screening procedures. A very similar list of candidates showed up consistently (Extended data 15 , Table S2) with high scores although ledipasvir was not found.
We noticed that most of the compounds on the list have molecular weights (MW) over 500 (Table 3), except lumacaftor (MW=452). The largest one is ledipasvir (MW=889). This is because the size of the peptide substrate and the deeply buried protease active site demand a large molecule with considerable rotational flexibility to fit into it.

Discussion
We identified five trials on ClinicalTrials.gov involving antiviral and immunomodulatory drug treatments for SARS (Table 5), all without reported results; i.e., at present, there are no safe and effective drug candidates against SARS-CoV. This is because once the epidemic is over, there are no patients to recruit for clinical trials. Only the study with streptokinase succeeded in completing phase 3. It is disappointing that little progress in SARS drug development has been made in the past 17 years. After the 2003 outbreak, numerous inhibitors for the 3CL pro enzyme have been proposed 19,20 , yet no new drug candidates have succeeded in entering phase 1 clinical trials.
One record which receives a lot of attention amid the current outbreak is the lopinavir/ritonavir combination 21 . They are protease inhibitors originally developed against HIV. During the 2003 SARS outbreak, despite lacking a clinical trial, they were tried as an emergency measure and found to offer improved clinical outcome 21 . However, some scientists did express scepticism 22 . By analogy, these compounds were speculated to act on SARS-CoV 3CL pro specifically, but there is as yet no crystal structure to support that, although docking studies were carried out to propose various binding modes 23-26 . The IC 50 value of lopinavir is 50 µM (K i = 14 µM) and that for ritonavir cannot be established 27 . These two compounds turned up in our virtual screening results, with scores slightly lower than the mean scores (Table 3). Based on our results that the two CoV 3CL pro enzymes are identical as far as active sites and substrate specificities are concerned, we were of the opinion that this combination was still one of the recommended routes for immediate treatment at the time of writing the first version (mid-February 2020). Disappointingly, the latest trial of lopinavir/ritonavir on COVID-19 showed no clinical benefit 28 .
If we look beyond the 3CL pro , an earlier screen produced 27 candidates that could be repurposed against both SARS-CoV and MERS-CoV 29 . In addition, the other coronaviral proteins could be targeted for screening. Treatment of COVID-19 with remdesivir (a repurposed drug in development targeting the RdRp) has earlier been reported to show improved clinical outcome, and a clinical trial is now underway 30 .
We consider this work part of the global efforts responding in a timely fashion to fight this deadly communicable disease. We are aware that there are similar modelling, screening and repurposing exercises targeting 3CL pro reported or announced 23,31-37 (up to mid-February 2020). Our methods did not overlap, and we share no common results with these studies. During revision, another crystal structure paper was published 38 .
The "Extended Results" folder contains the following extended data:
• Tab S1.docx (Sequence homology of the 3CL pro cleavage junctions of PP1AB between SARS-CoV-2 and SARS-CoV).
• Tab S2-v2.docx (The results of virtual screening of drugs on the active site of SARS-CoV-2 3CL pro crystal structure).
• Compare Crystal.docx (A comparison, with Figure S2, of the active sites of model chains A, B and the crystal structure).
Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

In this article, Chen YW and his colleagues carried out virtual screening using a computational molecular model of a viral protein from SARS-CoV-2, the virus that causes COVID-19, to identify therapeutic candidates. The authors presented that the 3CLpro protease of SARS-CoV-2 is considered to be a promising drug target and proposed repurposing accessible drugs to tackle the global outbreak of SARS-CoV-2. The authors initially used the translated polyprotein (PP1AB) sequences of SARS-CoV-2 and SARS-CoV to prepare the protein structural model. Subsequently, they performed computational virtual screening against this model using a library of purchasable drugs with the binding site grid. Among > 7,000 repurposing drugs in the screening, considered together with their known side effects, the antivirals ledipasvir and velpatasvir could potentially be used against SARS-CoV-2 infection with minimal side effects. The manuscript is straightforward in both terminology and structure. The manuscript can be considered for acceptance with a minor revision and could be further improved with the following points:
Table 2 is mentioned prior to Table 1 in the manuscript.
More details of the settings and cut-offs used in the virtual screening and analysis should be provided in the Methods section.
Table 1 is quite confusing. The important residues of the SARS-CoV 3CLpro previously reported and the variant residues found in SARS-CoV-2 (this work) should be separated. The amino acid variants of each position should be included. Using an image with annotation could be an alternative and more informative presentation.
In Table 3, some information should be included in the table, such as molecular weight. In addition, the authors should discuss the results shown in Table 3 further, to compare the difference in binding energy between the A and B chains.
In the conclusion, the authors proposed velpatasvir and ledipasvir as attractive candidates. However, based on the virtual screening on the active sites of the SARS-CoV-2 3CLpro model, neither of them is ranked at the top of the list in the chain A screening. Could you please explain this scenario?
Should the results be compared with those from other virtual screening packages (such as Glide or FlexX)?
To extend the interest of the topic as well as to compare the potential of repurposed drugs in COVID-19 treatment, virtual screening of drugs against other viral enzymes might be performed and compared. In this case, since there are several clinical studies using this drug family (e.g. lopinavir/ritonavir) in COVID-19 treatment, the authors can compare the virtual screening model with the clinical outcomes.

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
No competing interests were disclosed.
This is now added to Methods:VS. We revised the Methods:VS section to make the procedures in MTiOpenScreen clearer.
3. Table 1 is revised so that important residues in SARS-CoV and variant residues in SARS-CoV-2 are now separated. The amino acids are identified with their single-letter codes.
We keep the two lists together because the reader can immediately compare them and see that the two sets do not overlap. The same information is conveyed in 3D in Fig. S1 (Extended data).
4. Molecular weights were added accordingly. The 'hits' column was revised to represent the number of stereoisomers found, out of the total possible numbers. The Table 3 caption was also revised to explain in detail.
The two chains are structurally non-identical (rmsd in Cα=0.83Å) in the dimer and in the crystal structure. The differences lie primarily in side-chain conformations, especially for residues with long side chains. Thus, the two chains are two slightly-different conformational states of the protein. It is advantageous to have two states for VS because it allows some degrees of conformational variability of the active sites to be taken into account. AutoDock Vina implemented in MTiOpenScreen does not allow for the flexibility of the active site. Therefore, one would not expect the active sites on the two chains to yield the same results in terms of binding energies and ligand ranking. Further, the AutoDock Vina 'binding energy' is not a true binding free energy per se but an analogous empirical scoring function for the sake of assessment. We interpreted these results semi-quantitatively to extract trends, in VS results section, instead of exercising detailed energetic comparisons.
5. In VS results, we identified 16 candidates. Our first intention was to present all these to the readers as top scorers of docking ranked by AutoDock Vina binding energies. Next, in the Assessment section, we discussed the known properties of some of these which stood out, also considering their side effects. We highlighted velpatasvir and ledipasvir mainly because of their minimal side effects, which is a crucial factor in repurposing. In addition, we noticed the two drugs which correspond to these two compounds also contain sofosbuvir, which was identified as an anti-SARS-CoV-2 candidate in a separate study. It was because of all these factors that velpatasvir and ledipasvir stood out among the 16 candidates. Therefore we highlighted them in the conclusion. We also mentioned other top-ranking candidates but they may have strong side effects. We briefly assessed these candidates to inform the clinicians who may be interested in these results.
We did not intend for a comprehensive study comparing with other VS packages. Here we employed AutoDock Vina, one of the most popular VS utilities, and produced results for follow-up development. We are moving quickly to the next stage of experimental verification in order to respond to the rapidly worsening global crisis.
6. This work was intended to be a fast response to the current pandemic; thus, we focused on one of the several potential targets for antiviral development. To this day, there are several parallel efforts on this and other targets at different scales. At the time we revised it (end of March, 2020), this article is outdated in some aspects. Therefore, we shall not extend the work further on the computational methods, as this is not the most pressing.
As the reviewer mentioned, lopinavir and ritonavir have been clinically tried. These two compounds are also expected to be specific inhibitors of 3CL pro . Therefore, we checked to see if VS picked up these two drugs: indeed it did, with a medium score (now included in Table 3) quite a bit lower than the top candidates. Docking of lopinavir/ritonavir into the SARS-CoV 3CL pro has been performed previously (referenced in the main text) and our models have basically the same active sites. Disappointingly, the latest trial of lopinavir/ritonavir on COVID-19 showed no clinical benefit. We added a sentence in Discussion to update the readers about this clinical trial.
No competing interests were disclosed.
If there are no such experimental data to support the claim, the authors may consider revising their conclusion to "the computational results provide a rationale for further experimental validation of treating SARS-CoV-2 with velpatasvir and ledipasvir".

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Yes

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Medicinal Chemistry, Drug Discovery, Chemical Biology
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

© 2020 Huang J et al. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Qiaozhu Tan
School of Life Sciences, Westlake University, Hangzhou, China
Jing Huang
School of Life Sciences, Westlake University, Hangzhou, China
Yu Wai Chen and co-workers presented a molecular modeling and docking study of the 3CL protease in the SARS-CoV-2 virus. The manuscript started with comparing polyprotein PP1AB sequences of SARS-CoV-2 and SARS-CoV, based on which the 3D structure of SARS-CoV-2 3CLPro protein was constructed. The authors then performed virtual screening against SARS-CoV-2 3CLPro using a library of 7173 purchasable drugs. Considering both binding affinities and known side effects, the authors recommend velpatasvir and ledipasvir, and further suggest combining them with another HCV RdRp inhibitor, sofosbuvir, i.e. repurposing Epclusa and Harvoni for treating the coronavirus. This is a concise and timely report, and has proposed new therapeutic possibilities for the SARS-CoV-2 virus. The manuscript could be further improved by addressing the following comments.
More details of the docking should be provided. What is the binding energy cutoff used? How are the hits (reported in Table 3) used? 3CLpro is catalytically active as a dimer. How is this considered in the virtual screening? What does the "(B Top scorers)" mean?
In the extended data of virtual screening, one compound could have multiple entries with different ZINC numbers. For example, hesperidin corresponds to at least 20 different compounds. What are the differences? And how are different results assembled?
Table 1 is not clear. Please do a column-by-column comparison between different sites of SARS-CoV and SARS-CoV-2. Also please add one-letter amino acid codes for the residues.
The constructed protein structure is very similar to the recently solved crystal structure (6LU7), as "... confirms that the predicted model is good within experimental errors", but the docking results seem to differ significantly. Could the authors explain?
Is the work clearly and accurately presented and does it cite the current literature? Yes

If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results?
Competing Interests: No competing interests were disclosed.