In silico analysis of natural compounds targeting structural and nonstructural proteins of chikungunya virus

Background: Chikungunya fever presents as a high-grade fever during its acute febrile phase and can be prolonged for months as chronic arthritis in affected individuals. Currently, there are no effective drugs or vaccines against this virus. The present study was undertaken to evaluate protein-ligand interactions of all chikungunya virus (CHIKV) proteins with natural compounds from a MolBase library in order to identify potential inhibitors of CHIKV. Methods: Virtual screening of the natural compound library against four non-structural and five structural proteins of CHIKV was performed. Homology models of the viral proteins with unknown structures were created and energy minimized by molecular dynamic simulations. Molecular docking was performed to identify the potential inhibitors for CHIKV. The absorption, distribution, metabolism and excretion (ADME) toxicity parameters for the potential inhibitors were predicted for further prioritization of the compounds. Results: Our analysis predicted three compounds, Catechin-5-O-gallate, Rosmarinic acid and Arjungenin, to interact with CHIKV proteins; two (Catechin-5-O-gallate and Rosmarinic acid) with capsid protein, and one (Arjungenin) with the E3. Conclusion: The compounds identified show promise as potential antivirals, but further in vitro studies are required to test their efficacy against CHIKV.


Introduction
Chikungunya virus (CHIKV) is an alphavirus belonging to the Togaviridae family 1 . These are small, spherical, enveloped viruses that constitute a positive-sense single-stranded RNA genome of approximately 11.8 kb 2,3 . The genome encodes for five structural proteins (Capsid (CP), E3, E2, 6K and E1) and four nonstructural polyproteins (nsP1-4). Recently, CHIKV has spread widely and is the cause of a febrile illness of global concern with the potential to affect millions of people worldwide. As of 2016, Chikungunya fever has been identified in nearly 60 countries (WHO Chikungunya report; accessed 3 August 2017). Some recent outbreaks have been observed in Africa, Asia, Europe, islands in the Indian and Pacific Oceans, and recently on the Caribbean islands in America 4-7 . CHIKV infection is characterized by severe debilitating muscle and joint pain, and polyarthralgia, which persists for about 3-12 months and could last up to 1-3 years 8-10 . In some instances, severe CHIKV infection may cause neurological disorders and ocular manifestations [11][12][13] . Other symptoms include headache, myalgia, vomiting and rash 14,15 . Until now, there is no effective antiviral treatment, or vaccine, is commercially available for the treatment of CHIKV, and patients are treated symptomatically.
Studies on antivirals for chikungunya generally target the replication machinery (nsP2 and nsP3 proteins) 16-21 and surface receptors responsible for the binding of the virus during endocytosis (E1 and E2 proteins) 19,21 . Recent studies have shown that CHIKV is able to affect the central nervous system (CNS) like new world alphaviruses, such as Venezuelan equine encephalitis virus and Eastern equine encephalitis virus. Thus, it is important to evaluate CHIKV as a transition between new and old world viruses. Old world viruses use nsP2 to inhibit transcription of host proteins, whereas new world viruses have developed an alternative mechanism of transcription inhibition that is mainly determined by their CP protein 22 . Hence, CP could be an important target protein for potential antivirals. Up until now, the other structural protein of CHIKV, E3, has not been evaluated as a target for antivirals till now. E3 is the only protein in the CHIKV genome with a secretory signal.
Alphavirus CP is a multifunctional protein known to act as serine protease for self-cleavage and viral genomic RNA binding. It is also known to bind to other CP molecules during nucleocapsid formation, and interact with viral spike proteins during virion formation and budding 23 . CHIKV CP is 261 amino acids long protein and has a molecular weight of approximately 30kDa, and contains two major domains. N-terminal domain is positively charged and is involved in non-specific RNA binding, while the C-terminal domain regulates globular protease and acts as a binding site for the spike protein 24 . In addition, nuclear import export signals are present on the CP's amino acid terminal, forming immobile aggregations with nsP3 and E2 proteins of CHIKV 25 .
The structural protein E3 is approximately 6KDa, and is found not to be associated with the mature virion 2 . It serves as the signal sequence for the translocation of E3-E2-6K-E1 polyprotein into the endoplasmic reticulum, working in a clade-specific manner, and its cleavage from E2 is essential for virus maturation 26 . E3 also mediates pH protection of E1 during virus biogenesis via interactions strongly dependent on Y47 at the E3-E3 interface 27 .
In the present study, we performed an in silico analysis of proteinligand interactions of all CHIKV proteins using a natural compound library from MolBase to predict potential antiviral compounds for CHIKV infection. Our analysis predicted three compounds that interacted with CHIKV proteins (two with the E3 protein, and one with the CP), making them potential antiviral candidates against CHIKV.

Target identification and homology modeling
Structures of CHIKV proteins E1, E2, E3, nsP2 and nsP3 were downloaded directly from RCSB Protein Data Bank (PDB). For the rest of the CHIKV proteins, CP, 6K, nsP1 and nsP4, whose structures are unavailable, CHIKV sequences present in NCBI, belonging to ECSA (East/Central/South Africa) genotype were downloaded. These sequences were utilized to form a consensus sequence with MEGA 6 28 using clustalW pairwise multiple alignment algorithm with all parameters set at default. Using these consensus sequences, homologous proteins from the PDB were identified using Protein BLAST 29 where the algorithm parameters were as follows: Max target sequences=100, Expect threshold=10 using BLOSUM62 scoring parameters, Gap cost=Existence:11 & Extension:1 with conditional compositional score matrix adjustment. The suitable templates for nsP1 and CP with highest query coverage, sequence identity and lowest E-value were selected for homology modeling. For proteins 6K and nsP4, no templates were available, and thus these structures were created using threading and looping method (see next section).
The template and target sequences of all CHIKV proteins were then aligned using CLUSTALW 30 . MODELLER (version 9.16) was used to generate homology models 31 . Further, the homology model having the lowest MODELLER objective function (molpdf) or DOPE or SOAP assessment scores, or the one having highest GA341 score was selected as the best model structures and were further utilized for model validation. Nonstructural protein, nsP4, and the small accessory peptide of structural protein 6K did not have any template in PDB; therefore a threading and looping approach was implemented for them using LOMETS (Local Meta Threading Server) 32 . Both online server and standalone program present as a module of I-TASSER Suite version 5.1, which provides 3D models by combining alignment scores of template to target of 9 different threading programs (FFAS-3D, HHsearch, MUSTER, pGenTHREADER, PPAS, PRC, PROSPECT2, SP3, and SPARKS-X). All parameters were set as default. All structural and nonstructural CHIKV protein sequences were selected as potential drug targets.

REVISED
Validation of homology modeled structures Generated models were validated using MolProbity-(v4.3.1) 33 . Ramachandran plot analysis was performed for the best protein models by analyzing the phi (Φ) and psi (ψ) torsion angles. To check reliability of the modeled structures, the root mean square deviation (RMSD) was calculated by superimposing it on template protein structure using PyMOL (v1.7.0.0) visualization software 34 . Consistency between templates and the modeled structures were assessed by ProSA-web 35 (online server), a statistical analysis tool of all the proteins structures available at RCSB PDB. Here, a statistical average is obtained over the known structures with the help of combined potentials of mean force from the PDB database.

Molecular dynamic simulations
Stability of the domain regions of CHIKV protein structures was examined by molecular dynamics (MD) simulation using GROMACS (version 5.0) software package 36 . Optimized Potential for Liquid Simulations All-Atom 37 force field was used to energy minimize the structures. Through this energy minimization, the high-energy intramolecular interactions were discarded. In order to avoid the steric clashes, overall geometry and atomic charges were also optimized. The proteins were kept at the center of the rectangular box, which was filled with SPC water model system to create the same environmental behavior of the molecules. All the atoms of the protein and the boundary of the rectangular box were separated by a minimum distance of 10 Å. 0.01M NaCl was used as a solvent exposure.
The system was further energy minimized without any restraints for 50,000-time steps; the steepest descent having step size of 0.01 ps. Then the system was equilibrated to reach a stable temperature by conducting NVT ensemble. Pressure was further equilibrated by NPT ensemble performance. The long-range electrostatic interactions were calculated by using particle mesh Ewald 38 method with a cut-off of 0.9 nm for Vander Waals interactions. All the bonds were constrained by LINCS 39 , where only the water molecule moves to equilibrate with respect to protein structure keeping protein molecule as static. To couple the system Berendsen thermostat (V-rescale) and Parrinello-Rahman barostat were utilized to maintain the constant temperature (300 K) and pressure (1 bar). Further MD analysis was performed to observe structural changes and dynamic behavior of the protein by calculating RMSD, radius of gyration and root mean square fluctuation (RMSF) along with changes in temperature, pressure, density and total energy.
Virtual screening and molecular docking Simulated computational models of CHIKV proteins were prepared and their binding sites were predicted using SiteMap (Version 2.3, 2009, Schrödinger, LLC, New York, USA). These were then used to perform molecular docking. The protein preparation wizard was used to prepare CHIKV proteins and a natural remedies library from MolBase database was prepared using the LigPrep module 40 . Virtual screening of modeled proteins against a natural remedy library from MolBase was done by using GLIDE module in an Extra Precision (XP) mode (Version 5.5. 2009, Schrödinger, LLC). It produces the minimal ranks of inappropriate poses and determines the appropriate binding energy of the three dimensional (3D) structure of the protein along with a ligand 41,42 .

Analysis and output visualization of drug target and protein
After the completion of molecular docking, the docked poses were listed depending upon the respective docking scores. Glide Score (obtained using GLIDE Module of Schrödinger Software Suite 9.0) was used as an empirical scoring function to predict free energy for ligands binding to the receptor. The structure showing minimum binding energy was filtered and subjected for further analysis. The 3D conformation ligand receptor was analyzed using PyMOL 34 and Chimera 43 v1.10.1 visualization software.
Absorption, distribution, metabolism and excretion (ADME) screening and toxicity analysis Pharmacokinetics properties and percent human oral absorption values were further predicted for the potential lead molecules using QikProp module (Version 3.2, 2009, Schrödinger, LLC) 44 . Both the physically remarkable descriptors and pharmaceutically admissible properties were predicted for neutralized ligands by QikProp. The program predicts 44 different properties, including log P (octanol/ water), % human oral absorption in intestine (QP%) and predicted IC 50 value for blockage of HERG K+ channels (log HERG). The Lipinski's rule of five 45 , an important criteria for oral absorption, was evaluated for the acceptability of the compounds. In addition, admetSAR 46 v1.0 was used to calculated various attributes of the drugs, including the blood brain barrier (BBB), human intestinal absorption, Caco-2 permeable, aqueous solubility, P-gp substrate and inhibitor, CYP450 substrate and inhibitor, CYP IP, ROCT, HERG inhibition, and toxicity parameters. For Lipinski score calculations, the ligand in SMILE format was uploaded to QikProp. The physicochemical properties and Lipinski Rule of Five were also analyzed by PERL script, "CalculatePhysicochemicalProperties.pl" of MayaChemTools 47 .

Ligand-protein interaction studies
The protein-ligand complex interaction at the atomic level was analyzed using Maestro 11.0 (LLC Schrodinger 2016) 48 and LigPlot+ 49 v1.4.5. The protein and the docked ligand were merged together and uploaded to Maestro Suite vMaestro 11. Further, the "Ligand Interaction Diagram" option was selected to draw the proteinligand binding interactions in the 2D visualization workspace.

Results
In silico protein preparation, homology modeling and validation CHIKV consists of four nonstructural proteins (nsP1-nsP4), three structural proteins (E1-E3), along with two sub-pro regions 6K and CP, which makes a part of structural protein unit ( Figure 1). Structures of CHIKV proteins E1, E2, E3, nsP2 and nsP3 were downloaded directly from PDB (Figure 2a), and for other CHIKV proteins (CP, 6K, nsP1 and nspP4) homology modeling and threading and looping methods were utilized to predict their structures. For proteins with templates available, homology modeling was done with five models for every protein created based on sequence similarity using different model generation tools (MODELLER and LOMET), and validated by their internal scoring functions (molpdf,  DOPE, SOAP and GA341 scores). Further, ProSA Z-score for all modeled structures were calculated to analyze the quality of models based on the Cα positions. Individual validation and ProSA Z scores for top ranked models are given in Table 1 and their structures are given in Figure 2b. The top ranked models were also analyzed by Ramachandran plot (Figure 3). The Ramachandran plot shows the distribution of phi (ϕ) and psi (ψ) angles for each amino acid residues of the modeled structures. The respective percentages of the favored and allowed regions for all the residues of all those validated are also shown in Table 1.

Molecular dynamic simulation and analysis
Molecular dynamic simulations were employed to analyze the protein structure-function complexities, such as structural stability,  conformational flexibility and folding. Domain regions of the structures (Table 2) were simulated for 20 ns. Moreover, various parameters, such as temperature, pressure, density and total energy, were calculated to check the stability of these structures along with steric properties. Further, RMSD values for the backbone atoms of proteins were plotted against time of MD simulations. Average RMSD during the simulations was 22.93. Radius of gyration on the other hand also supports the stability and compactness of the proteins. The RMSF with respect to each residue depicts the flexibility of the proteins. Average RMSF during the simulations was 1.45. The RMSD, radius of gyration and RMSF plots for all CHIKV proteins are shown in Figure 4A-C. The resulting graphs contributed to protein modeling, as they show a constant RMSD deviation throughout the 20ns simulation except for a small deviation in E2 after 14ns. Depending upon these simulation parameters, the proteins showed conformational stability.

Molecular docking
Drug discovery relies heavily on molecular docking to understand the interactions between ligand/inhibitor and target protein 50 . In this study, we resorted to the docking of available protein structures (wherever applicable), as well validated, refined and simulated ligands, (1,3,6, -Trigalloyl-β-D-Glucose and Quercetin-3-rutinoside (Compound ID 164 and 153) were found to interact with all the proteins and were discarded from further analysis.  modeled proteins to screen against a natural remedy library from MolBase. The binding sites of all protein structures were predicted by SiteMap. The predicted binding pockets were further validated using Glide in XP mode. Top ten ligand/compounds having docking score (Glide Score) above -4, glide energy of -20 kcal mol -1 and potential energy of a considerable range were considered for the next level of screening. The combined results of all the docked ligand along with the glide energy and potential energy have been provided in Table 3. Of these, two       ADME analysis of all potential leads ADME screening was performed for all the top hits. Here, 44 various physically remarkable descriptors 51 and pharmaceutically admissible properties of the top four lead compounds for every CHIKV protein were calculated using QikPro-P (Table 4). The Lipinski's rule of five was further employed to evaluate oral absorption along with ADME. Compounds violating more than 2 Lipinski's rule of 5 were discarded from further analysis.
Compounds Catechin-5-O-gallate (Compound ID 42), Rosmarinic acid (Compound ID 151) and Agnuside (Compound ID 18) against CP; Mangiferin (Compound ID 122) and Arjungenin  CNS as well; therefore, compounds that were predicted to cross the BBB were also considered as potential antivirals. Of all the compounds considered for toxicity analysis using AdmetSAR, Arjunetin (Compound ID 10) was considered ineffective for oral consumption and is also carcinogenic. Also, Agnuside (Compound ID 18) and Mangiferin (Compound ID 122) were not considered as potential antivirals as they are predicted to have positive AMES toxicity (Table 5).
(Compound ID 12) against E3; and Arjunetin (Compound ID 10) against 6K were studied further in greater detail for their toxicity.

Toxicity analysis
The efficacy and unexpected toxicity of a drug to penetrate biological barriers, such as the intestinal wall or BBB, were considered as a prime determinant of the compounds taken forwards for toxicity tests. CHIKV is an old world virus, but is now seen to affect the The compounds that were judged to be potential antivirals were Catechin-5-O-gallate (Compound ID 42) and Rosmarinic acid (Compound ID 151) against CP and Arjungenin (Compound ID 12) against E3 structural protein of CHIKV. Thus, the ligand/ drug-protein interaction was studied for these three compounds to understand their interaction pattern and strength of interaction with the protein for their role as potential antivirals against CHIKV (Table 5).

Ligand protein interaction
A ligand protein interaction study was done for validated protein structures as discussed earlier. CP residues (Peptidase S3 domain) were predicted to bind to Catechin-5-O-gallate and Rosmarinic acid (Compound IDs 42 and 151, respectively) and E3 residues (Endopeptidase domain) bind to Arjungenin (Compound ID 12). The top docking conformation of Catechin-5-O-gallate showed a predicted binding energy of -6.26 kcal mol-1, whereas Rosmarinic acid and Arjungenin showed similar binding energy of -6.11 kcal mol-1 and -6.01 kcal mol-1, respectively. The binding energy (Glide Score) and the interaction energy (Potential, Vander Waals and Electrostatic) are shown in Table 3. The intermolecular hydrogen bonds and hydrophobic residues showing close contact between receptor proteins (CP and E3) and ligand (Compound ID 42, 151 and 12) are shown in Table 6 and Figure 6A-C, respectively.
The interaction result showed that most of the hydrogen bond donors are from the protein that bind to the acceptors on the respective ligands. The compound Catechin-5-O-gallate (Compound ID 42) binds to Glu260 and Lys141 residues (HBond distance of 2.57 and 2.93 Å) of the CP protein and forms hydrophobic interactions with Asp161, His139, Val140 and Trp261 residues (Figure 7a). Further 2-D workspace revealed that when the ligand-protein interactions were observed both in the presence and absence of solvent the compound Catechin-5-O-gallate binds to the CP protein, HIS139 forms the hydrogen backbone; GLU259, GLU260, ASP161 form the hydrogen side chain. The ligand forms hydrophobic interactions with TRP261, VAL140, LYS141 (Figure 7b). We were unable to acquire the Ligplot for the interaction of Rosmarinic acid (Compound ID 151) with CP protein as the coordinates were undetectable; however, using 2-D workspace, we identified that Rosmarinic acid binds to the CP protein, TRP261 forms the hydrogen backbone; GLU260, LYS141 form the hydrogen side chain. The ligand forms hydrophobic interactions with HIS139, VAL140, GLU259, ASP161 (Figure 7b). The third compound Arjungenin (Compound ID 12), binds with Arg63 and Gln52 residues (HBond distance of 2.72 and 3.11 Å) of the E3 protein and Ser18, His60 and Ser58 residues are involved in hydrophobic interactions (Figure 7a). Its 2-D workspace revealed that SER18, GLN49 form the hydrogen backbone; GLN52, SER58, ARG63 form the hydrogen side chain. The ligand forms hydrophobic interactions with PRO5, GLN19, ALA53, HIS60 (Figure 7b). Overall docking and interaction results for the best three natural compounds have been compiled in Table 7.    Rosmarinic acid (RA), a phenolic compound found in various Labiatae herbs 64 , possesses several properties, such as anti-inflammatory 65,66 and antioxidative, as it reduces liver injury induced by d-galactosamine 67 and lipopolysaccharides 68 . Besides these, RA acts as a potent antiviral agent against Japanese encephalitis virus (JEV), another alphavirus closely related to CHIKV. The study indicated that RA reduced viral replication within the brain along with the secondary inflammation resulting from microglial activation. These observations suggested that RA exhibited efficient antiviral as well as anti-inflammatory activity against Japanese equine encephalitis virus infection and hence was able to reduce disease severity 66 .
The compound Arjungenin, a popular triterpenoid isolated from Terminalia arjuna/ T. chebula, shows inhibitory effects on HIV-1 Protease 69,70 . Arjungenin has been previously used for a wide range of activities that includes anti-inflammatory, anti-microbial, anticancer and even anti-viral 71 , but no work has been done on this particular natural compound to date.

Conclusion
Treatment of chikungunya includes antipyretic drugs during the febrile stage and depends heavily on symptomatic relief during the chronic arthritic phase. Our present study has identified natural compounds that may be antiviral and might be good candidates as drugs for chikungunya treatment. Further in vitro validation is required for these compounds to provide insights into their mode of action against the different stages of chikungunya infection.

Data availability
All source data relating to this article can be found in Supplementary File 1. 1.
Overall comment: The article in the present form needs some revision including through checking of grammar and sentence structure. Further, I would request the authors to consider following points. Authors have written that "All bonds were constrained by LINCS". This claim needs to revisit. Generally, bonds involving hydrogen atoms are restrained using . LINCS Authors have mentioned that "In order to avoid the steric clashes, overall geometry and atomic charges were also optimized". I do not think authors have optimized the atomic charges. Rather atomic charges were obtained from the OPLS forcefield that was used in the simulations.
The cutoff used for PME was 0.9 Angstrom. This is very small. It should rather be 9 Angstrom.
Berendsen thermostat with V-rescale is not a good option for conformational sampling. A better option is Langevin dynamics.
Authors have written that "The system was further energy minimized without any restraints for 50,000-time steps; the steepest descent having step size 0f 0.01 ps". This claim needs to revisit. No time-step is required for energy minimization. Only to study dynamics, i.e, to integrate Newton's equation of motion, we need to set time-step.
It was not mentioned what time-step was used for the heating stage and production simulation.
Docking is not a good method for predicting the binding free energy. It is the least accurate method for estimating the binding free energy. However, it is a very good method for predicting the binding pose. Authors should have conducted MM-PBSA type analysis to estimate the binding free energy. Looking at Figure 4D, one can see that the for a couple of cases, longer simulation is required. The simulation length is too small (20 ns) to investigate the dynamics of proteins. They should extend all simulations to 50 ns. Authors have written that "RMSF with respect to each residue....". RMSF is calculated with respect to the average structure obtained from simulations. Do you use C-alpha atoms of each residue?

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required. Time-step of 20 ns was used for production simulations. Below is the parameter details Answer: provided in mdp file for production simulations. nsteps = 10000000; 2 * 10000000 = 20000 ps , 20 ns Docking is not a good method for predicting the binding free energy. It is the least accurate method for estimating the binding free energy. However, it is a very good method for predicting the binding pose. Authors should have conducted MM-PBSA type analysis to estimate the binding free energy.
The study was to screen the natural compounds against CHIKV structural and Answer: non-structural proteins and identify potential inhibitors. MM-PBSA for a large number of compounds was out of the scope of the present study and hence not pursued. Figure 4 A, B, C are not required. We abide by the reviewer's recommendation. The figures have been removed and Answer: changes figure is attached to the revised manuscript.
Looking at Figure 4D, one can see that the for a couple of cases, longer simulation is required. The simulation length is too small (20 ns) to investigate the dynamics of proteins. They should extend all simulations to 50 ns. Except for E2, every simulation has reached to a stable state as shown in the RMSD Answer: graph. Figure 4: Use bigger fonts. It's not readable. We apologize for this inconvenience. The figure has been changed.

Answer:
Calculate average RMSD and Radius of Gyration during the simulations. RMSD Average -22.93 Answer: Radius of Gyration Average -1. 45 Authors have written that "RMSF with respect to each residue....". RMSF is calculated with respect to the average structure obtained from simulations. Do you use C-alpha atoms of each residue? As one can see in graph 4F, the x-axis represents the atomic positions in the trajectory Answer: after fitting to a reference frame. Yes, C-alpha atoms of each residue have been used for the analysis.