Characterizing alpha helical properties of Ebola viral proteins as potential targets for inhibition of alpha-helix mediated protein-protein interactions

Ebola, considered until recently a rare and endemic disease, has dramatically transformed into a potentially global humanitarian crisis. The genome of Ebola, a member of the Filoviridae family, encodes seven proteins. Using the recently implemented software (PAGAL) for analyzing the hydrophobicity and amphipathicity properties of alpha helices (AH) in proteins, we characterize the helices in the Ebola proteome. We demonstrate that AHs with characteristically unique features are involved in critical interactions with the host proteins. For example, the Ebola virus membrane fusion subunit, GP2, from the envelope glycoprotein ectodomain has an AH with a large hydrophobic moment. The neutralizing antibody (KZ52) derived from a human survivor of the 1995 Kikwit outbreak recognizes a protein epitope on this AH, emphasizing the critical nature of this secondary structure in the virulence of the Ebola virus. Our method provides a comprehensive list of such 'hotspots'. These helices probably are, or can be, the target of molecules designed to inhibit AH mediated protein-protein interactions. Further, by comparing the AHs in proteins of the related Marburg viruses, we are able to identify subtle changes in the proteins that might render previously successful drugs ineffective against them. Such differences are difficult to identify by a simple sequence or structural alignment. Thus, analyzing AHs in the small Ebola proteome can aid rational design aimed at countering the 'largest Ebola epidemic, affecting multiple countries in West Africa' (http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/index.html).


Amendments from Version 2
In the current version, we provide previous research that corroborates our hypothesis that helices with characteristically unique properties are involved in host protein interactions. Specifically, 3FKEA.HELIX2 from VP35, 4U2XA.HELIX7 from VP24 and 3FKEA.HELIX1 from VP35 are shown to have significance in the viral protein interactions. Figure 3d and Table 5 (with a corresponding shift in the subsequent table numbering) are additional in this version.

Introduction
The Ebola virus was first discovered in 1976 1, and has since been known to cause a rare but deadly disease 2. However, the current outbreak in West African countries (Guinea, Liberia, Nigeria, Sierra Leone and Senegal) has rapidly deteriorated into a full blown epidemic 3, and poses grave humanitarian dangers to these countries 4. Ebola, along with the Marburg virus, belongs to the Filoviridae family 5, and causes haemorrhagic fever 2 by quickly suppressing innate antiviral immune responses to facilitate uncontrolled viral replication 6.
Interestingly, the genome of the Ebola virus encodes only seven proteins 7, although their extreme 'plasticity allows multiple functions' 8,9. Protein structures are formed by well ordered local segments, of which the most prevalent are alpha helices (AH) and β sheets. AHs are right-handed spiral conformations which have a hydrogen bond between the backbone carbonyl oxygen (C=O) of each residue and the backbone amide nitrogen (N-H) of the residue four positions further along the chain, towards the C-terminus. AH domains are often the target of peptides designed to inhibit viral infections 10-12. Recently, we have provided open access to software (PAGAL 14) that reproduces previously described computational methods 13 to compute the hydrophobic moment of AHs.
In the current work, we characterize the helices in the Ebola proteome using PAGAL, and demonstrate that the helices with characteristically unique feature values are involved in critical interactions with the host proteins. The PDB database is queried for the keyword 'Ebola', and the structures obtained are analyzed using DSSP 15 for identifying AHs. We process all PDB structures, and do not filter out redundant structures based on sequence. These helices are analyzed using PAGAL, and the results are sorted based on three criteria: a large hydrophobic moment, a high proportion of positive residues, or a high proportion of negative residues. The helices that are ranked highest in these sorting criteria are involved in critical interactions with either antibodies or host proteins. For example, the Ebola virus membrane fusion subunit, GP2, from the envelope glycoprotein ectodomain has the AH with the largest hydrophobic moment among all helices analyzed 16. This helix contains part of the epitope recognized by the neutralizing antibody (KZ52) derived from a human survivor of the 1995 Kikwit outbreak, emphasizing the critical nature of this helix in the virulence of Ebola 17. Another example, obtained by choosing the helix with the highest proportion of negatively charged residues, is the interaction between the C-terminus of the human karyopherin alpha nuclear transporters and the Ebola virus VP24 protein (eVP24) 18, which suppresses tyrosine-phosphorylated STAT1 nuclear import 19.
These helices probably are, or can be, the target of molecules designed to inhibit AH mediated protein-protein interactions 20. Our method provides a comprehensive list of such targets. Further, each protein can be individually queried using PAGAL to identify helices that might have a poor global rank, but are still critical in the context of that particular protein.
Although Ebola and Marburg viruses are both members of the Filoviridae family 21, they differ in the antigenicity of the virion glycoprotein 22. By comparing the AHs in proteins of Marburg and Ebola viruses, we are able to identify subtle changes in the proteins that might render previously successful drugs ineffective against them. These differences are not apparent from a simple sequence or structural alignment. Thus, in the current work, we describe a simple methodology that can aid rational design of drugs and vaccines, an important aspect of the global effort to counter the deadly Ebola epidemic.

Materials and methods
We searched for the keyword 'Ebola' in the PDB database (Table 1). Subsequently, each protein was split based on the chain id, resulting in 146 single chained proteins (See ALPHA.zip in Dataset 1). We have not reduced the set based on sequence similarity since the proteins might have different conformations based on their ligands. Note, this list might include non-Ebola proteins which might have been co-crystallized with the Ebola protein. However, they have been put through the same analysis since they might provide insights into the Ebola proteins themselves.
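The chain-splitting step can be sketched as follows. This is only an illustration and not the script used for the paper: it assumes standard fixed-column PDB records, in which the chain identifier is the 22nd character of every ATOM or HETATM line. The function name `split_by_chain` is hypothetical, and the `<PDB id><chain>` naming mirrors labels such as 1EBOE used in this work.

```python
from collections import defaultdict


def split_by_chain(pdb_id, pdb_text):
    """Group ATOM/HETATM records of one PDB entry by chain identifier.

    Returns a dict mapping names such as '1EBOE' (PDB id + chain id)
    to the list of coordinate lines belonging to that chain.
    """
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        # Only coordinate records carry a chain id in column 22 (index 21).
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain_id = line[21]
            chains[pdb_id + chain_id].append(line)
    return dict(chains)
```

Each resulting group can then be written to its own file and passed to a secondary structure assignment program.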
These proteins were then analyzed using DSSP 15, which identified 758 helices in all (See ALPHA.zip in Dataset 1). These helices were then analyzed using PAGAL. The PAGAL algorithm has been detailed previously 14. Briefly, the Edmundson wheel is computed by considering a wheel with centre (0,0), radius 5, first residue coordinate (0,5) and advancing each subsequent residue by 100 degrees on the circle, since an alpha helix has 3.6 residues per turn (360/3.6 = 100 degrees per residue). For each residue, we draw a vector from the centre to the coordinate of the residue and give it a magnitude obtained from the hydrophobic scale (in our case, this scale is obtained from 13). These vectors are then added to obtain the final hydrophobic moment.
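The wheel construction and vector sum described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the PAGAL implementation: the Kyte-Doolittle hydropathy values are swapped in as a stand-in for the scale of reference 13, residues are assumed to advance clockwise, no normalization is applied, and the names `wheel_position` and `hydrophobic_moment` are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values, used only as a stand-in scale
# (assumption: the paper takes its scale from reference 13 instead).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def wheel_angle(index, step_degrees=100.0):
    """Angle (radians) of residue `index`; residue 0 points to (0, 5)."""
    # Clockwise advance is an assumption; the magnitude of the moment
    # does not depend on the direction of rotation.
    return math.radians(90.0 - step_degrees * index)


def wheel_position(index, radius=5.0):
    """Plotting coordinate of a residue on the Edmundson wheel."""
    angle = wheel_angle(index)
    return radius * math.cos(angle), radius * math.sin(angle)


def hydrophobic_moment(sequence, scale=KYTE_DOOLITTLE):
    """Magnitude of the sum of per-residue hydrophobicity vectors."""
    x_sum = 0.0
    y_sum = 0.0
    for index, residue in enumerate(sequence):
        angle = wheel_angle(index)
        weight = scale[residue]  # vector length from the chosen scale
        x_sum += weight * math.cos(angle)
        y_sum += weight * math.sin(angle)
    return math.hypot(x_sum, y_sum)
```

A helix whose hydrophobic residues cluster on one face of the wheel yields a large sum, whereas a uniform sequence spanning five full turns (18 residues) cancels to approximately zero.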
The color coding is as follows: all hydrophobic residues are colored red, while hydrophilic residues are colored in blue: dark blue for positively charged residues, medium blue for negatively charged residues and light blue for amides.
The raw file generated by analyzing all 146 proteins through PAGAL is provided as PAGALRAWDATA.txt (Dataset 1), and contains the hydrophobic moment, percent of positive charges and the total number of charged residues for every helix. These are then sorted based on the charge (negative or positive) or the hydrophobic moment. We ignore the helices that have zero or one charged residue, and those that are shorter than 10 residues in length. The proportion of positively (or negatively) charged residues is computed relative to the total number of charged residues in the helix, and not relative to the length of the helix.
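The filtering and ranking rules above can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the helix names and sequences in the usage note are invented for illustration, and counting only Lys/Arg as positive and Asp/Glu as negative (leaving out histidine) is an assumption. The tie-break on helix length follows the sorting described for Table 2.

```python
POSITIVE = set("KR")  # assumption: histidine is not counted as charged
NEGATIVE = set("DE")


def charge_counts(sequence):
    """Return (number of positive, number of negative) residues."""
    positive = sum(1 for residue in sequence if residue in POSITIVE)
    negative = sum(1 for residue in sequence if residue in NEGATIVE)
    return positive, negative


def rank_by_negative_fraction(helices, min_length=10, min_charged=2):
    """Rank helices by the fraction of charged residues that are negative.

    `helices` maps a helix name to its one-letter sequence. Helices shorter
    than `min_length` or with fewer than `min_charged` charged residues are
    ignored. Ties are broken by helix length, longest first.
    """
    ranked = []
    for name, sequence in helices.items():
        positive, negative = charge_counts(sequence)
        charged = positive + negative
        if len(sequence) < min_length or charged < min_charged:
            continue
        # The fraction is relative to charged residues, not helix length.
        ranked.append((name, negative / charged, len(sequence), charged))
    ranked.sort(key=lambda row: (-row[1], -row[2]))
    return ranked
```

For example, a 12-residue helix whose only charged residues are one Glu and one Asp gets a negative fraction of 1.0, and so ranks above a helix of the same length with two acidic residues and one Lys (fraction 2/3).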
All protein structures were rendered by PyMol (http://www.pymol.org/). The sequence alignment was done using ClustalW 23. The alignment images were generated using Seaview 24. Protein structures have been superimposed using MUSTANG 25.
A PDB database search using the keyword 'Ebola' generated 146 single chained proteins, which were analyzed using DSSP (Define Secondary Structure of Proteins), resulting in 758 alpha helices (ALPHA.zip). Note, this list might include non-Ebola proteins which might have been co-crystallized with the Ebola protein. These helices were analyzed using PAGAL (PAGALRAWDATA.txt), which details the hydrophobic moment, percent of positive charges and the total number of charged residues for every helix.

Helices with large hydrophobic moment
We began by analyzing the helices which have a large hydrophobic moment (the hydrophobic scale is obtained from 13) (Table 2). The Edmundson wheel for the helix 1EBOE.HELIX1 from the structure of GP2 from the Ebola virus membrane fusion glycoprotein (PDBid:1EBO) 16 is shown in Figure 1a. Figure 1b shows the residues comprising these helices (in magenta) in the apo form (PDBid:1EBO) 16. The neutralizing antibody (KZ52) derived from a human survivor of the 1995 Kikwit outbreak (PDBid:3CSY) 17 recognizes an epitope on this AH, emphasizing the critical nature of this AH in the virulence of the Ebola virus (Figure 1c,d). The antibody most likely prevents the rearrangement of GP2 segments, thereby blocking insertion of the internal fusion loop into the host membrane 17. Table 3 shows the residues in the specified helix (residues 553-597, chain J, PDBid:3CSY) making possible hydrogen bonds with different residues in the human Fab KZ52 heavy chain (residues 1-228, chain A, PDBid:3CSY). Among all the interactions, only Gly553 is on 1EBOE.HELIX1 (at a distance of 2.7 Å from Thr100/OG1), although the others are sequentially proximal. These few interactions are sufficient to bind to this helix, rendering the virus non-virulent, and leading to human recovery. The importance of interfacial hydrophobicity in viral proteins involved in host entry through membrane fusion has recently been discussed in detail, and remains 'an underutilized therapeutic target' 26.
Table 2. Identifying helices with unique properties. The property on which the sorting is based is either the hydrophobic moment (HM) or the percentage of negative (NEG) or positive (POS) residues. HM: Hydrophobic moment, RPNR: Ratio of the positive to the negative residues, Len: length of the helix, NCH: number of charged residues. GP: glycoprotein from Ebola, VP24: Membrane-associated protein from Ebola, VP35: Polymerase cofactor.
1EBOE.HELIX0 (Table 2) also has a high hydrophobic moment, but is actually an isoleucine zipper derived from GCN4 27 (Figure 1b).

Helices with high proportion of negatively charged residues. Identifying differences among related species
We then analyzed the helices having a high proportion of negatively charged residues, sorted by the length of the helix when the percentage of negatively charged residues is the same (Table 2). Figure 2a shows the Edmundson wheel for the helix 4U2XA.HELIX5 (which has only two charged residues, the acidic E113 and D124), while Figure 2b,c shows this helix in the protein complex marked in magenta. Note that we exclude AHs with either zero or one charged residue (see Methods). Protein PDBid:4U2XD is
The next helix having a high proportion of negatively charged residues (3FKEA.HELIX2) is from VP35, a classic example of a moonlighting protein that can be a component of the viral RNA polymerase complex, a viral assembly factor, or an inhibitor of host interferon production 33. This helix is part of the dsRNA-binding domain of VP35 that is involved in the formation of the asymmetric VP35 RBD dimeric interface in Reston Ebola virus through a hydrogen-bonding network of residues and a solvent molecule 34. Interestingly, this helix is homologous (100% similarity and 78% identity over a 9 amino acid overlap) to helix '1A' of an ATP-dependent transcriptional activator 35. This helix interacts with another '1B' helix from a different monomer in an anti-parallel fashion to facilitate dimerization.
VP35 consists of several helices, and is reasonably conserved in the Marburg virus from the same Filoviridae family (42% identity, 58% similarity) (Figure 3a). Often, it is difficult to identify the regions of the protein that differ from a sequence or structural alignment (Figure 3b), in case there is interest in understanding different responses of the proteins to known drugs or even the immune system. Table 4 compares the characteristics of the helices in the VP35 from Ebola and Marburg. The helix numbering is offset by one, due to a small N-terminal helix in the Marburg protein (which might be due to differences in crystallization technique and is probably not critical). Thus, we have labeled these helices using letters. It can be seen that most of the helices have the same properties, barring helices E and F, where the acidic residue is present in the E helix in Marburg and in the F helix in Ebola. These helices are marked in yellow in Figure 3b. Also, it can be seen that helix C, which has a high proportion of acidic residues in Ebola VP35, has fewer of those residues in Marburg. The differences in the pathogenicity of these viruses are encoded in the structure of the expressed proteins, and the design of drugs and vaccines to counter virulence should take these differences into account.
Helices with high proportion of positively charged residues
4U2XA.HELIX7 from VP24 is a helix having a high proportion of positively charged residues (Table 2), and contains two (L147P and R154L) of the three mutations (L147P, M71I and R154L) that sensitize guinea pigs to the Zaire Ebola virus 36. This helix is marked in yellow in Figure 2c. The second helix (3FKEA.HELIX1) is from VP35, which was discussed previously 33. This helix spans residues 238-252 and includes Lys248 and Lys251, a basic patch which is '100% identical among members of the Ebola viral isolates' 33, and Ala238, Gln241, Leu242, Val245, Ile246 and Leu249, which interact with a β sheet to create a hydrophobic subdomain 33. This helix is marked in magenta in Figure 3b, and the Edmundson wheel is shown in Figure 3c. Recently, antifiloviral compounds were shown to bind and inhibit the polymerase cofactor activity of VP35 37. Figure 3d shows one of the compounds (1D5) in complex with VP35 (PDBid:4IBFA). It can be seen that atoms in the compound make hydrogen bonds with residues on the AH spanning residues 238-252 (Table 5). These structures were used to derive a receptor-ligand pharmacophore, which was found to have similar features to the ligand based pharmacophore derived from four FDA approved drugs that inhibit the Ebola virus 38. Once again, we demonstrate that a unique feature value of an AH is a strong indicator of its significance in the viral functionality.

Multifunctional/moonlighting
The multifunctional roles played by many of these Ebola proteins are probably due to stretches of intrinsically disordered regions within the structure: 'fuzzy objects with fuzzy structures and fuzzy functions' 39. The conformational plasticity 9 and moonlighting abilities of these proteins are key determinants for immune evasion 40.
The above examples have analyzed all helices from the Ebola proteome. However, it is also possible to analyze the helices in a single protein, and probe those for unique features. Table 6 shows the values obtained from PAGAL for helices of the C-terminal domain of the Zaire Ebola virus nucleoprotein 41. It can be seen that 4QAZA.HELIX0 (residues 646-658) has a reasonably high hydrophobic moment (although it will not rank highly if we analyze all helices present in this proteome), and also a high number of charged residues (Figure 4a,b). It has been observed that 'the side chains of Glu645, His646, Glu649, Lys684, Glu695, Glu709, Lys728 and Gln739 are partly disordered so that some or all of their atoms are not visible in the electron density' 41. Glu645, His646 and Glu649 are part of this helix, and thus contribute to the disorder of the protein, which is critical for its moonlighting roles. Note that Glu has been observed to be the second most disorder promoting residue (after proline) 42. Furthermore, Tyr652 and Leu656, which lie in this helix, are residues that have been hypothesized to be part of the protein-protein interaction site involving this protein 41.

Conclusions
The ability of a genome as small as that of the Ebola virus to inflict a dishearteningly high percentage of mortality in human subjects is humbling in the context of the tremendous technological advancements achieved in the last few decades 3,4. The Ebola virus potently suppresses the human immune response 2,6,43 by binding to key human proteins involved in the immune pathway 18. These protein-protein interactions are often mediated through well structured secondary regions within the protein structures (alpha helices), and the design of molecules that inhibit these 'hotspots' 20,44 has been a well known strategy to develop drugs to counter bacterial and viral infections 10-12. For example, synthetic peptides derived from the oligomerization domain of polymerase subunits have been shown to inhibit viral proteins 45,46. In addition, there might exist other protein domains that might be exploited by non-native viral peptides to obstruct viral functionality. In the current work, we characterize alpha helices in the Ebola virus proteome using a recently implemented open access software (PAGAL) 14, thus identifying potential targets for inhibition of the helix mediated interactions.
Through several examples, we demonstrate that helices with unique features are involved in interactions with host proteins (either antibodies from survivors, or proteins regulating the immune response). Further, we also provide an alternate way of analyzing differences in related proteins (from the Marburg virus) by focusing on the properties of corresponding helices. As future work, we intend to develop methodologies to design peptides that would target these 'hotspots' 44. It has to be kept in mind that it has been a challenge to design small ligands that disrupt protein-protein interactions, and designers resort to several innovative techniques to overcome thermodynamic instability or proteolytic susceptibility 47-50. These helices can essentially be epitopes 51,52 for developing antibodies against the virus 53,54. Interestingly, ZMapp, a cocktail of three antibodies, has shown reversion of advanced Ebola symptoms in non-human primates 55, and uses only antibodies generated against glycoprotein-specific epitopes 52,56. It is interesting to hypothesize that additions to this cocktail with antibodies derived from other epitopes (for example, 4U2XA.HELIX5 from VP24 that is involved in immune response suppression) could prove more effective. Thus, we provide a comprehensive list of potential targets within the small proteome of the Ebola virus that can direct rational design to quickly innovate therapies.

Author contributions
SC wrote the computer programs. All authors analyzed the data, and contributed equally to the writing and subsequent refinement of the manuscript.

Competing interests
No competing interests were disclosed.

The authors have responded to my previous questions and improved the manuscript. Although the described computational approach can identify alpha helices with unique features, the authors' proposal that the helical propensities can be linked to host protein interactions is rather weak and requires experimental data to validate the method.

Grant information
There is still a conceptual error in the manuscript: Page 4: "The antibody most likely inhibits the rearrangement of GP2 segments, which abrogates the fusion of the internal loop in the host membrane." Do the authors mean that the antibody binding prevents Gp2 conformational changes required for membrane fusion?

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
No competing interests were disclosed.

We appreciate your positive comments on the changes we had made in accordance with your previous suggestions. By the statement 'The antibody most likely inhibits the rearrangement of GP2 segments, which abrogates the fusion of the internal loop in the host membrane.', we meant the same thing as 'antibody binding prevents Gp2 conformational changes required for membrane fusion', which is the accepted hypothesis. We could not detect any difference in the meaning of the two wordings.
Please also find a new version of the manuscript which includes further corroboration of our hypothesis.
Best regards, Sandeep

The authors suggest that they can identify alpha helices and predict their propensities to be targeted by small molecules. Their test case is the small Ebola virus genome, where several crystal structures are available.
First they compute the hydrophobic moment of identified helices with their previously published program PAGAL and classify them based on hydrophobicity, positive or negative charges. They conclude that helices with unique feature values are involved in host protein interaction.
Page 4: It is not correct to state that " this helix is disrupted by a neutralizing antibody derived from a human survivor …". HR1 or helix 1 from Gp2 is split into 4 small helices in the native GP structure and antibody binding prevents its refolding into the post fusion conformation represented by the Gp2 structure. Now one can argue that small molecules could interfere with the formation of the triple stranded coiled coil formed by HR1 in the post fusion structure. This needs to be clarified in the text.
Next they identified a charged helix in Vps24 that interacts with karyopherin. Why was this chosen? Because of the available structure? This helix contains only two charged residues and would not fall under the classification of carrying a high charge! The third set of helices described in detail is from Vps35, and the authors identify several helices which carry charges, but no clear targets are discussed.
Page 6: The authors make a connection between the number of acidic residues in a helix from Ebola Vps35 compared to Marburg Vps35 and the frequency of outbreaks, which is a complete over interpretation of their data.
In summary the manuscript describes an interesting approach to identify or validate potential drug targets. However, the authors need to be more cautious in interpreting their results. Without any experimental validation their approach to link helical properties to protein interaction propensities is extremely weak.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
No competing interests were disclosed.

We would like to thank you for taking the time to review this paper, and for your suggestions to improve the manuscript. In the interim period, we have applied other computational methods to correlate the different immunosuppressive and pathogenicity mechanisms in Ebola and Marburg viruses to variations in their structures/sequences. Please find our detailed responses to your comments below.
The authors suggest that they can identify alpha helices and predict their propensities to be targeted by small molecules. Their test case is the small Ebola virus genome, where several crystal structures are available. First they compute the hydrophobic moment of identified helices with their previously published program PAGAL and classify them based on hydrophobicity, positive or negative charges. They conclude that helices with unique feature values are involved in host protein interaction.
Page 4: It is not correct to state that "this helix is disrupted by a neutralizing antibody derived from a human survivor". HR1 or helix 1 from Gp2 is split into 4 small helices in the native GP structure and antibody binding prevents its refolding into the post fusion conformation represented by the Gp2 structure. Now one can argue that small molecules could interfere with the formation of the triple stranded coiled coil formed by HR1 in the post fusion structure. This needs to be clarified in the text.
We appreciate this point ('KZ52 likely neutralizes by preventing rearrangement of the GP2 HR1A/HR1B segments and blocking host membrane insertion of the internal fusion loop'), and have made the correction.
Next they identified a charged helix in Vps24 that interacts with karyopherin. Why was this chosen? Because of the available structure? This helix contains only two charged residues and would not fall under the classification of carrying a high charge!
VP24 came up in the sorted list since it has a 'high proportion of negatively charged residues', and not high charge. The proportion of charged residues is computed based on the total number of charged residues, and not the length of the helix. We could also create a category of high charge by combining the previous feature (high proportion) with a high number of charged residues.
Our search criteria exclude AHs with zero or one charged residue. We had stated this in the Methods section: 'We ignore the helices that have none or a single charged residue'. We also had a cutoff of 10 residues on the length of the AH, i.e. we are looking for reasonably long AHs, but we had not mentioned this constraint. We have modified the Methods section to reflect this. An AH having just two similarly charged residues (and no others) in a reasonably long AH is relatively significant. For example, one charged residue in VP24 (D124) makes an electrostatic contact with human karyopherin, while the other one (E113) makes a contact to Arg140 in another helix (α6) in VP24.
The third helices described in detail are from Vps35 and the authors identify several helices with carry charges, but no clear targets are discussed.
We have stated that 'we have not been able to identify a critical role for this helix in the protein from current literature', which does not preclude the importance of these helices. This, in fact, highlights the ability of our method to extract helices that might be of significance, yet not probed sufficiently as targets. At the same time, it is also equally possible that this helix is not functionally significant.
Page 6: The authors make a connection between the number of acidic residues in a helix from Ebola Vps35 compared to Marburg Vps35 and the frequency of outbreaks, which is a complete over interpretation of their data.
We agree with this criticism, and have made the corrections.
In summary the manuscript describes an interesting approach to identify or validate