Characterizing alpha helical properties of Ebola viral proteins as potential targets for inhibition of alpha-helix mediated protein-protein interactions

Ebola, considered until recently a rare and endemic disease, has dramatically transformed into a potentially global humanitarian crisis. The genome of Ebola, a member of the Filoviridae family, encodes seven proteins. Using recently implemented software (PAGAL) for analyzing the hydrophobicity and amphipathicity properties of alpha helices (AH) in proteins, we characterize the helices in the Ebola proteome. We demonstrate that AHs with characteristically unique features are involved in critical interactions with host proteins. For example, the Ebola virus membrane fusion subunit, GP2, from the envelope glycoprotein ectodomain has an AH with a large hydrophobic moment. The neutralizing antibody KZ52, derived from a human survivor of the 1995 Kikwit outbreak, recognizes a protein epitope on this AH, emphasizing the critical nature of this secondary structure in the virulence of the Ebola virus. Our method provides a comprehensive list of such 'hotspots'. These helices probably are, or can be, the target of molecules designed to inhibit AH-mediated protein-protein interactions. Further, by comparing the AHs in proteins of the related Marburg viruses, we are able to identify subtle changes in the proteins that might render previously successful drugs ineffective. Such differences are difficult to identify by a simple sequence or structural alignment. Thus, analyzing AHs in the small Ebola proteome can aid rational design aimed at countering the 'largest Ebola epidemic, affecting multiple countries in West Africa' (http://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/index.html).


Introduction
The Ebola virus was first discovered in 1976 1 , and has since been known as the cause of a rare but deadly disease 2 . However, the current outbreak in West African countries (Guinea, Liberia, Nigeria, Sierra Leone and Senegal) has rapidly deteriorated into a full-blown epidemic 3 , and poses grave humanitarian dangers to these countries 4 . Ebola, along with the Marburg virus, belongs to the Filoviridae family 5 , and causes haemorrhagic fever 2 by quickly suppressing innate antiviral immune responses to facilitate uncontrolled viral replication 6 .
Interestingly, although the genome of the Ebola virus encodes only seven proteins 7 , their extreme 'plasticity allows multiple functions' 8,9 . Protein structures are formed by well-ordered local segments, of which the most prevalent are alpha helices (AH) and β sheets. AHs are right-handed helical conformations in which the backbone carbonyl oxygen (C=O) of each residue i forms a hydrogen bond with the backbone amide (N-H) of residue i+4, i.e. the residue four positions further along the chain towards the C terminus. AH domains are often the target of peptides designed to inhibit viral infections [10][11][12] . Recently, we released open-access software (PAGAL 14 ) that reproduces previously described computational methods 13 for computing the hydrophobic moment of AHs.
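To make the i → i+4 pattern concrete, the minimal Python sketch below enumerates the backbone hydrogen-bond pairs expected in an idealized alpha helix spanning a given residue range. It ignores helix capping and end effects, and the function name is hypothetical, not part of PAGAL or DSSP; DSSP itself detects helices from hydrogen-bond energies computed on the actual atomic coordinates rather than from such a list.

```python
def alpha_helix_hbond_pairs(first_residue, last_residue):
    """Return (acceptor, donor) residue-number pairs for an ideal alpha helix.

    The backbone C=O of residue i accepts a hydrogen bond from the backbone
    N-H of residue i + 4. Pairs are listed only when both residues lie inside
    the helix; capping interactions are ignored.
    """
    return [(i, i + 4) for i in range(first_residue, last_residue - 3)]


if __name__ == "__main__":
    # A hypothetical 8-residue helix spanning residues 10-17
    for acceptor, donor in alpha_helix_hbond_pairs(10, 17):
        print(f"C=O of residue {acceptor} ... H-N of residue {donor}")
```

A helix shorter than five residues yields no complete i → i+4 pair, which is one reason very short DSSP helices carry little information in an analysis like this one.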
In the current work, we characterize the helices in the Ebola proteome using PAGAL, and demonstrate that the helices with characteristically unique feature values are involved in critical interactions with host proteins. The PDB database is queried for the keyword 'Ebola', and the structures obtained are analyzed using DSSP (Define Secondary Structure of Proteins) 15 to identify AHs. We process all PDB structures, and do not filter out redundant structures based on sequence. These helices are analyzed using PAGAL, and the results are sorted based on three criteria: hydrophobic moment, high proportion of positive residues, and high proportion of negative residues. The helices that rank highest by these criteria are involved in critical interactions with either antibodies or host proteins. For example, the Ebola virus membrane fusion subunit, GP2, from the envelope glycoprotein ectodomain has the AH with the largest hydrophobic moment among all helices analyzed 16 . This helix is disrupted by a neutralizing antibody derived from a human survivor of the 1995 Kikwit outbreak, emphasizing the critical nature of this helix in the virulence of Ebola 17 . Another example, obtained by choosing the helix with the highest proportion of negatively charged residues, is the interaction between the C terminus of the human karyopherin alpha nuclear transporter and the Ebola virus VP24 protein (eVP24) 18 , through which eVP24 suppresses tyrosine-phosphorylated STAT1 nuclear import 19 . These helices probably are, or can be, the target of molecules designed to inhibit AH-mediated protein-protein interactions 20 . Our method provides a comprehensive list of such targets. Further, each protein can be queried individually using PAGAL, thus identifying helices that might have a poor global rank but still be critical in the context of that particular protein.
Although Ebola and Marburg viruses are both members of the Filoviridae family 21 , they differ in the antigenicity of the virion glycoprotein 22 . These differences are probably the reason for the lower mortality observed in Marburg outbreaks. By comparing the AHs in proteins of Marburg and Ebola viruses, we are able to identify subtle changes in the proteins that might render previously successful drugs ineffective. These differences are not apparent from a simple sequence or structural alignment. Thus, in the current work, we elucidate a simple methodology that can aid the rational design of drugs and vaccines, an important aspect of the global effort to counter the deadly Ebola epidemic.

Materials and methods
We searched for the keyword 'Ebola' in the PDB database (Table 1). Subsequently, each entry was split by chain ID, resulting in 146 single-chain proteins (see ALPHA.zip in Dataset 1). We have not reduced the set based on sequence similarity, since the proteins might adopt different conformations depending on their ligands. Note that this list might include non-Ebola proteins that were co-crystallized with Ebola proteins. These were nevertheless put through the same analysis, since they might provide insights into the Ebola proteins themselves.
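The chain-splitting step can be sketched in Python as below, assuming each entry is available as a legacy fixed-column PDB file whose contents have already been read into a string (download and file handling are omitted). The function name is hypothetical, and this is not the authors' code. It keeps only ATOM/HETATM records, groups them by the chain identifier in column 22, and names each piece by PDB ID plus chain ID, the same convention as identifiers such as 1EBOE and 4U2XA in the text. Multi-model (NMR) entries and mmCIF files are not handled.

```python
from collections import defaultdict


def split_pdb_by_chain(pdb_id, pdb_text):
    """Split the coordinate records of one PDB entry into single-chain texts.

    Returns a dict keyed by PDB ID + chain ID (e.g. "1EBO" + "E" -> "1EBOE").
    In the fixed-column PDB format the chain identifier is column 22,
    i.e. index 21 of a 0-based Python string.
    """
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        # Keep only coordinate records; HEADER, TER, CONECT, etc. are dropped.
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 21:
            chains[line[21]].append(line)
    return {f"{pdb_id}{chain}": "\n".join(lines) + "\nEND\n"
            for chain, lines in chains.items()}
```

Each resulting single-chain text could then be written to its own file and passed to DSSP.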
These proteins were then analyzed using DSSP 15 , which identified 758 helices in all (see ALPHA.zip in Dataset 1). These helices were then analyzed using PAGAL, whose algorithm has been detailed previously 14 . Briefly, the Edmundson wheel is computed by considering a wheel with centre (0,0) and radius 5, placing the first residue at coordinate (0,5) and advancing each subsequent residue by 100 degrees around the circle, since 3.6 residues make one full turn of the helix (360/3.6 = 100 degrees). Each residue then contributes a vector that points from the centre to the residue's coordinate, with a magnitude given by the residue's value on the hydrophobicity scale (in our case, the scale from 13 ). These vectors are added, and the resulting vector is the hydrophobic moment.
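The wheel construction and vector sum described above can be sketched in Python as follows. This is an illustration under stated assumptions, not PAGAL's source: residues are assumed to advance clockwise from the +y axis (the direction does not affect the magnitude of the moment), and the hydrophobicity scale is passed in as a dictionary because the actual values from reference 13 are not reproduced here.

```python
import math


def edmundson_wheel(n_residues, radius=5.0, step_deg=100.0):
    """(x, y) coordinates of residues on an Edmundson wheel centred at (0, 0).

    The first residue sits at (0, radius); each following residue is advanced
    by step_deg degrees (360 / 3.6 residues per turn = 100 degrees).
    """
    coords = []
    for i in range(n_residues):
        theta = math.radians(i * step_deg)
        # Angle measured clockwise from the +y axis (an assumption; it does
        # not change the magnitude of the hydrophobic moment).
        coords.append((radius * math.sin(theta), radius * math.cos(theta)))
    return coords


def hydrophobic_moment(sequence, scale, radius=5.0, step_deg=100.0):
    """Magnitude of the vector sum of per-residue hydrophobicity vectors.

    Each residue contributes a vector pointing from the wheel centre to its
    position, with length equal to its value on `scale`, a dict mapping
    one-letter codes to hydrophobicity values (a KeyError is raised for
    residues missing from the scale).
    """
    mx, my = 0.0, 0.0
    for aa, (x, y) in zip(sequence, edmundson_wheel(len(sequence), radius, step_deg)):
        h = scale[aa]
        mx += h * x / radius  # unit direction (x / radius, y / radius) ...
        my += h * y / radius  # ... scaled by the residue's hydrophobicity
    return math.hypot(mx, my)
```

For a quick check with a toy scale (hypothetical: 1.0 for Leu, 0.0 for everything else), a poly-Leu stretch of 18 residues gives a moment of essentially zero, because 18 steps of 100 degrees spread the residues evenly around the wheel, whereas a helix whose leucines cluster on one face gives a large moment.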
The color coding is as follows: all hydrophobic residues are colored red, while hydrophilic residues are colored in blue: dark blue for positively charged residues, medium blue for negatively charged residues and light blue for amides.
The raw file generated by analyzing all 146 proteins through PAGAL is provided as PAGALRAWDATA.txt (Dataset 1), and contains the hydrophobic moment, the percentage of positive charges and the total number of charged residues for every helix. The helices are then sorted by charge (negative or positive) or by hydrophobic moment. We ignore the helices that have none or a single charged residue.
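The filtering and sorting step can be sketched in Python as below. Several choices here are assumptions rather than documented PAGAL behaviour: Lys and Arg are counted as positive and Asp and Glu as negative (His is not counted); the percentages are taken over the charged residues only, as the authors state in their response to the reviewer; helices shorter than 10 residues are dropped, the length cutoff the authors also mention there; and ties are broken by helix length, as described for the negatively charged helices in the Results.

```python
from dataclasses import dataclass

# Assumption: Lys/Arg count as positive and Asp/Glu as negative; His is not
# counted as charged. PAGAL's exact residue classes may differ.
POSITIVE = frozenset("KR")
NEGATIVE = frozenset("DE")


@dataclass(frozen=True)
class Helix:
    name: str      # e.g. "4U2XA.HELIX5"
    sequence: str  # one-letter amino-acid codes


def charge_stats(helix):
    """Charged-residue counts, with percentages taken over charged residues only."""
    n_pos = sum(aa in POSITIVE for aa in helix.sequence)
    n_neg = sum(aa in NEGATIVE for aa in helix.sequence)
    n_charged = n_pos + n_neg
    return {
        "n_charged": n_charged,
        "pct_pos": 100.0 * n_pos / n_charged if n_charged else 0.0,
        "pct_neg": 100.0 * n_neg / n_charged if n_charged else 0.0,
    }


def rank_helices(helices, by="pct_neg", min_charged=2, min_length=10):
    """Drop helices with fewer than `min_charged` charged residues or fewer than
    `min_length` residues, then sort by the chosen percentage (descending),
    breaking ties by helix length (longer first)."""
    kept = [h for h in helices
            if charge_stats(h)["n_charged"] >= min_charged
            and len(h.sequence) >= min_length]
    return sorted(kept, key=lambda h: (charge_stats(h)[by], len(h.sequence)),
                  reverse=True)
```

Under these definitions, a long helix whose only two charged residues are both acidic scores 100% negative. This is how a helix such as 4U2XA.HELIX5 (E113 and D124) can top the negative-charge ranking despite carrying few charges.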
All protein structures were rendered using PyMol (http://www.pymol.org/). The sequence alignment was done using ClustalW 23 , and the alignment images were generated using Seaview 24 . Protein structures were superimposed using MUSTANG 25 .

Helices with large hydrophobic moment
We began by analyzing the helices that have a large hydrophobic moment (hydrophobic scale obtained from 13 ) (Table 2). Figure 1a shows the Edmundson wheel for the helix 1EBOE.HELIX1 from the structure of GP2 from the Ebola virus membrane fusion glycoprotein (PDBid:1EBO) 16 . Figure 1b shows the residues comprising this helix (in magenta) in the apo form (PDBid:1EBO) 16 . This helix is disrupted by a neutralizing antibody derived from a human survivor of the 1995 Kikwit outbreak (PDBid:3CSY) 17 , emphasizing the critical nature of this helix in the virulence of Ebola (Figure 1c,d). Table 3 shows the residues in the specified helix (residues 553-597, chain J, PDBid:3CSY) that make possible hydrogen bonds with different residues in the heavy chain of the human Fab KZ52 (residues 1-228, chain A, PDBid:3CSY). Among all these interactions, only Gly553 lies on 1EBOE.HELIX1 (at a distance of 2.7 Å from Thr100/OG1), although the other residues are sequentially proximal. These few interactions are sufficient to disrupt this helix, rendering the virus non-virulent and leading to human recovery. The importance of interfacial hydrophobicity in viral proteins involved in host entry through membrane fusion has recently been discussed in detail, and remains 'an underutilized therapeutic target' 26 . It is also interesting that the helix is involved in a disulphide bond after its disruption (Cys556 and Cys511). 1EBOE.HELIX0 (Table 2) also has a high hydrophobic moment, but is actually an isoleucine zipper derived from GCN4 27 (Figure 1b).

Figure 1. The color coding is as follows: all hydrophobic residues are colored red, while hydrophilic residues are colored blue: dark blue for positively charged residues, medium blue for negatively charged residues and light blue for amides. (b) Structure of PDBid:1EBOE; 1EBOE.HELIX1 is marked in magenta and the leucine zipper is in blue. (c) 1EBOE.HELIX1 is disrupted by an antibody derived from a human survivor of the 1995 Kikwit outbreak (PDBid:3CSY). (d) Gly553/N on 1EBOE.HELIX1 makes a possible hydrogen bond to Thr100/OG1 at a distance of 2.7 Å.

Table 2. Identifying helices with unique properties. Sorting is based either on the hydrophobic moment (HM) or on the percentage of negative (NEG) or positive (POS) residues. HM: hydrophobic moment, RPNR: ratio of positive to negative residues, Len: length of the helix, NCH: number of charged residues, GP: glycoprotein from Ebola, VP24: membrane-associated protein from Ebola, VP35: polymerase cofactor.
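The "possible hydrogen bond" in Table 3 rests on a donor-acceptor distance such as the 2.7 Å between Gly553/N and Thr100/OG1. A minimal distance-only check can be sketched in Python as below. The 3.5 Å cutoff is a common rule of thumb and an assumption here, not necessarily the criterion used to build Table 3; hydrogen-bond angles are not checked, and the coordinates in the example are hypothetical rather than taken from PDBid:3CSY.

```python
import math

# Assumed cutoff: a donor-acceptor heavy-atom distance of at most 3.5 Å is
# treated as a *possible* hydrogen bond. Angles are not checked.
HBOND_MAX_DIST = 3.5


def possible_hbond(donor_xyz, acceptor_xyz, cutoff=HBOND_MAX_DIST):
    """Return (distance, flag) for a donor/acceptor atom pair given as (x, y, z) in Å."""
    d = math.dist(donor_xyz, acceptor_xyz)  # Euclidean distance (Python 3.8+)
    return d, d <= cutoff


if __name__ == "__main__":
    # Hypothetical coordinates, NOT taken from PDBid:3CSY
    gly553_n = (10.0, 5.0, 3.0)
    thr100_og1 = (12.7, 5.0, 3.0)
    d, flag = possible_hbond(gly553_n, thr100_og1)
    print(f"Gly553/N - Thr100/OG1: {d:.2f} Å, possible H-bond: {flag}")
```

The same distance function can screen other atom-pair contacts by changing `cutoff`. For salt bridges between oppositely charged side chains, a cutoff of about 4 Å is commonly used, again as an assumption.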
Helices with high proportion of negatively charged residues
We then analyzed the helices having a high proportion of negatively charged residues, sorted by helix length when the percentage of negatively charged residues is the same (Table 2). Figure 2a shows the Edmundson wheel for the helix 4U2XA.HELIX5 (which has only two charged residues, the acidic E113 and D124), while Figure 2b,c shows this helix, marked in magenta, in the protein complex. Chain D of PDBid:4U2X (4U2XD) is the C terminus of the human karyopherin alpha nuclear transporter (KPNA), here in complex with the Ebola virus VP24 protein (eVP24) 18 . eVP24 interferes with the immune response by selectively targeting tyrosine-phosphorylated STAT1 nuclear import 19 , without hindering the transport of other cargo that may be required for viral replication. 4U2XA.HELIX5 mediates the formation of the complex with KPNA through a helix (4U2XD.HELIX9, in blue), and K481 from KPNA is in contact with D124 from eVP24 (the distance between K481/NZ and D124/OD2 is 3.98 Å). This interaction is probably electrostatic, since the two side chains carry opposite charges. VP24 has also been shown to bind directly to STAT1, further compromising the immune response 28 .
The next helix having a high proportion of negatively charged residues (3FKEA.HELIX2) is from VP35, a classic example of a moonlighting protein that can act as a component of the viral RNA polymerase complex, a viral assembly factor, or an inhibitor of host interferon production 29 . We have not been able to identify a critical role for this helix in the protein from current literature.

Helices with high proportion of positively charged residues
For helices having a high proportion of positively charged residues, we could not find any reference to the critical nature of the first helix (Table 2, 4U2XA.HELIX7). This helix is marked in yellow in Figure 2c. The second helix (3FKEA.HELIX1) is from VP35, which was discussed above 29 . This helix spans residues 238-252 and includes Lys248 and Lys251, a basic patch that is '100% identical among members of the Ebola viral isolates' 29 , and Ala238, Gln241, Leu242, Val245, Ile246 and Leu249, which interact with a β sheet to create a hydrophobic subdomain 29 . This helix is marked in magenta in Figure 3b, and its Edmundson wheel is shown in Figure 3c.
Once again, we demonstrate that unique property values of an AH are a strong indicator of its significance for viral function.

Identifying differences among different species
VP35 consists of several helices, and is reasonably conserved in the Marburg virus from the same Filoviridae family (42% identity, 58% similarity) (Figure 3a). When one is interested in understanding why these proteins respond differently to known drugs, or even to the immune system, it is often difficult to identify the regions that differ from a sequence or structural alignment alone (Figure 3b). Table 4 compares the characteristics of the helices in VP35 from Ebola and Marburg. The helix numbering is offset by one because of a small N-terminal helix in the Marburg protein, which might be due to differences in crystallization technique and is probably not critical; we have therefore labeled the corresponding helices with letters. Most of the helices have the same properties, barring helices E and F: the acidic residue is present in helix E in Marburg and in helix F in Ebola. These helices are marked in yellow in Figure 3b. Also, helix C, which has a high proportion of acidic residues in Ebola VP35, has fewer such residues in Marburg. Marburg outbreaks (http://www.who.int/mediacentre/factsheets/fs_marburg/en/) have been fewer than Ebola outbreaks (http://www.who.int/mediacentre/factsheets/fs103/en/). It is known that even for Ebola, the Zaire strain had a much higher mortality rate than the Sudan strain 30 . These differences are definitely encoded in the proteins expressed by these viruses, and the design of drugs and vaccines to counter them should take these differences into account.

Multifunctional/moonlighting
The multifunctional roles played by many of these Ebola proteins are probably due to stretches of intrinsically disordered regions within the structure: 'fuzzy objects with fuzzy structures and fuzzy functions' 31 . The conformational plasticity 9 and moonlighting abilities of these proteins are key determinants for immune evasion 32 .
The above examples have analyzed all helices from the Ebola proteome. However, it is also possible to analyze the helices in a single protein and probe them for unique features. Table 5 shows the values obtained from PAGAL for helices of the C-terminal domain of the Zaire Ebola virus nucleoprotein 33 . 4QAZA.HELIX0 (residues 646-658) has a reasonably high hydrophobic moment (although it would not rank highly among all helices from the proteome), as well as a high number of charged residues (Figure 4a,b).
It has been observed that 'the side chains of Glu645, His646, Glu649, Lys684, Glu695, Glu709, Lys728 and Gln739 are partly disordered so that some or all of their atoms are not visible in the electron density' 33 . Glu645, His646 and Glu649 lie in or immediately adjacent to this helix, and are thus likely critical to the disorder of the protein, which is critical for its moonlighting roles. Note that Glu has been observed to be the second most disorder-promoting residue (after proline) 34 . Furthermore, Tyr652 and Leu656, which lie in this helix, have been hypothesized to be part of the protein-protein interaction site involving this protein 33 .

Conclusions
The ability of a genome as small as that of the Ebola virus to inflict a dishearteningly high mortality rate in human subjects is a humbling experience in the context of the tremendous technological advancements achieved in the last few decades 3,4 . The Ebola virus potently suppresses the human immune response 2,6,35 by binding with key human proteins involved in the immune pathway 18 . These protein-protein interactions are often mediated through well-structured secondary structure elements (alpha helices), and the design of molecules that inhibit these 'hotspots' 20,36 has been a well-known strategy to develop drugs to counter bacterial and viral infections 10-12 . For example, synthetic peptides derived from the oligomerization domain of polymerase subunits have been shown to inhibit viral proteins 37,38 . On the other hand, there might exist other protein domains that could be exploited by non-native viral peptides to obstruct viral functionality. In the current work, we characterize alpha helices in the Ebola virus proteome using recently implemented open-access software (PAGAL) 14 , thus identifying potential targets for inhibition of helix-mediated interactions.
Through several examples, we demonstrate that helices with unique features are involved in interactions with host proteins (either antibodies from survivors, or proteins regulating the immune response). Further, we provide an alternative way of analyzing differences in related proteins (from the Marburg virus) by focusing on the properties of corresponding helices. As future work, we intend to develop methodologies to design peptides that would target these 'hotspots' 36 . It has to be kept in mind that designing small ligands that disrupt protein-protein interactions has been a challenge, and designers resort to several innovative techniques to overcome thermodynamic instability or proteolytic susceptibility 39-42 . These helices can also serve as epitopes 43,44 for developing antibodies against the virus 45,46 . Interestingly, ZMapp, a cocktail of three antibodies, has shown reversion of advanced Ebola symptoms in non-human primates 47 , and uses only antibodies raised against glycoprotein-specific epitopes 44,48 . It is interesting to hypothesize that adding antibodies derived from other epitopes to this cocktail (for example, 4U2XA.HELIX5 from VP24, which is involved in immune response suppression) could prove more effective. Thus, we provide a comprehensive list of potential targets from the small proteome of the Ebola virus that can direct rational design to quickly innovate therapies.

Data availability
F1000Research: Dataset 1. PAGAL analysis of Ebola-related alpha helices, 10.5256/f1000research.5573.d37453 49

Author contributions
SC wrote the computer programs. All authors analyzed the data, and contributed equally to the writing and subsequent refinement of the manuscript.

Competing interests
No competing interests were disclosed.

Winfried Weissenhorn
Unit for Virus Host-Cell Interactions (UVHCI), Grenoble, France
The authors suggest that they can identify alpha helices and predict their propensities to be targeted by small molecules. Their test case is the small Ebola virus genome, where several crystal structures are available.
First they compute the hydrophobic moment of identified helices with their previously published program PAGAL and classify them based on hydrophobicity, positive or negative charges. They conclude that helices with unique feature values are involved in host protein interaction.
Page 4: It is not correct to state that " this helix is disrupted by a neutralizing antibody derived from a human survivor …". HR1 or helix 1 from Gp2 is split into 4 small helices in the native GP structure and antibody binding prevents its refolding into the post fusion conformation represented by the Gp2 structure. Now one can argue that small molecules could interfere with the formation of the triple stranded coiled coil formed by HR1 in the post fusion structure. This needs to be clarified in the text.
Next they identified a charged helix in Vps24 that interacts with karyopherin. Why was this chosen? Because of the available structure? This helix contains only two charged residues and would not fall under the classification of carrying a high charge!
The third set of helices described in detail are from Vps35, and the authors identify several helices that carry charges, but no clear targets are discussed.
Page 6: The authors make a connection between the number of acidic residues in a helix from Ebola Vps35 compared to Marburg Vps35 and the frequency of outbreaks, which is a complete overinterpretation of their data.
In summary the manuscript describes an interesting approach to identify or validate potential drug targets. However, the authors need to be more cautious in interpreting their results. Without any experimental validation their approach to link helical properties to protein interaction propensities is extremely weak.

We would like to thank you for taking the time to review this paper, and for your suggestions to improve the manuscript. In the interim period, we have applied other computational methods 1 to correlate the different immunosuppressive and pathogenicity mechanisms in Ebola and Marburg viruses to variations in their structures/sequences 2 . Please find our detailed responses to your comments below.
The authors suggest that they can identify alpha helices and predict their propensities to be targeted by small molecules. Their test case is the small Ebola virus genome, where several crystal structures are available. First they compute the hydrophobic moment of identified helices with their previously published program PAGAL and classify them based on hydrophobicity, positive or negative charges. They conclude that helices with unique feature values are involved in host protein interaction.
Page 4: It is not correct to state that "this helix is disrupted by a neutralizing antibody derived from a human survivor …". HR1 or helix 1 from Gp2 is split into 4 small helices in the native GP structure and antibody binding prevents its refolding into the post fusion conformation represented by the Gp2 structure. Now one can argue that small molecules could interfere with the formation of the triple stranded coiled coil formed by HR1 in the post fusion structure. This needs to be clarified in the text.
We appreciate this point ('KZ52 likely neutralizes by preventing rearrangement of the GP2 HR1A/HR1B segments and blocking host membrane insertion of the internal fusion loop' 3 ) and have made the correction.
Next they identified a charged helix in Vps24 that interacts with karyopherin. Why was this chosen? Because of the available structure? This helix contains only two charged residues and would not fall under the classification of carrying a high charge!
VP24 came up in the sorted list because it has a 'high proportion of negatively charged residues', not a high charge. The proportion of charged residues is computed relative to the total number of charged residues, not the length of the helix. We could also create a 'high charge' category by combining the previous feature (high proportion) with a high number of charged residues.
Our search criteria exclude AHs with zero or one charged residue. We had stated this in the Methods section: 'We ignore the helices that have none or a single charged residue.' We also applied a minimum AH length of 10 residues (i.e. we were looking for reasonably long AHs), but had not mentioned this constraint. We have modified the Methods section to reflect this. A reasonably long AH containing just two charged residues, both of the same sign, is relatively significant. For example, one charged residue in VP24 (D124) makes an electrostatic contact with human karyopherin, while the other (E113) makes a contact with Arg140 in another helix (α6) in VP24 2 .
The third set of helices described in detail are from Vps35, and the authors identify several helices that carry charges, but no clear targets are discussed.
We have stated that 'we have not been able to identify a critical role for this helix in the protein from current literature', which does not preclude the importance of these helices. This, in fact, highlights the ability of our method to extract helices that might be of significance, yet not probed sufficiently as targets. At the same time, it is also equally possible that this helix is not functionally significant.
Page 6: The authors make a connection between the number of acidic residues in a helix from Ebola Vps35 compared to Marburg Vps35 and the frequency of outbreaks, which is a complete overinterpretation of their data.
We agree with this criticism, and have made the corrections.
In summary the manuscript describes an interesting approach to identify or validate potential drug targets.
We appreciate the positive and encouraging note on our efforts to use computational methods to identify critical regions of interaction in the Ebola proteins, which could be easily extended to other organisms as well.
However, the authors need to be more cautious in interpreting their results. Without any experimental validation their approach to link helical properties to protein interaction propensities is extremely weak.
We hope that we have addressed your concerns through the changes that we have made. We also expect future results to corroborate some of our predictions, and will post updates on the F1000 site, as its format allows. We sincerely hope that the manuscript will be found suitable for publication in its modified form.