PAGAL - Properties and corresponding graphics of alpha helical structures in proteins [version 1; peer review: 2 approved with reservations]

Alpha helices (AH) are peptide fragments characterized by regular patterns of hydrogen bonding between the carbonyl oxygen and amino nitrogen of residues regularly spaced in sequence, resulting in spiral conformations. Their preponderance in protein structures underlines their importance. Interestingly, they are almost invariably present in anti-microbial peptides. For example, the cecropin component of the chimeric anti-microbial protein designed previously by our group comprises two AHs linked by a short stretch of random coil. These anti-microbial peptides are often amphipathic (quantified by a hydrophobic moment), aligning hydrophobic residues on one surface and charged residues on the other. In the current work, we reproduce previously described computational methods to compute the hydrophobic moment of AHs and provide open access to the source code (PAGAL). We also generate input files for TikZ (a package for creating high-resolution graphics programmatically) to draw the Edmundson wheel and show the direction and magnitude of the hydrophobic moment, and Pymol scripts to generate color-coded protein surfaces. Additionally, we have observed an empirical structural property of AHs: the distance between the Cα atoms of the ith and (i+4)th residue is equal to the distance between the carbonyl oxygens of the ith and (i+4)th residue.

The authors aimed to further define features of helices, which are key structural elements in polypeptides. Based on the traditional hydrophobic moment, they proposed the concept of a charge moment. It is interesting, but I am a little doubtful about how useful the charge moment will be. In particular, the authors found that “Swapping one positive residue (R5) from the hydrophilic side with I7 and replacing it with a negative residue (D7), results in the same ‘charge moment’” (Figure 5). In the case of helical antimicrobial peptides, such a swap may have a detrimental effect on peptide antimicrobial activity. For example, even replacing a hydrophilic serine on the hydrophobic face with a hydrophobic residue influenced peptide activity (Wang G et al., 2012). The authors may refine this idea and consider possible uses of the charge moment. Based on the charge moment, is there any clue that interfacial charged residues of antimicrobial peptides play a larger role than non-interfacial ones in determining antimicrobial activity? Will it be possible to incorporate the observation that arginines are usually more important than lysines in determining peptide activity (see Mishra B et al., 2013)?

Competing Interests: No competing interests disclosed.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.


Introduction
A protein structure is formed by well-ordered local segments defined by the hydrogen-bonding pattern of the peptide backbone (secondary structures), and by conformations that lack any regular arrangement (random coils). The most prevalent secondary structures are alpha helices (AH) and β sheets, while other conformations such as the π-helix occur rarely in natural proteins 1. AHs are right-handed spiral conformations in which the carbonyl oxygen (C=O) of every residue i forms a hydrogen bond with the alpha-amino nitrogen (N-H) of residue i+4, that is, the fourth residue further away from the N-terminal.
DSSP is the standard program used to assign secondary structure to a protein when the atomic coordinates are known 2,3. Several methods can also predict an AH from the sequence 4,5. In principle, any structure prediction tool can be used to predict an AH from the sequence by first predicting the structure and then applying DSSP to the predicted structure 6-8.
AHs play widespread roles in protein structures. They are the functionally significant element in several motifs (for example, DNA-binding motifs) 9, and key components of any protein that permeates biological membranes 10. AHs are also almost invariably present in antimicrobial peptides (AMP) 11. For example, cecropin B, a component of a chimeric protein with anti-microbial properties that provides grapevines with enhanced resistance against the Gram-negative pathogen Xylella fastidiosa 12, is composed of two AHs connected by a small random coil 13. Other AMPs comprise only a single AH 14,15. These peptides are characterized by a strong hydrophobic surface (quantified by a hydrophobic moment 16), and often have charged residues, either anionic or cationic, aligned on the opposite surface 16. Jones et al. previously implemented computational methods to extract the characteristics of AHs 17.
In the current work, we first observe and propose an empirical structural property of AHs: the distance between the Cα atoms of the ith and (i+4)th residue is equal to the distance between the carbonyl oxygens of the ith and (i+4)th residue. This hypothesis is validated on a set of 100 high-resolution, non-homologous proteins (775 AHs) taken from the PISCES database 18. Next, we implement the methodologies described previously 17 to compute the hydrophobic moments of AHs, using the hydrophobicity scale used in 19, in a program named PAGAL (Properties and corresponding graphics of alpha helical structures in proteins). Other programs available online perform similar processing (for example, http://rzlab.ucr.edu/scripts/wheel/). We also specify a metric associated with each helix, the ratio of the positive to the negative residues (RPNR) in the AH, which helps identify AHs with a particular kind of charge distribution on their surface. The results are written as input for the graphics package TikZ (for the Edmundson wheel 20 and hydrophobic moment) and as Pymol scripts (for showing the peptide surface). The source code and manual are available at http://github.com/sanchak/pagal and at http://dx.doi.org/10.5281/zenodo.11136.

Materials and methods
We first outline the method to obtain the coordinates of each residue in the Edmundson wheel, and the computation of the hydrophobic moment (Algorithm 1). The input to the function is an alpha helix, either as a PDB structure or as a FASTA sequence. The center of the wheel is taken as (0,0) and the radius as 5. The first residue has coordinates (0,5). Each subsequent residue is advanced by 100 degrees on the circle, since 3.6 residues make one full turn of the helix (360/3.6 = 100 degrees per residue).
To compute the hydrophobic moment, we assign each residue a vector that points from the center of the wheel to the coordinate of the residue and has a magnitude taken from the hydrophobic scale (in our case, this scale is obtained from 17). These vectors are then added to obtain the final hydrophobic moment.
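The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PAGAL implementation (which is written in Perl): the function names are hypothetical, residues are assumed to advance clockwise around the wheel, and the Kyte-Doolittle hydropathy values are swapped in as a stand-in because the scale used in the paper (Table 2) is not reproduced here.

```python
import math

# Kyte-Doolittle hydropathy values, used only as a stand-in scale;
# the scale actually used by the authors (their Table 2) is not reproduced here.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def residue_angle(index, step_deg=100.0):
    """Angle (radians) of the 0-based residue `index`; residue 0 sits at the top.

    Assumption: successive residues advance clockwise by `step_deg`.
    """
    return math.radians(90.0 - index * step_deg)


def wheel_coordinates(sequence, radius=5.0, step_deg=100.0):
    """Return one (x, y) point per residue on a circle centered at (0, 0)."""
    coords = []
    for i in range(len(sequence)):
        angle = residue_angle(i, step_deg)
        coords.append((radius * math.cos(angle), radius * math.sin(angle)))
    return coords


def hydrophobic_moment(sequence, scale=KYTE_DOOLITTLE, step_deg=100.0):
    """Sum the per-residue vectors; return (mx, my, magnitude)."""
    mx = my = 0.0
    for i, residue in enumerate(sequence):
        angle = residue_angle(i, step_deg)
        value = scale[residue]  # signed hydrophobicity of this residue
        mx += value * math.cos(angle)
        my += value * math.sin(angle)
    return mx, my, math.hypot(mx, my)


# Usage: the SP1-1 peptide sequence quoted later in the text.
sequence = "RKKRLKLLKRL"
print(wheel_coordinates(sequence)[0])  # first residue, at the top of the wheel
print(hydrophobic_moment(sequence))
```

Passing the scale as a parameter keeps the sketch independent of any particular hydrophobicity table, so the values from Table 2 could be supplied as a dictionary instead.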
The results are written as input for the graphics package TikZ (for the Edmundson wheel 20 and hydrophobic moment) and as Pymol scripts (for showing the peptide surface). The protein structures have been rendered using Pymol, while the figures showing the Edmundson wheel have been obtained from TikZ. The source code is written in Perl and is available at https://github.com/sanchak/pagal, with a permanent copy at http://dx.doi.org/10.5281/zenodo.11136.
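To give a rough idea of what a TikZ input file for the wheel could look like, the Python sketch below emits a `tikzpicture` with the circle, one labeled node per residue, and an arrow for the moment vector. The function name and the layout are hypothetical and are not taken from the PAGAL output (the actual file is the Supplementary File TikzInput.doc); only standard TikZ commands (`\draw`, `\node`, `circle`) are used, and residues are again assumed to advance clockwise.

```python
import math


def tikz_wheel(sequence, moment=(0.0, 0.0), radius=5.0, step_deg=100.0):
    """Return TikZ source for a helical wheel plus a moment arrow.

    `moment` is an (x, y) vector supplied by the caller, for example the
    output of a hydrophobic moment calculation.
    """
    lines = [r"\begin{tikzpicture}"]
    lines.append(rf"  \draw (0,0) circle ({radius:.2f});")
    for i, residue in enumerate(sequence):
        # Residue 0 at the top; clockwise advance is an assumption.
        angle = math.radians(90.0 - i * step_deg)
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        # Label each node with the residue letter and its 1-based position.
        lines.append(rf"  \node at ({x:.2f},{y:.2f}) {{{residue}{i + 1}}};")
    mx, my = moment
    lines.append(rf"  \draw[->, thick] (0,0) -- ({mx:.2f},{my:.2f});")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)


# Usage: print TikZ source for the SP1-1 peptide with a placeholder moment.
print(tikz_wheel("RKKRLKLLKRL", moment=(1.0, -2.0)))
```

The returned string can be pasted into a LaTeX document that loads the `tikz` package.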

Validation of empirical property
We have observed an empirical structural property that applies to the residues of any AH: the distance between the Cα atoms of the ith and (i+4)th residue (denoted by D(Cαi/Cαi+4)) is (almost) equal to the distance between the carbonyl oxygens of the ith and (i+4)th residue (D(Oi/Oi+4)). We validate our hypothesis on a set of 100 high-resolution, non-homologous proteins (which have 775 AHs) taken from the PISCES database (http://dunbrack.fccc.edu/PISCES.php) 18. Figure 1 shows the plot of the difference between D(Cαi/Cαi+4) and D(Oi/Oi+4) for AHs specified in the PDB files (in red, mean=0.16 Å, standard deviation (sd)=0.34 Å), and for all residues separated by four residues but not part of a helix (in blue, mean=0.71 Å, sd=0.75 Å).
These results are conservative, since some residues annotated as part of a helix in the PDB files appear to be annotated incorrectly. For example, in PDB 1JET, the ninth helix spans residues 169 to 178 ("HELIX 9 9 LYS A 169 LYS A 178 1 10"). However, the Pymol helix identification program shows part of this stretch as a random coil (Lys178 in Figure 2a). Moreover, the distance between the carbonyl oxygen (C=O) and the alpha-amino nitrogen (N-H) of the fourth residue away from the N-terminal is 7.6 Å, which makes a hydrogen bond between them improbable, although such a bond is the primary requisite for being part of an AH. D(Cαi/Cαi+4) and D(Oi/Oi+4) for this pair are 9 Å and 8 Å, respectively: a difference of 1 Å. Even in a case where the distance between C=O and N-H is within the 3.6 Å typically required for a hydrogen bond (PDBid: 1ELU, 12th helix), D(Cαi/Cαi+4) and D(Oi/Oi+4) for the residue pair His292-Gly296 are 6.9 Å and 3.4 Å, respectively: a difference of 3.4 Å (Figure 2b). In short, the helix annotation in the PDB database is often incorrect. Removing these problematic residues reduces the mean difference to 0.095 Å and the sd to 0.14 Å (Figure 1).

Figure 1 caption (partial): All residues separated by four residues but not part of a helix are in blue (mean=0.71 Å, sd=0.75 Å). All AHs specified in the PDB files after correction are in green (mean=0.095 Å, sd=0.14 Å).
D(Cαi/Cαi+4) varies even when the same pair of residue types is considered. For example, taking all pairs of Arg and Lys in the 775 AHs analyzed (Table 1), the values range from 5.8 Å in PDBid:1EYH (helix5, pair Arg72-Lys76) to 6.5 Å in PDBid:1H16 (helix26, pair Arg583-Lys587). However, as hypothesized, D(Oi/Oi+4) remains (almost) the same as D(Cαi/Cαi+4) in each case.
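The comparison described in this section can be sketched as follows. This is an illustrative Python outline, not the code used for the reported statistics: the function names are hypothetical, the helix boundaries are passed in by the caller rather than read from HELIX records, and the difference is assumed to be the absolute value of D(Cαi/Cαi+4) minus D(Oi/Oi+4). The column positions follow the fixed-width layout of PDB ATOM records.

```python
import math


def parse_backbone(pdb_lines, chain="A"):
    """Collect CA and O coordinates per residue number for one chain.

    Returns {residue_number: {"CA": (x, y, z), "O": (x, y, z)}}.
    """
    atoms = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()  # atom name, columns 13-16
        if name not in ("CA", "O") or line[21] != chain:
            continue
        residue_number = int(line[22:26])  # resSeq, columns 23-26
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.setdefault(residue_number, {})[name] = xyz
    return atoms


def ca_o_differences(atoms, start, end):
    """For each i with i and i+4 inside [start, end], return
    {i: (D_CA, D_O, |D_CA - D_O|)}. Residues missing CA or O are skipped."""
    result = {}
    for i in range(start, end - 3):
        first, second = atoms.get(i), atoms.get(i + 4)
        if first is None or second is None:
            continue
        if not all(k in first and k in second for k in ("CA", "O")):
            continue
        d_ca = math.dist(first["CA"], second["CA"])
        d_o = math.dist(first["O"], second["O"])
        result[i] = (d_ca, d_o, abs(d_ca - d_o))
    return result
```

A typical use would read the lines of a PDB file, call `parse_backbone`, and then call `ca_o_differences` once per annotated helix with its first and last residue numbers. `math.dist` requires Python 3.8 or later.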

Edmundson wheel and the hydrophobic moment
The Edmundson wheel 20 has long been the standard way of visualizing AHs, although other representations exist (for example, the Wenxiang diagram 21). The Edmundson wheel shows the arrangement of residues as one looks down the axis of the helix, and gives an approximate idea of the various properties of the AH. For example, color coding that differentiates the polar and non-polar residues gives an approximation of the hydrophobic propensity of the AH. A more quantitative representation of the hydrophobic propensity is to assign each residue a value and a sign (direction). Summing these contributions results in a vector representation, called the hydrophobic moment 16.
We have chosen the hydrophobic scale from 17 (Table 2), although any other hydrophobic scale could also be used. The color coding is as follows: all hydrophobic residues (positive values in Table 2) are colored red, while hydrophilic residues (negative values in Table 2) are colored blue: dark blue for positively charged residues, medium blue for negatively charged residues and light blue for amides. We now show the PAGAL representation of a few AH peptides.
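The color-coding rule just described can be expressed as a small lookup. The Python sketch below is illustrative only: the function name and color names are hypothetical, the residue groupings (Lys/Arg as positive, Asp/Glu as negative, Asn/Gln as amides) are assumptions, and the fallback color for other hydrophilic residues is not specified in the text.

```python
POSITIVE = {"K", "R"}   # assumption: Lys and Arg counted as positively charged
NEGATIVE = {"D", "E"}   # assumption: Asp and Glu counted as negatively charged
AMIDES = {"N", "Q"}     # Asn and Gln


def residue_color(residue, hydrophobicity):
    """Map a residue and its hydrophobicity value to a display color."""
    if hydrophobicity > 0:
        return "red"          # hydrophobic: positive scale value
    if residue in POSITIVE:
        return "darkblue"
    if residue in NEGATIVE:
        return "mediumblue"
    if residue in AMIDES:
        return "lightblue"
    return "blue"             # assumption: fallback for other hydrophilic residues


# Usage with hand-picked example values (not taken from Table 2).
print(residue_color("L", 3.8), residue_color("K", -3.9), residue_color("Q", -3.5))
```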

Cecropin.
A synergistic combination of two critical immune functions, pathogen surface recognition and lysis, resulted in a chimeric protein with anti-microbial properties against the Gram-negative Xylella fastidiosa 12. The lytic domain is cecropin B, which attacks conserved lipid moieties and creates pores in the X. fastidiosa outer membrane 13. Cecropin B consists of two AHs, joined by a short stretch of random coil. Figure 3a and b show the Edmundson wheel and hydrophobic moment of the two AHs. It can be seen that the N-terminal AH has a large hydrophobic moment, as well as a specific positive charge distribution. The hydrophobicity of this amphipathic AH has significant bearing on the anti-microbial properties of the peptide 22. This can also be seen in a Pymol rendering of the peptide surface (Figure 4). The Pymol script for this rendering is automatically generated by PAGAL. On the other hand, the C-terminal AH consists mostly of hydrophobic residues. Cecropin-like peptides use the synergy of these two helices: the N-terminal helix attaches to charged ions on the membrane, and the hydrophobic C-terminal helix permeates the hydrophobic inter-membrane region (known as the 'carpet' model 23).

[Table residue. Columns: Residue Pair, Dhbond, D(Cαi/Cαi+4), D(Oi/Oi+4), δ. The only surviving row is: 1E58.helix12, Arg188-Lys192, 2.9, 6.0, 6. (truncated).]

Cathelicidin LL-37.
Cathelicidin LL-37 is a critical component of the innate human immune system that protects humans against infectious diseases by targeting anionic phosphatidylglycerols in the pathogenic bacterial membranes 24. Recent work has demonstrated that a 12-residue peptide (KR-12) corresponding to residues 18 to 29 of LL-37 is toxic to bacterial, but not human, cells 14. Figure 3c shows the Edmundson wheel and hydrophobic moment of KR-12. The demarcation of the polar and non-polar residues is quite evident. The predominance of positively charged residues on the polar side of the peptide is also clearly visible.

De novo designed AMPs for plant protection.
The de novo design of small AMPs that inhibit plant pathogens was the focus of a recent work 15. One of the most promising candidates was a small peptide (SP1-1: RKKRLKLLKRL, Figure 3d), which was "highly active against a broad spectrum of bacteria, but showed low hemolytic activity" 15. Although the hydrophobic moment of this peptide is much smaller than that of KR-12 (Figure 3c), possibly due to the presence of Arg4 on the hydrophobic surface, the proportion of positively charged residues in this peptide is greater than in KR-12.

Ratio of the positive to the negative residues (RPNR)
Often, it is desirable to choose peptides with a large proportion of charged residues of a certain kind (anionic or cationic) on the hydrophilic surface. One possible method for quantifying this would be to compute a 'charge moment', similar to the computation of hydrophobic moments. However, such an evaluation would treat certain clearly different distributions as the same. For example, assume one semicircle of the wheel comprised only positive residues, and the other only hydrophobic residues (Figure 5a). This is a slightly modified version of KR-12 from cathelicidin LL-37. If one positive residue (R5) on the hydrophilic side is swapped with I7 on the hydrophobic side and then replaced with a negative residue (D7) (Figure 5b), the 'charge moment' would remain the same, although the two conformations are clearly not the same. This happens because a negative charge on one face of the helix contributes a vector in the same direction as a positive charge on the opposite face. Note that the hydrophobic moment is also different, as expected. Thus, we resort to a simple metric to allow one to choose peptides with a large proportion of charged residues of a single kind: the ratio of the positive to the negative residues (RPNR). The two peptides mentioned above will have different RPNRs: 1 (Figure 5a) and 0.85 (Figure 5b).

Output formats
PAGAL generates a TikZ input file for drawing the Edmundson wheel and showing the hydrophobic moment (Supplementary File TikzInput.doc). TikZ is a package "for creating graphics programmatically" (http://www.texample.net/tikz/). PAGAL also generates a Pymol script to render the peptide structure using the same color coding as for the Edmundson wheel (Supplementary File PymolInput.doc).
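The contrast between a 'charge moment' and the RPNR can be illustrated with a short Python sketch. Several points here are assumptions rather than statements from the paper: the function names are hypothetical; Lys/Arg are counted as positive and Asp/Glu as negative; residues advance clockwise by 100 degrees; and the RPNR is computed as positive/(positive + negative), a reading suggested by the reported value of 1 for a peptide with no negative residues, where a literal positive-to-negative ratio would be undefined. The two toy sequences are invented for the illustration and are not the peptides of Figure 5.

```python
import math

POSITIVE = {"K", "R"}  # assumption: His is not counted as positive
NEGATIVE = {"D", "E"}


def charge_moment(sequence, step_deg=100.0):
    """Vector sum of +1/-1 charges placed on the helical wheel."""
    mx = my = 0.0
    for i, residue in enumerate(sequence):
        charge = 1 if residue in POSITIVE else -1 if residue in NEGATIVE else 0
        angle = math.radians(90.0 - i * step_deg)  # clockwise advance assumed
        mx += charge * math.cos(angle)
        my += charge * math.sin(angle)
    return mx, my


def rpnr(sequence):
    """Fraction of charged residues that are positive (assumed definition).

    Returns None when the sequence has no charged residues.
    """
    positive = sum(residue in POSITIVE for residue in sequence)
    negative = sum(residue in NEGATIVE for residue in sequence)
    if positive + negative == 0:
        return None
    return positive / (positive + negative)


# Toy peptides: positions 0 and 9 lie on opposite sides of the wheel
# (9 * 100 = 900 degrees, which is 180 degrees modulo 360).
peptide_a = "KAAKAAAAAA"  # positive charges at positions 0 and 3
peptide_b = "AAAKAAAAAD"  # positive charge at 3, negative charge opposite position 0
print(charge_moment(peptide_a), rpnr(peptide_a))
print(charge_moment(peptide_b), rpnr(peptide_b))
```

In this toy case the two charge moments coincide while the RPNR values differ, which is the behavior the text describes for the peptides of Figure 5.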