The PDB database is a rich source of alpha-helical anti-microbial peptides to combat disease causing pathogens

Abstract
The therapeutic potential of α-helical anti-microbial peptides (AH-AMP) to combat pathogens is fast gaining prominence. Based on recently published open access software for characterizing α-helical peptides (PAGAL), we elucidate a search methodology (SCALPEL) that leverages the massive structural data pre-existing in the PDB database to obtain AH-AMPs belonging to the host proteome. We provide in vitro validation of SCALPEL on plant pathogens (Xylella fastidiosa, Xanthomonas arboricola and Liberibacter crescens) by identifying AH-AMPs that mirror the function and properties of cecropin B, a well-studied AH-AMP. The identified peptides include a linear AH-AMP present within the existing structure of phosphoenolpyruvate carboxylase (PPC20), and an AH-AMP from chitinase that mimics the properties of the two α-helices of cecropin B (CHITI25). The minimum inhibitory concentrations of these peptides are comparable to that of cecropin B, while anionic peptides used as controls failed to show any inhibitory effect on these pathogens. Substitute therapies in place of conventional chemotherapies using membrane permeabilizing peptides like these might also prove effective to target cancer cells. The use of native structures from the same organism could possibly ensure that administration of such peptides will be better tolerated and not elicit an adverse immune response. We suggest a similar approach to target Ebola epitopes, enumerated using PAGAL recently, by selecting suitable peptides from the human proteome, especially in the wake of recent reports of cationic amphiphiles inhibiting virus entry and infection.
This article is included in the Ebola channel.
We have modified the manuscript based on the reviewers' comments, especially to clarify two aspects:
a) That peptides extracted from native proteins will not elicit an immune response is a hypothesis, and needs to be verified.
b) The effectiveness of alpha helical peptides in combating cancer cells is not completely proven.
We have also added three authors in this version based on their inputs to this work; they had been inadvertently excluded from the first version.

Introduction
The abundance of alpha helical (AH) structures present within proteins bears testimony to their relevance in determining functionality 1 .
AHs are key components in protein-protein interaction interfaces 2 , DNA binding motifs 3 , proteins that permeate biological membranes 4 , and anti-microbial peptides (AMP) 5,6 . Not surprisingly, these AHs are the targets for antibody binding 7,8 and therapeutic agents 9 . These therapies in turn use AH peptides against both viral 10-12 and bacterial pathogens 13 .
Some AHs have unique characteristics, which are strongly correlated to their significance in the function of a protein 7 . For example, the alignment of hydrophobic residues on one surface (characterized by a hydrophobic moment 14 ) is critical for virus entry into host cells 15 , and for the permeabilizing abilities of AH-AMPs 16 . Often, AHs have cationic residues on the side opposite the hydrophobic surface, which helps them target bacterial membranes 17,18 . We have previously implemented known methods 19 of evaluating these properties, and provided the implementation as open source software (PAGAL) 20 . PAGAL was used to characterize the proteome of the Ebola virus 7 , and to correlate the binding of the Ebola protein VP24 21 to human karyopherin 22 with the immune suppression and pathogenicity mechanisms of Ebola and Marburg viruses 23 .
Plant pathogens like Xylella fastidiosa (Xf) 24 , Xanthomonas arboricola (Xa) 25 and Liberibacter crescens (Lc) 26 are a source of serious concern for economic 27 and humanitarian reasons 28 . Specifically, we have been involved in developing novel strategies to counter Xf, which causes Pierce's disease, having previously designed a chimeric protein with anti-microbial properties that provides grapevines with enhanced resistance against Xf 29 . Cecropin B (CECB) is the lytic component of this chimeric protein 30,31 . However, CECB is not native to the host plant, which raises concerns regarding its viability in practical applications 32 .
In an effort to replace CECB with an equivalent peptide from the grapevine/citrus genome, we present a design methodology to select AH-AMPs from any given genome: Search characteristic alpha helical peptides in the PDB database and locate it in the genome (SCALPEL). CECB consists of two AHs joined by a small loop. The N-terminal AH is cationic and hydrophobic, while the C-terminal AH consists primarily of hydrophobic residues. Characterizing all available AHs from plant proteins in the PDB database allowed us to identify a peptide with a large hydrophobic moment and a high proportion of positively charged residues, present in both grapevine and citrus (our organisms of interest), mirroring the linear cationic CECB N-terminal AH. One such match was a twenty-residue AH from phosphoenolpyruvate carboxylase in sunflower 33 . The sequence of this peptide was used to find homologous peptides in the grapevine and citrus genomes (PPC20). Next, we used the SCALPEL algorithm to detect two contiguous AHs connected by a loop, mirroring the properties of CECB, in a chitinase (CHITI25) from Nicotiana tabacum (PDBid:3ALG) 34 . We then demonstrate through bioassay experiments that PPC20 from the grapevine and citrus genomes, and CHITI25 from the tobacco genome, inhibit Xf, Xa and Lc growth. The minimum inhibitory concentrations of these peptides are comparable to that of CECB, while anionic peptides used as controls failed to show any inhibitory effect on these pathogens. Further, we observed variation in the susceptibility of the pathogens to these peptides.

In silico
The PDB database was queried for the keyword 'plants', and proteins with identical sequences were removed. This resulted in a set of ~2000 proteins (see list.plants.txt in Dataset 1). These proteins were analyzed using DSSP 35 to identify the AHs, and AHs with the same sequence were removed. This resulted in ~6000 AHs (see ALPHAHELICES.zip in Dataset 1). PAGAL was applied to this set of AHs (see RawDataHelix.txt in Dataset 1). This data was refined to obtain peptides with different characteristics. We also computed the set of all pairs of AHs that are connected by a short (less than five residues) loop (see HTH in Dataset 1). This set is used to extract a pair of AHs such that one of them is cationic with a large hydrophobic moment, while the other consists mostly of hydrophobic residues.
The PAGAL algorithm has been detailed previously 20 . Briefly, the Edmundson wheel is computed by considering a wheel with centre (0,0), radius 5, first residue coordinate (0,5) and advancing each subsequent residue by 100 degrees on the circle, as 3.6 residues make one full turn of the helix. For each residue, we draw a vector from the centre to the coordinate of the residue and give it a magnitude obtained from the hydrophobic scale (in our case, this scale is obtained from Jones et al. 19 ). These vectors are then added to obtain the final hydrophobic moment. The color coding for the Edmundson wheel is as follows: all hydrophobic residues are colored red, while hydrophilic residues are colored in blue: dark blue for positively charged residues, medium blue for negatively charged residues and light blue for amides.
All protein structures were rendered by PyMol (http://www.pymol.org/). The sequence alignment was done using ClustalW 36 . The alignment images were generated using Seaview 37 . Protein structures have been superimposed using MUSTANG 38 .
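The Edmundson wheel and hydrophobic moment computation described above can be sketched as follows. This is a minimal illustration and not the PAGAL source: the Kyte-Doolittle hydropathy values stand in for the Jones et al. scale used in the paper, the clockwise direction of advance is an assumption, and the function names and the toy sequence are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values, used only as a stand-in scale (assumption);
# the paper takes its scale from Jones et al., which is not reproduced here.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

RADIUS = 5.0      # wheel radius, as stated in the text
STEP_DEG = 100.0  # 360 degrees / 3.6 residues per turn


def wheel_angle(index):
    """Angle in radians of residue `index`; residue 0 sits at (0, RADIUS)."""
    # Advancing clockwise is an assumption; the magnitude of the
    # hydrophobic moment does not depend on the direction chosen.
    return math.radians(90.0 - index * STEP_DEG)


def wheel_coordinates(sequence):
    """(x, y) position of every residue on the Edmundson wheel."""
    return [
        (RADIUS * math.cos(wheel_angle(i)), RADIUS * math.sin(wheel_angle(i)))
        for i in range(len(sequence))
    ]


def hydrophobic_moment(sequence, scale=KYTE_DOOLITTLE):
    """Magnitude of the vector sum of per-residue hydrophobicity vectors."""
    mx = my = 0.0
    for i, residue in enumerate(sequence):
        angle = wheel_angle(i)
        mx += scale[residue] * math.cos(angle)
        my += scale[residue] * math.sin(angle)
    return math.hypot(mx, my)


if __name__ == "__main__":
    toy_helix = "KLLKKLLKKLLK"  # hypothetical sequence, not from the paper
    print(wheel_coordinates(toy_helix)[:2])
    print(hydrophobic_moment(toy_helix))
```

A helix of 18 identical residues covers the wheel evenly (18 × 100° is exactly five full turns), so its moment vanishes. This makes a convenient sanity check for any implementation.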

In vitro
Chemically synthesized peptides were obtained from GenScript USA, Inc. The molecular weight of each peptide was calculated, and each peptide was then diluted to a 2000µM or 3000µM stock solution with phosphate buffered saline. Stock solutions were stored at -20°C and thawed on ice before use.
Using the stock solutions, we made dilute solutions of 300µM, 250µM, 200µM, 150µM, 100µM, 75µM, 50µM, 30µM, 25µM, and 10µM, each in a final volume of 100µl of phosphate buffered saline. Dilute peptide solutions were stored at -20°C and thawed on ice before use.
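The volumes needed for each working solution follow from C1·V1 = C2·V2. The sketch below assumes that every dilution is made directly from the stock (the text does not say whether direct or serial dilution was used). The function name is hypothetical.

```python
def dilution_volumes(stock_um, target_um, final_ul=100.0):
    """Return (stock volume, buffer volume) in µl for a direct dilution.

    Uses C1 * V1 = C2 * V2, i.e. V1 = C2 * V2 / C1.
    """
    if target_um > stock_um:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_um * final_ul / stock_um
    return stock_ul, final_ul - stock_ul


if __name__ == "__main__":
    # Concentration series from the text, prepared from a 2000 µM stock.
    for target in (300, 250, 200, 150, 100, 75, 50, 30, 25, 10):
        stock_ul, pbs_ul = dilution_volumes(2000, target)
        print(f"{target} µM: {stock_ul:.2f} µl stock + {pbs_ul:.2f} µl PBS")
```

For example, a 300µM solution in 100µl needs 15µl of a 2000µM stock, or 10µl of a 3000µM stock, made up to volume with phosphate buffered saline.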
Bacteria were inoculated and allowed to grow in liquid medium at 28°C to reach the exponential phase: Xf (5 days), Xa (3 days), and Lc (3 days). The inoculum was diluted to a working OD of 0.5 (1×10⁷ cells/ml). 10µl of the OD 0.5 inoculum was plated together with 90µl of liquid media and spread on pre-made agar plates to create a confluent lawn of bacteria. The bacteria were given an hour to set at room temperature. 10µl of each peptide concentration was then spotted onto an agar plate preseeded with a layer of bacteria. After spotting, the plates were incubated at 28°C for 2 to 10 days until zones of clearance were clearly visible, and the plates were scored for the minimum inhibitory concentration (MIC), taken as the concentration below which no visible clearance was observed. All assays were performed in triplicate, and the replicates gave identical results.

Existing AH-AMPs: the positive controls
Cecropin B (CECB) was used as a positive control, as it is known to target membrane surfaces and create pores in the bacterial outer membrane 30,31 . CECB consists of a cationic amphipathic N-terminal AH with a large hydrophobic moment (Figure 1a), and a C-terminal AH consisting mostly of hydrophobic residues, which consequently has a low hydrophobic moment (Figure 1b), joined by a short loop. Another positive control was a linear AH-AMP consisting of residues 2-22 of the N-terminal in CECB (CBNT21) (Figure 1a). The sequences of these peptides are shown in Table 1.

SCALPEL: Identifying native AH-AMP peptides from the host proteome
Linear AH-AMPs. In order to choose a peptide mimicking CBNT21 (cationic, amphipathic, large hydrophobic moment), we directed our search to 'locate a small peptide with a large hydrophobic moment and a high proportion of positively charged residues' on the raw data computed using PAGAL (see RawDataHelix.txt in Dataset 1). A small peptide is essential for quick and cost effective iterations. Table 2 shows the best matching AHs.
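The search for a small peptide with a large hydrophobic moment and a high proportion of positively charged residues amounts to a filter followed by a sort. The sketch below is illustrative only. The thresholds and record layout are assumptions, the helices are toy sequences with made-up hydrophobic moments, and only lysine and arginine are counted as positive. It does not parse RawDataHelix.txt, whose format is not described here.

```python
from dataclasses import dataclass

POSITIVE = set("KR")  # assumption: histidine is not counted as cationic
NEGATIVE = set("DE")


@dataclass
class Helix:
    name: str
    sequence: str
    hm: float  # hydrophobic moment, e.g. as reported by PAGAL


def charge_stats(sequence):
    """Return (NCH, RPNR): number of charged residues and the
    proportion of positive residues among the charged ones."""
    pos = sum(aa in POSITIVE for aa in sequence)
    neg = sum(aa in NEGATIVE for aa in sequence)
    nch = pos + neg
    rpnr = pos / nch if nch else 0.0
    return nch, rpnr


def rank_cationic(helices, max_len=25, min_nch=3, min_rpnr=0.8):
    """Keep short, predominantly cationic helices and sort them by
    decreasing hydrophobic moment. All thresholds are assumptions."""
    kept = []
    for helix in helices:
        nch, rpnr = charge_stats(helix.sequence)
        if len(helix.sequence) <= max_len and nch >= min_nch and rpnr >= min_rpnr:
            kept.append(helix)
    return sorted(kept, key=lambda h: h.hm, reverse=True)


if __name__ == "__main__":
    toy = [
        Helix("toy_cationic_a", "KKLLKKLAKKLL", 9.0),
        Helix("toy_cationic_b", "RKLLERLAKKLL", 12.0),
        Helix("toy_anionic", "DELLEDLAEELL", 15.0),
        Helix("toy_too_long", "K" * 30, 20.0),
    ]
    for helix in rank_cationic(toy):
        print(helix.name, charge_stats(helix.sequence), helix.hm)
```

Sorting after pruning matches the order given in the Table 2 legend: the data is first pruned for AHs with a high proportion of positive residues, and then sorted on hydrophobic moment.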
Next, we used the sequences of these AHs to search the grapevine and citrus genomes, choosing only those that are present in both genomes. This allowed us to locate an AH from phosphoenolpyruvate carboxylase from sunflower, a key enzyme in the C4-photosynthetic carbon cycle which enhances solar conversion efficiency (PDBid:3ZGBA.α11) 33 . Figure 2a shows the specific AH located within the protein structure, marked in green and blue. Although DSSP marks the whole peptide stretch as one AH, we chose the part in blue because a small π helix precedes it. We named this peptide PPC20 (Figure 2, Table 1). This peptide is fully conserved (100% identity over the 20 residues) in both grapevine (Accession id:XP_002285441) and citrus (Accession id:AGS12489.1). Figure 2b,c shows the PyMol-rendered AH surfaces of PPC20. Asp259 stands out as a negative residue on an otherwise positive surface (Figure 2c). Since previous studies have noted dramatic transitions with a single mutation on the polar face, it would be interesting to determine the effect of mutating Asp259 to a cationic residue 42 .

Dataset 1. Data used for the SCALPEL search methodology to identify plant α-helical antimicrobial peptides in the PDB

Table 2. Identifying AHs with cationic properties from plant proteins with known structures. All AHs in plant proteins are analyzed using PAGAL, and the data is pruned for AHs with a high proportion of positive residues, and finally sorted based on their hydrophobic moment. The first match is present in both grapevine and citrus (PDBid:3ZGBA.α11, which is a phosphoenolpyruvate carboxylase from sunflower). We ignored a small π helix comprising four residues at the beginning of this peptide. This peptide has been named PPC20. HM: hydrophobic moment, RPNR: relative proportion of positive residues among charged residues, Len: length of the α-helix, NCH: number of charged residues.

Non-linear AH-AMPs consisting of two AHs. Next, we located two AHs within chitinase from Nicotiana tabacum (PDBid:3ALGA.α4 and 3ALGA.α5) 34 connected by a short random coil such that one of the AHs is cationic and hydrophobic, while the other AH is comprised mostly of hydrophobic, uncharged residues (CHITI25, Figure 3a, Table 1). This peptide mimics the complete CECB protein (Figure 3b). While the properties of the AHs in CHITI25 are reversed from those of CECB, the order in which these AHs occur is not important for functionality. The multiple sequence alignment of CHITI25 from grapevine, citrus and tobacco is shown in Figure 3c. CHITI25 from tobacco is the most cationic (five), followed by citrus (four) and grapevine (three). Thus, it is possible that the antimicrobial properties of CHITI25 from grapevine would be lower than those of CHITI25 from tobacco. In such a scenario, these peptides can be subjected to mutations to enhance their natural anti-microbial properties 43 .
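The search for two AHs joined by a short loop, one cationic and the other mostly hydrophobic, can be sketched as below. This is an illustration under stated assumptions and not the SCALPEL implementation: helix boundaries are taken as given (for example from DSSP), a loop must be one to four residues long, the residue groupings and thresholds are assumed, the hydrophobic moment criterion is left out for brevity, and the chain in the usage example is a toy sequence.

```python
from typing import List, NamedTuple, Tuple

POSITIVE = set("KR")           # assumption: histidine not counted as cationic
NEGATIVE = set("DE")
HYDROPHOBIC = set("AVILMFWC")  # assumption: one common hydrophobic grouping


class Segment(NamedTuple):
    start: int  # 0-based index of the first helix residue in the chain
    end: int    # 0-based index of the last helix residue (inclusive)


def helix_pairs(segments: List[Segment], max_loop: int = 4) -> List[Tuple[Segment, Segment]]:
    """Consecutive helices separated by a loop of 1..max_loop residues."""
    ordered = sorted(segments)
    pairs = []
    for first, second in zip(ordered, ordered[1:]):
        loop_len = second.start - first.end - 1
        if 1 <= loop_len <= max_loop:
            pairs.append((first, second))
    return pairs


def helix_sequence(chain: str, segment: Segment) -> str:
    return chain[segment.start:segment.end + 1]


def net_charge(sequence: str) -> int:
    return sum(aa in POSITIVE for aa in sequence) - sum(aa in NEGATIVE for aa in sequence)


def hydrophobic_fraction(sequence: str) -> float:
    return sum(aa in HYDROPHOBIC for aa in sequence) / len(sequence)


def is_cecb_like(helix_a: str, helix_b: str,
                 min_charge: int = 2, min_hydrophobic: float = 0.6) -> bool:
    """True if one helix is cationic and the other mostly hydrophobic,
    in either order. Thresholds are assumptions."""
    def cationic(seq: str) -> bool:
        return net_charge(seq) >= min_charge

    def mostly_hydrophobic(seq: str) -> bool:
        return hydrophobic_fraction(seq) >= min_hydrophobic

    return ((cationic(helix_a) and mostly_hydrophobic(helix_b))
            or (cationic(helix_b) and mostly_hydrophobic(helix_a)))


if __name__ == "__main__":
    # Toy chain: helix, 3-residue loop, helix, 8-residue gap, helix.
    chain = "GS" + "KKLAKKLLKKA" + "GPG" + "LLAVILAAVL" + "DDDDDDDD" + "EELLEELL"
    helices = [Segment(2, 12), Segment(16, 25), Segment(34, 41)]
    for a, b in helix_pairs(helices):
        seq_a, seq_b = helix_sequence(chain, a), helix_sequence(chain, b)
        print(seq_a, seq_b, is_cecb_like(seq_a, seq_b))
```

Testing both orders in `is_cecb_like` reflects the observation that the cationic and hydrophobic helices occur in the opposite order in CHITI25 and CECB.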

Negative control: an anionic AH-AMP.
We also located an anionic AH-AMP using a similar strategy: a 13-residue peptide present within the structure of isoprene synthase from gray poplar (PDBid: 3N0FA.α18) 44 . We extended this helix by including one adjacent residue at each terminus to obtain ISS15 (Table 1). We also used phosphate buffered saline as a negative control.

In vitro results
We have validated our peptides using plating assays (Table 3, Figure 4). CECB, the well-established AH-AMP, is the most potent among all the peptides tested, with minimum inhibitory concentrations ranging from 25µM (for Xa) to 100µM (for Xf and Lc). This shows the variation in susceptibility among different organisms. Understanding this differential susceptibility would require a deeper understanding of the underlying mechanism by which these AH-AMPs work 45 , as well as of the differences in the membrane composition of these gram-negative pathogens 46 . CBNT21 mostly has a slightly lower potency, indicating a role against Xf and Lc for the C-terminal AH in CECB, which consists mostly of hydrophobic residues. This result corroborates a plausible mechanism suggested by others, in which the anionic membranes of bacteria are targeted by the cationic N-terminal, followed by the insertion of the C-terminal AH into the hydrophobic membrane to create a pore. PPC20 and CHITI25 have potencies comparable to those of CECB and CBNT21, although Lc appears to be resistant to CHITI25. Finally, the anionic peptide used as a negative control shows no effect on these pathogens.

Table 3. Minimum Inhibitory Concentration of peptides tested (µM). It can be seen that CECB is the most efficient among all the peptides for all three pathogens. However, while CHITI25 is almost as effective as CECB for Xf, it fails to inhibit Lc growth. Also, Xa is much more susceptible to these peptides than the other two pathogens. Finally, the anionic ISS15 does not show any effect even at higher concentrations. Data are in triplicate, and the replicates were identical.
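The scoring rule used for the plating assays, where the MIC is the concentration below which no visible clearance is observed, can be written as a small helper. This is a sketch with hypothetical names and made-up plate readings. It returns `None` when no tested concentration shows clearance, since in that case the assay sets no upper bound on the MIC.

```python
from typing import Mapping, Optional


def score_mic(clearance: Mapping[float, bool]) -> Optional[float]:
    """Lowest tested concentration (µM) such that it and every higher
    tested concentration show a zone of clearance.

    Returns None if the highest tested concentration shows no clearance.
    """
    mic = None
    for concentration in sorted(clearance, reverse=True):
        if not clearance[concentration]:
            break
        mic = concentration
    return mic


if __name__ == "__main__":
    tested = (300, 250, 200, 150, 100, 75, 50, 30, 25, 10)
    # Hypothetical readings: clearance at 25 µM and above, none at 10 µM.
    reading = {c: c >= 25 for c in tested}
    print(score_mic(reading))
    # Hypothetical inactive peptide: no clearance anywhere.
    print(score_mic({c: False for c in tested}))
```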

Discussion
The repertoire of defense proteins available to an organism is constantly being reshaped through genomic changes that confer resistance to pathogens. Genetic approaches aim at achieving the same goal of enhancing immunity through the rational design of peptides 13,47 , which are then incorporated into the genome 29,31,48 . For commercial viability, it is also important to ensure that these non-endogenous genomic fragments have minimal effect on humans 32 . Identifying peptides from the same genome helps allay these concerns to a significant extent. The key innovation of the current work is the ability to identify peptides with specific properties (cationic AHs with a hydrophobic surface, linear or otherwise) from the genome of any organism of interest. Such peptides may also be less likely to elicit an adverse immune response from the host, although this hypothesis needs to be verified.

Alternate methods
Alternate computational methods for finding new AMPs based on known AMPs could be of two kinds, although neither would have been as effective in obtaining our results. Firstly, a sequence search using BLAST can be done to find a corresponding peptide in the genome, say for cecropin B. However, a BLAST search with the cecropin B sequence does not give any significant matches in the grapevine or citrus genomes, and is a dead end. In principle, what we need is a peptide with cecropin B-like properties, and that information is encoded not in the linear sequence, but in the Edmundson wheel of the AH. The second method is to search for structural homology in the PDB database through a tool like DALILITE 49 . However, AHs are almost indistinguishable structurally, and the results will contain many redundancies. Thus, there are no existing methods tailored to incorporate the quantifiable properties of AHs in the search. We have, for the first time, proposed such a method in SCALPEL.
Computer-assisted design strategies have also been applied in designing de novo AMPs 50,51 . Other hand-curated comprehensive databases 'for storing, classifying, searching, predicting, and designing potent peptides against pathogenic bacteria, viruses, fungi, parasites, and cancer cells' 52 do not offer the automation and the vastness of available data of the SCALPEL methodology.

Limitations and future directions
There are several caveats to our study. We are yet to ascertain the hemolytic nature of the identified peptides, and will be performing these experiments in the near future. In fact, selective cytotoxicity against human cancer cells might be used as a substitute therapy in place of conventional chemotherapy 53,54 . It must be noted that the development of a selective peptide with anti-cancer cell properties has been a challenge 55 . Although we have not measured the lipid permeabilizing abilities of our peptides, a recent study has found that potency in permeabilizing bacteria-like lipid vesicles does not correlate with significant improvements in antimicrobial activity, rendering such measurements redundant 56 . The electrostatic context of a peptide is known to have a significant bearing on its propensity to adopt an AH structure. The ability to predict the folding of peptides requires significant computational power and modelling expertise 57 . Peptides often remain in random coil conformations, and achieve helical structures only by interacting with anionic membrane models 58 . It is also possible to measure peptide helicity through circular dichroism spectroscopy 59 . However, the peptides we selected from our search results have all given positive results, suggesting a high likelihood of obtaining anti-microbial activity from such peptides. Additionally, we may have to resort to other innovative techniques that have been previously adopted to overcome thermodynamic instability or proteolytic susceptibility 60-63 .

Conclusion
To summarize, we establish the presence of a large number of AH-AMPs 'hidden' in the universal proteome. We have designed a methodology to extract such peptides from the PDB database, the 'Big Data' center in proteomics. We demonstrate our results on well known plant pathogens: Xf, Xa and Lc. The feasibility of using such peptides in cancer therapies is also strong 54,64 . The ability to choose a peptide from the host itself is an invaluable asset, since nativeness of the peptide allays fears of eliciting a negative immune response upon administration. The problem of antibiotic resistance is also increasing the focus on peptide based therapies 9,65 , since it is 'an enigma that bacteria have not developed highly effective cationic AMP-resistance mechanisms' 66 . Lastly, in the face of the current Ebola outbreak 67,68 , we strongly suggest the possibility of developing peptides derived from the human genome to target viral epitopes, such as those enumerated for the Ebola virus recently 7 . A recent study has reported the inhibition of Ebola virus entry and infection by several cationic amphiphiles 69 , suggesting that the SCALPEL-generated cationic peptides, with the aid of cell penetrating peptides 70 , could achieve similar results.

Author contributions
SC wrote the computer programs. MP performed the in vitro experiments. All authors analyzed the data, and contributed equally to the writing and subsequent refinement of the manuscript.

Paragraph 3. Clarify which humanitarian reasons are associated with the listed plant pathogens. The central theme of the manuscript is a new search algorithm, SCALPEL and its validation. Therefore, there is a need for reasoning the development of a new algorithm in light of the existing software(s), if any. In other words, what is the necessity of a new algorithm? The introduction is not crisp and balanced.

Materials and Methods:
In silico: Is there a need to list all 2000 proteins in the Dataset identified using the search term 'plants'? It could simply be stated how many plant proteins were analyzed. More details are needed for SCALPEL vis-à-vis methodology.
In vitro: Why was Kanamycin added to PD3?
Results:
Figure 1a: An explanation is required for omitting the first 'K' in the Edmundson wheel.
The main legend to Figure 1, "Edmundson wheel for AHs in the known AMPs that were used as control", is confusing. It does not look like the peptides corresponding to wheels c, d and e were the controls.
Non-linear AH-AMPs consisting of two AHs: "While the properties of the AHs in CHITI25 is reversed from that of CECB, the order in which these AHs occur is not important for functionality" is an overstatement unless proved.
Figure. The control ISS15 does not show pathogen growth inhibition at the maximum concentration tested. Thus, concluding that its MIC is >300 µM is speculation.

Discussion:
The discussion is entirely focused on the search software that is hardly mentioned in the Introduction, Methodology, and Results. There is more focus on the application of the current study instead of the study itself. The readership would benefit more if the discussion included scrutiny and reaffirmation of the results with relevant literature and interpretation.
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard; however, we have significant reservations, as outlined above.
No competing interests were disclosed.

Dear Dr Berjeaud,
We would like to thank you for taking the time to review this paper. Please find our responses below.
Title and Abstract: The Title of the article, "The PDB database is a rich source of alpha-helical anti-microbial peptides to combat disease causing pathogens", is appropriate for the content of the article. However because the detected peptides were solely tested toward plant pathogens the term of "plant" in the title could be considered.
We are in the process of testing these peptides on other pathogens. Furthermore, considering that the mechanism of the inhibitory effect of these peptides is independent of the pathogen host, we are quite confident of replicating our results on other 'non-plant' pathogens.
The abstract represents the work presented in the article rather well, except for the last two sentences, which are expectations of the authors that were not studied in this work. In particular, the sentence beginning "The use of native" asserts that peptide structures extracted from native proteins will have no adverse effect on the host, but this has to be proved in my opinion.
We believe this is an important hypothesis, although unproven, which differentiates SCALPEL from other methods of identifying anti-microbial peptides. However, we have modified the statement in the abstract to make this less of an assertion, and more of a hypothesis.
Article content: The paper describes the use of software and an in silico method developed by the authors to screen the protein database (PDB) to find new antimicrobial peptides on the basis of the secondary structures of these peptides. The method is innovative and very interesting. Moreover, the authors proved the efficacy of their method, as they synthesized peptides from portions of protein sequences presenting secondary structures resembling cecropin B and demonstrated the antimicrobial activity of these new peptides.
We appreciate the positive comments.
However, I did not understand why they used an anionic peptide as the negative control. Indeed, it is well known that the global positive charge of the peptides is required for their initial stacking on the membrane of target cells. Thus it is predictable that any anionic peptide will be inactive.
We have used this as a negative control. If our experimental setup, in the process of adding the peptides, had introduced any undesired condition that inhibited the pathogens, this anionic peptide would show positive results. So, this is slightly different from a null negative control, as it involves adding a peptide.
Conclusions: The main problem concerns the conclusions of the article. Indeed, there is not sufficient experimental evidence in the article to assert that alpha-helical peptides with antimicrobial activity have a strong potency against cancer cells. In my experience, I tested several alpha-helical peptides which were solely antimicrobial. Thus I strongly suggest moderating the conclusions part of the manuscript.
We appreciate your concern regarding the lack of confirmatory evidence of AH peptides as anti-cancer therapeutics. However, this continues to be an active front in research, and we hope that SCALPEL will provide further avenues for testing this hypothesis. We have cited two recent papers (http://www.ncbi.nlm.nih.gov/pubmed/25270878 and http://www.ncbi.nlm.nih.gov/pubmed/24101917) in this context. Also, we have modified the text to reflect the lack of conclusiveness in such studies.
Once again, we are thankful for your insightful comments, and hope to have addressed your concerns.
Thanking you,
Sincerely,
Sandeep Chakraborty
Plant Sciences Department, University of California, Davis, CA 95616, USA.
Competing Interests: No competing interests were disclosed.