Pangenome guided pharmacophore modelling of enterohemorrhagic Escherichia coli sdiA

DJ Darwin Bandoy

doi:10.12688/f1000research.17620.1

Research Note

[version 1; peer review: 1 approved with reservations, 1 not approved]

PUBLISHED 09 Jan 2019

Department of Veterinary Paraclinical Sciences, University of the Philippines Los Baños, Los Baños, Laguna, 4031, Philippines

DJ Darwin Bandoy
Roles: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project Administration, Resources, Software, Supervision, Validation, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Abstract

Enterohemorrhagic Escherichia coli (EHEC) continues to be a significant public health risk. With the advent of next-generation sequencing, whole genome sequences are a potential resource for predictive modelling of the regulatory mechanisms of pathogens, particularly quorum sensing. We used a pangenome approach to determine EHEC genome clustering, identified the synonymous and nonsynonymous mutations across EHEC sdiA, and modelled the associated amino acid changes. Across the EHEC population, nonsynonymous variants are notably absent from the ligand-binding site for quorum sensing, indicating that the population-wide conservation of the sdiA ligand site could be targeted for prophylactic purposes. Applying pathotype-wide pangenomics as a guide for determining the evolution of pharmacophore sites is a potential approach in drug discovery.

Keywords

pangenome, pharmacophore, EHEC, Escherichia coli

Corresponding author: DJ Darwin Bandoy

Competing interests: No competing interests were disclosed.

Grant information: This research was funded by the University of the Philippines Enhanced Creative Work and Research Grant.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2019 Bandoy DD. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Bandoy DD. Pangenome guided pharmacophore modelling of enterohemorrhagic Escherichia coli sdiA [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2019, 8:33 (https://doi.org/10.12688/f1000research.17620.1) First published: 09 Jan 2019, 8:33 (https://doi.org/10.12688/f1000research.17620.1) Latest published: 01 Sep 2020, 8:33 (https://doi.org/10.12688/f1000research.17620.3)

Introduction

One of the more prominent strains of Escherichia coli is the enterohemorrhagic (EHEC) pathotype, which is associated with global outbreaks of bloody diarrhea and hemolytic uremic syndrome¹. EHEC is a classic example of a One Health disease, in which animal health spills over into human health. Within the cattle reservoir, the sdiA gene is required by E. coli to survive the acidic rumen environment. E. coli uses SdiA to sense acyl homoserine lactones in a quorum sensing system². However, SdiA is considered an orphan because the cognate acyl homoserine lactone synthase is absent; it is therefore regarded as an environmental sensor of the nearby microbial community. SdiA is stabilized by acyl homoserine lactone and acts as a transcription factor for the glutamate decarboxylase needed for survival in the acidic environment. Hence, blocking the ability of EHEC to survive the acidic ruminal environment is a proposed mechanism to control shedding in the cattle reservoir.

Whole genome sequencing of bacterial pathogens, particularly EHEC, is quickly transforming the workflows of epidemiological investigations. However, most bioinformatic pipelines used in clinical investigations reduce genomes to a limited number of housekeeping genes, which artificially reduces the observed diversity³. While wgMLST attempts to increase the number of genes compared, the assignment of a single reference genome appears to be inadequate in light of the pangenome. Various studies have shown that a significant number of the genes in the entire gene repertoire of a species are missed during variant calling if only a single reference is used⁴. In this study we applied a multi-scale approach, generating genome-wide clusters from a pangenome phylogeny based on gene presence-absence variation (PAV). We used these genome-wide clusters as a guide in generating the pharmacophore of sdiA, as a potential design strategy to control shedding by reducing acid survival in the rumen.

We applied the concept of the pangenome, which represents the entirety of the genes present within a species, at the pathotype level. The EHEC pangenome represents the combination of genes seen in the EHEC pathotype. While the E. coli pangenome published in 2008 contained 8 genomes, we generated an updated EHEC pangenome with 153 genomes. The pangenome enables clustering of isolates by gene presence and absence. We used this genome-wide comparison to generate clusters and a genomic clade-specific pharmacophore model of sdiA. This strategy captures the pangenome-wide variation of sdiA and ensures that all conserved variants are targeted by the drug discovery pipeline, enabling a pangenome-to-pharmacophore approach.

Methods

EHEC population

Whole genome sequences with the associated EHEC metadata were downloaded from the PATRIC database 3.5.28 using a keyword search for E. coli as the organism and EHEC as the pathotype within the E. coli species⁵,⁶. The search returned 196 results, of which 152 are closed or draft genomes, while the remaining sequences are EHEC-associated plasmids (Underlying data: Metadata from Patric Database of EHEC E. coli pangenome⁷).

EHEC pangenome

The genomes were annotated with Prokka 1.13.3 as per the published protocol⁸. The resulting GFF files were used as input for the pangenome pipeline Roary 3.11.2, run with the following parameters for not splitting paralogs (roary -s -p 32 *.gff). The resulting presence-absence matrix (Underlying data: EHEC E. coli pangenome presence absence matrix⁷), together with the accessory genome phylogeny, was visualized in Phandango 1.3.0 and is represented as Figure 1B⁹,¹⁰. Each blue bar represents an individual gene and solid blue blocks represent gene clusters. The accessory genome-based phylogeny Newick file was visualized using iTOL 4.3¹¹–¹³.
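
As a rough illustration of how a gene presence-absence matrix can separate genomes, the sketch below builds a binary PAV table and computes pairwise Jaccard distances. It is not the procedure used by Roary or Phandango; the toy table, the gene and genome labels, and the helper names read_pav and jaccard_distance are all hypothetical, and a real Roary table carries extra metadata columns that would have to be skipped first.

```python
import csv
import io

# Hypothetical toy table (an assumption, not data from the study):
# a "Gene" column followed by one column per genome.
# A non-empty cell means the gene was found in that genome.
TOY_TABLE = """Gene,genomeA,genomeB,genomeC
sdiA,a_001,b_001,c_001
stx2A,a_002,b_002,
prophage_x,a_003,,
pS88_y,,,c_004
"""


def read_pav(handle):
    """Return (genome names, {gene: set of genomes carrying it})."""
    reader = csv.reader(handle)
    header = next(reader)
    genomes = header[1:]
    pav = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        gene, cells = row[0], row[1:]
        pav[gene] = {g for g, cell in zip(genomes, cells) if cell.strip()}
    return genomes, pav


def jaccard_distance(pav, g1, g2):
    """1 - (genes shared by both genomes) / (genes found in either genome)."""
    in_g1 = {gene for gene, carriers in pav.items() if g1 in carriers}
    in_g2 = {gene for gene, carriers in pav.items() if g2 in carriers}
    union = in_g1 | in_g2
    if not union:
        return 0.0
    return 1.0 - len(in_g1 & in_g2) / len(union)


genomes, pav = read_pav(io.StringIO(TOY_TABLE))
for i, g1 in enumerate(genomes):
    for g2 in genomes[i + 1:]:
        print(g1, g2, round(jaccard_distance(pav, g1, g2), 3))
```

Genomes that share most of their accessory genes end up close together under such a distance, which is the intuition behind clustering isolates by PAV rather than by a handful of housekeeping genes.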

Allelic variant calling

The Snippy variant calling pipeline 4.3.5 was used to determine the synonymous and nonsynonymous protein mutations, using sdiA of Escherichia coli O157:H7 str. Sakai as the reference. The --contigs option was added to the standard command line (snippy --outdir --ref sdiA_sakai.gbk). The resulting individual sdiA variants were merged into the EHEC E. coli sdiA variant calling data (Underlying data⁷).
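
The distinction between synonymous and nonsynonymous variants can be sketched with the standard genetic code: a codon change is synonymous when the encoded amino acid stays the same. The snippet below is a minimal illustration, not Snippy's implementation; the function name classify_substitution is hypothetical, and the example codons are assumptions chosen only because they encode the relevant amino acids (the actual nucleotide changes in sdiA are not given in the text).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_substitution(ref_codon, alt_codon):
    """Return (effect, reference amino acid, variant amino acid)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    effect = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return effect, ref_aa, alt_aa


# Hypothetical codon pairs (assumptions for illustration only).
examples = [
    ("AGA", "AAA"),  # arginine -> lysine
    ("AAC", "AGC"),  # asparagine -> serine
    ("CGT", "CGC"),  # arginine -> arginine, no amino acid change
]
for ref, alt in examples:
    print(ref, alt, classify_substitution(ref, alt))
```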

In silico sdiA protein modelling

SdiA genes were extracted from the pangenome output of Roary, and in silico protein modelling was performed using SWISS-MODEL¹⁴–¹⁸. SdiA protein sequences were used as targets to search for protein templates within the SWISS-MODEL library. Model selection was based on the template with the highest quality prediction from the target-template alignment.

Results and discussion

Three main genomic clades were generated using the pangenome PAV of genes (Figure 1A). Clade I includes the genomes isolated from the earliest reported outbreak in 1982 as well as more recent cases. Clade II includes genomes from European (predominantly Czech and German) EHEC isolates belonging to the O26 serotype, while Clade III includes a cluster of cases associated with acute renal failure in children. Clade III is associated with a hybrid pathotype of serogroup O80 that carries, aside from Shiga toxin, an extra-intestinal virulence plasmid (pS88) and is currently emerging in France. Specific gene clusters, particularly prophages and associated genes, are found in Clade I and are noticeably absent in Clade III (Underlying data: Table 2, Figure 1B). This indicates that genome-wide gene presence and absence can be used for functional clustering that reflects the underlying epidemiological dynamics. The acquisition of genomic islands unique to individual isolates is well defined in the pangenome gene presence-absence matrix (Figure 1B). The core genome comprises 2145 genes (Table 1) and the total gene count within the EHEC pangenome is 17152, which is not far off the total E. coli pangenome of 22,000 genes¹⁹. The large difference between the core gene count and the total gene count highlights the variation between isolates, which can be strain specific or individual-isolate specific, as indicated by the pangenome data.

Figure 1A. EHEC E. coli pangenome (152 genomes) accessory genome phylogeny showing three major clades.

Figure 1B. EHEC E. coli pangenome (152 genomes) showing genomic diversity with the presence absence variation (PAV) matrix.

Table 1. EHEC E. coli pangenome (152 genomes) metrics.

Category	Percentage occurrence	Gene number
Core genes	(99% <= strains <= 100%)	2145
Soft core genes	(95% <= strains < 99%)	1308
Shell genes	(15% <= strains < 95%)	3085
Cloud genes	(0% <= strains < 15%)	10614
Total genes	(0% <= strains <= 100%)	17152
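
The occurrence bands in Table 1 can be expressed as a small classifier. The sketch below is only an illustration under the thresholds printed in the table; the function name classify_gene is hypothetical and this is not code from the pangenome pipeline itself.

```python
def classify_gene(n_with_gene, n_genomes):
    """Assign a gene to a Table 1 category from the share of genomes carrying it."""
    fraction = n_with_gene / n_genomes
    if fraction >= 0.99:
        return "core"
    if fraction >= 0.95:
        return "soft core"
    if fraction >= 0.15:
        return "shell"
    return "cloud"


# Gene counts as reported in Table 1.
TABLE_1 = {"core": 2145, "soft core": 1308, "shell": 3085, "cloud": 10614}
total = sum(TABLE_1.values())

print("total genes:", total)
for category, count in TABLE_1.items():
    print(f"{category}: {count} genes, {100 * count / total:.1f}% of the pangenome")

# With 152 genomes, a gene missing from a single genome still counts as core,
# while a gene missing from two genomes falls into the soft core.
print(classify_gene(151, 152), classify_gene(150, 152))
```

Run against the Table 1 counts, the four categories add up to the reported total of 17152 genes, with the core making up roughly one eighth of the pangenome.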

All of the examined genomes contain the sdiA gene, which is part of the 2145-gene core genome. Using sdiA of Escherichia coli O157:H7 str. Sakai as the reference for variant calling, we identified the nonsynonymous mutations that are widely distributed across the EHEC isolates (Figure 3). The most frequent, accounting for 45.0% (49/109) of the nonsynonymous mutations, is the conversion of arginine to lysine at position 189 of sdiA (Figure 2A). This amino acid is located within the α-6 domain, adjacent to the amino acid clusters associated with sdiA dimerization. Previous protein modelling described the role of the guanidinium group of arginine, which enables interactions in three different directions and thus a more complex electrostatic interaction than lysine, as well as the higher pKa of arginine, which can yield a more stable ionic interaction than lysine²⁰. The arginine to lysine mutations are found mostly in clade III isolates. The second-ranked nonsynonymous mutation, at 17.4% (19/109), is asparagine to serine at amino acid position 101, which is located adjacent to the η-4 phenylalanine associated with the ligand and is distributed among the different clades (Figure 2B). The third-ranked nonsynonymous mutation, at 9.2% (10/109), is the alanine to threonine change at amino acid position 140 in the β-5 domain (Figure 2C). The fourth-ranked nonsynonymous mutation is the conversion of cysteine to serine at amino acid position 138 (Figure 2D), also within the β-5 domain. Both nonsynonymous variants, at amino acid positions 140 and 138, are found within clades II and III. None of the highly ranked nonsynonymous mutations impacts the ligand interaction, indicating conservation of the sdiA motif across the population in its geographic and temporal distribution, which suggests the possibility of targeting sdiA for quorum sensing inhibition.
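
The reported percentages follow directly from the counts given in the paragraph above. The short check below recomputes them; the labels R189K, N101S and A140T are shorthand introduced here for the three substitutions described in the text, and the count for the cysteine to serine change at position 138 is left out because it is not stated.

```python
# Counts taken from the text; 109 is the stated total of nonsynonymous mutations.
TOTAL_NONSYNONYMOUS = 109
counts = {
    "R189K": 49,  # arginine -> lysine, position 189
    "N101S": 19,  # asparagine -> serine, position 101
    "A140T": 10,  # alanine -> threonine, position 140
}

shares = {
    name: round(100 * n / TOTAL_NONSYNONYMOUS, 1) for name, n in counts.items()
}
for name, share in shares.items():
    print(f"{name}: {counts[name]}/{TOTAL_NONSYNONYMOUS} = {share}%")

# Mutations not covered by the three top-ranked substitutions.
remaining = TOTAL_NONSYNONYMOUS - sum(counts.values())
print("remaining nonsynonymous mutations:", remaining)
```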

Figure 2. sdiA nonsynonymous variants protein modelling.

Figure 3. EHEC E. coli sdiA allelic variants showing synonymous and nonsynonymous mutations.

Conclusion

While the EHEC pangenome is remarkably diverse, the allelic variants of sdiA, particularly the nonsynonymous mutants, indicate conservation of the quorum sensing domain, suggesting that targeting this structure could be effective across the different lineages of the EHEC pathotype.

Data availability

All underlying and extended data are available from Open Science Framework: Supplemental Data for Pangenome guided pharmacophore modelling of enterohemorrhagic Escherichia coli sdiA, https://doi.org/10.17605/OSF.IO/BNZ85⁷

Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

Underlying data

Table 1 Metadata from Patric Database of EHEC E. coli pangenome

Table 2 EHEC E. coli pangenome presence absence matrix

Table 3 EHEC E. coli sdiA variant calling data

Extended data

SWISS-MODEL Homology Modelling Report available at osf.io/bnz85.

References

1. Rohde H, Qin J, Cui Y, et al.: Open-source genomic analysis of Shiga-toxin-producing E. coli O104:H4. N Engl J Med. 2011; 365(8): 718–724.
2. Sperandio V: SdiA sensing of acyl-homoserine lactones by enterohemorrhagic E. coli (EHEC) serotype O157:H7 in the bovine rumen. Gut Microbes. 2010; 1(6): 432–435.
3. Maiden MC, Bygraves JA, Feil E, et al.: Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci U S A. 1998; 95(6): 3140–3145.
4. Méric G, Yahara K, Mageiros L, et al.: A reference pan-genome approach to comparative bacterial genomics: identification of novel epidemiological markers in pathogenic Campylobacter. PLoS One. 2014; 9(3): e92798.
5. Antonopoulos DA, Assaf R, Aziz RK, et al.: PATRIC as a unique resource for studying antimicrobial resistance. Brief Bioinform. 2017; bbx083.
6. Wattam AR, Davis JJ, Assaf R, et al.: Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center. Nucleic Acids Res. 2017; 45(D1): D535–D542.
7. Bandoy D: Supplemental Data for Pangenome Guided Pharmacophore Modelling of Enterohemorrhagic Escherichia Coli sdiA. OSF. 2019. http://www.doi.org/10.17605/OSF.IO/BNZ85
8. Seemann T: Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014; 30(14): 2068–2069.
9. Hadfield J, Croucher NJ, Goater RJ, et al.: Phandango: an interactive viewer for bacterial population genomics. Bioinformatics. 2018; 34(2): 292–293.
10. Page AJ, Cummins CA, Hunt M, et al.: Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics. 2015; 31(22): 3691–3693.
11. Letunic I, Bork P: Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 2016; 44(W1): W242–245.
12. Letunic I, Bork P: Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 2011; 39(Web Server issue): W475–478.
13. Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics. 2007; 23(1): 127–128.
14. Waterhouse A, Bertoni M, Bienert S, et al.: SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018; 46(W1): W296–W303.
15. Bienert S, Waterhouse A, de Beer TA, et al.: The SWISS-MODEL Repository-new features and functionality. Nucleic Acids Res. 2017; 45(D1): D313–D319.
16. Bertoni M, Kiefer F, Biasini M, et al.: Modeling protein quaternary structure of homo- and hetero-oligomers beyond binary interactions by homology. Sci Rep. 2017; 7(1): 10480.
17. Benkert P, Biasini M, Schwede T: Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics. 2011; 27(3): 343–350.
18. Guex N, Peitsch MC, Schwede T: Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: a historical perspective. Electrophoresis. 2009; 30 Suppl 1: S162–173.
19. Robins-Browne RM, Holt KE, Ingle DJ, et al.: Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing? Front Cell Infect Microbiol. 2016; 6: 141.
20. Sokalingam S, Raghunathan G, Soundrarajan N, et al.: A study on the effect of surface lysine to arginine mutagenesis on protein stability and structure using green fluorescent protein. PLoS One. 2012; 7(7): e40410.

Article Versions (3)

version 3

Revised

Published: 01 Sep 2020, 8:33

https://doi.org/10.12688/f1000research.17620.3

version 2

Revised

Published: 01 Oct 2019, 8:33

https://doi.org/10.12688/f1000research.17620.2

version 1

Published: 09 Jan 2019, 8:33

https://doi.org/10.12688/f1000research.17620.1

Open Peer Review

Version 1

Reviewer Report 15 May 2019

Olivier Tenaillon, IAME (Infection Antimicrobials Modelling Evolution), UMR 1137, French Institute of Health and Medical Research (INSERM), Paris, France

Approved with Reservations

https://doi.org/10.5256/f1000research.19267.r46991

The present manuscript presents a state-of-the-art pan genome analysis of EHEC strains and a subsequent analysis of the variation in sdiA.

The analysis of sdiA could have been completed with a simple Ka/Ks analysis and compared to that of the core genome. For now, there is no connection between the two analyses; the author could have simply performed the analysis of sdiA.

Some mutants have frameshifts in the gene; this is not discussed.

I think the author could try to connect more the two parts of the analysis.

Is the work clearly and accurately presented and does it cite the current literature? Yes
Is the study design appropriate and is the work technically sound? Partly
Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Yes
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Microbial genomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 16 May 2019

DJ Darwin Bandoy, Department of Veterinary Paraclinical Sciences, University of the Philippines Los Baños, Los Baños, 4031, Philippines

I thank the reviewer for the effort in doing the review. I accept all the suggestions and will add the population genetic analysis in the next version of the paper.
Competing Interests: No competing interests were disclosed.

Reviewer Report 24 Apr 2019

Kerry K. Cooper, School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ, USA

Not Approved

https://doi.org/10.5256/f1000research.19267.r47169

I would state the biggest issue with the manuscript is the work is not technically sound, because upon examining the metadata file from Patric, numerous strains included in the EHEC pangenome were in fact not EHEC strains. Many of the isolates are UPEC, STEC or EPEC strains that should not be included in the analysis and are resulting in potential errors in the results and conclusions of the manuscript. All O55 strains are EPECs, O104 from the Germany outbreak is a STEC not EHEC, O127 is an EPEC, and CFT073 is an UPEC strain, just as some examples.

Furthermore, the analysis also includes O157:H7 strains that are wild type and mutant strains, and the mutants should not be included in the analysis. Additionally, the author mentions that the three clades include O157 (Clade I), O26 (Clade II) and O80 (Clade III); however, these clades are formed because the vast majority of the strains used in the analysis come from those three serotypes. The analysis is missing three of the "big 6" serotypes (O45, O121 and O145) and includes only one or two representatives of two of the other serotypes, O111 and O103. There are numerous genomes available for each of these serotypes through NCBI that should be included in this analysis, particularly as the "big 6" serotypes represent >50% of the infections and, in the United States, are considered adulterants in ground beef or other meat products. Therefore, they are a vital aspect for the development of pharmacophore modelling of sdiA to prevent colonization in cattle.

Additionally, several genomic studies by Ogura et al (2009)¹ and Cooper et al (2014)² have shown that many of these "big 6" serotypes arise along different evolutionary pathways or split from O157 at different time points and thus acquire different genes. It would be vital to include these in the analysis to see if these different pathways impacted the conservation of sdiA. The author should also provide a much cleaner version of the metadata as a separate tab in the spreadsheet that includes only those strains that were included in the analysis. Unfortunately, the above-mentioned issue means that all of the results in the manuscript are potentially erroneous and need to be completely re-done with the elimination of non-EHEC strains and the inclusion of additional "big 6" genomes to provide a scientifically sound analysis.

It would also be helpful to include in the methods section the date of the search, as the database is constantly changing making reproduction a little bit easier by other researchers. Upon the new analysis it would be helpful to include a brief table or statement of the serotype breakdown included in the EHEC pangenome that would also eliminate some of the above-mentioned issues and make it easier for readers to get a sense of those serotypes included in the analysis.

There are a number of grammatical errors or poor phrasing in the manuscript that should be reviewed and corrected. Such as there is only one author and no indication of other researchers on the manuscript, yet the manuscript keeps stating we instead of I.

Finally, there are several points made in the introduction and discussion that do not have references. For example, in the introduction the author mentions "pangenome of E. coli was published in 2008 contained 8 genomes" but does not reference the paper. Additionally, the author mentions "serogroup O80 has aside from Shiga toxin, an extra-intestinal virulence plasmid (pS88), is currently emerging in France" but does not reference anything indicating the emergence in France. I have also provided the citations for Ogura et al.¹ and Cooper et al.² for the author to review and potentially cite in the manuscript.

Is the work clearly and accurately presented and does it cite the current literature? Partly
Is the study design appropriate and is the work technically sound? No
Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes
Are the conclusions drawn adequately supported by the results? Partly

References

1. Ogura Y, Ooka T, Iguchi A, Toh H, et al.: Comparative genomics reveal the mechanism of the parallel evolution of O157 and non-O157 enterohemorrhagic Escherichia coli. Proc Natl Acad Sci U S A. 2009; 106(42): 17939–44.
2. Cooper KK, Mandrell RE, Louie JW, Korlach J, et al.: Comparative genomics of enterohemorrhagic Escherichia coli O145:H28 demonstrates a common evolutionary lineage with Escherichia coli O157:H7. BMC Genomics. 2014; 15: 17.

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: I am an expert in foodborne bacterial genomics, epidemiology and pathogenesis, particularly E. coli, Salmonella, Campylobacter, and Listeria.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Author Response 16 May 2019

DJ Darwin Bandoy, Department of Veterinary Paraclinical Sciences, University of the Philippines Los Baños, Los Baños, 4031, Philippines

I appreciate the work of the reviewer for going through the metadata. The inclusion of non-EHEC is necessary as outgroup comparison group and does not debunk the validity of the analysis pipeline. In light of this clarification, since this is the sole reason for the non-approval, I appeal for an approval with reservations as the analysis is technically and scientifically sound.

I am in the process of including the big 6 serotypes in the analysis based on the reviewer's comments, as well as the additional references which are very constructive additions to the paper. Again, as the reviewer sees the value in redoing a more inclusive analysis of EHEC serotypes, this is another justification to approve with reservation the paper submitted.

I beg to disagree with the comment of using "inclusive we" in place of I as a grammatical error. The use of royal or inclusive we in lieu of I is a matter of preference. This is the only part of the review I do not agree with.
Competing Interests: No competing interests were disclosed.
Report a concern
Respond or Comment

COMMENTS ON THIS REPORT

Author Response 16 May 2019

DJ Darwin Bandoy, Department of Veterinary Paraclinical Sciences, University of the Philippines Los Baños, Los Baños, 4031, Philippines

16 May 2019

Author Response

I appreciate the work of the reviewer for going through the metadata. The inclusion of non-EHEC is necessary as outgroup comparison group and does not debunk the validity of the ... Continue reading I appreciate the work of the reviewer for going through the metadata. The inclusion of non-EHEC is necessary as outgroup comparison group and does not debunk the validity of the analysis pipeline. In light of this clarification, since this is the sole reason for the non-approval, I appeal for an approval with reservations as the analysis is technically and scientifically sound.

I am in the process of including the big 6 serotypes in the analysis based on the reviewer's comments, as well as the additional references which are very constructive additions to the paper. Again, as the reviewer sees the value in redoing a more inclusive analysis of EHEC serotypes, this is another justification to approve with reservation the paper submitted.

I beg to disagree with the comment of using "inclusive we" in place of I as a grammatical error. The use of royal or inclusive we in lieu of I is a matter of preference. This is the only part of the review I do not agree with.
Competing Interests: No competing interests were disclosed.

Version 3

VERSION 3 PUBLISHED 09 Jan 2019

Open Peer Review

Reviewer Status

Reviewer Reports

	Invited Reviewers
	1	2
Version 3 (revision) 01 Sep 20		read
Version 2 (revision) 01 Oct 19	read	read
Version 1 09 Jan 19	read	read

Kerry K. Cooper, University of Arizona, Tucson, USA
Olivier Tenaillon, French Institute of Health and Medical Research (INSERM), Paris, France

Reviewer Report

03 Nov 2020 | for Version 3

Olivier Tenaillon, IAME (Infection Antimicrobials Modelling Evolution), UMR 1137, French Institute of Health and Medical Research (INSERM), Paris, France

Approved

Table 2 is not properly labelled. Positions should be presented without the _240 suffix and, where possible, the precise mutation should be named. It should be mentioned that the numbers refer to the number of strains carrying the mutation.

Results from the MK test indicate low conservation. Which outgroup has been used? If the outgroup is too close, too few mutations will be present to give a reliable value. More details should be given on how the tests have been performed.
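
To illustrate the concern about outgroup distance, the McDonald-Kreitman (MK) comparison can be sketched as follows. This is a minimal sketch, not the author's pipeline: the function names and every count below are hypothetical, and the p-value is computed with a plain two-sided Fisher exact test on the 2x2 table of fixed differences (Dn, Ds) against polymorphisms (Pn, Ps).

```python
from math import comb


def neutrality_index(dn, ds, pn, ps):
    """Neutrality index NI = (Pn/Ps) / (Dn/Ds).

    Returns None when any required denominator is zero, which is what
    happens when the outgroup contributes too few fixed differences.
    """
    if ps == 0 or dn == 0 or ds == 0:
        return None
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is as likely as, or less likely than, the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts: Dn, Ds = fixed differences; Pn, Ps = polymorphisms.
dn, ds, pn, ps = 1, 2, 3, 4
print("NI:", neutrality_index(dn, ds, pn, ps))
print("p:", fisher_exact_two_sided(dn, ds, pn, ps))
```

With so few fixed differences the test has almost no power, and if Dn or Ds drops to zero the index cannot be computed at all, which is why the choice of outgroup needs to be reported.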

Apart from these minor comments the manuscript is fine.

Competing Interests

No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

06 Nov 2019 | for Version 2

Kerry K. Cooper, School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ, USA

Approved

I disagree with the author that "The inclusion of non-EHEC is necessary as outgroup comparison groups and does not debunk the validity of the analysis pipeline", as including non-EHEC genomes in an EHEC core genome analysis does alter the resulting EHEC core genome, which is evident from the author removing those genomes from the analysis.

However, the author has done an excellent job of incorporating a large number of EHEC-only serotype genomes into the analysis, and as a result has generated a much stronger study. The second version of this manuscript is tremendously better, and the author's additional work has made it a significantly higher quality paper.

The only comments are extremely minor:

In the results and discussion section: amino acid position 140 is ranked third with 34.9%, but position 189 is ranked second with 24.4% frequency; these should be switched.
I would also recommend reordering Figure 2A, B, and C, so that they are introduced in the order A, B, and C.

Otherwise, no further comments.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

I am an expert in foodborne bacterial genomics, epidemiology and pathogenesis, particularly E. coli, Salmonella, Campylobacter, and Listeria.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

29 Oct 2019 | for Version 2

Olivier Tenaillon, IAME (Infection Antimicrobials Modelling Evolution), UMR 1137, French Institute of Health and Medical Research (INSERM), Paris, France

Approved With Reservations

I think this version is better than the previous one, but I still think the connection to sdiA could be made stronger, as could the relevance of the focus on that gene and this subgroup. Any conserved gene could be a drug target. Is this gene more interesting than others in that group? The question should be stated more precisely from the beginning.
The focus on some nonsynonymous mutations is interesting, but if these mutations are neutral there is not much interest in showing a detailed structure. A precise Ka/Ks study should be done to tell whether these few nonsynonymous mutations are more than expected.

Competing Interests

No competing interests were disclosed.

Reviewer Report

15 May 2019 | for Version 1

Olivier Tenaillon, IAME (Infection Antimicrobials Modelling Evolution), UMR 1137, French Institute of Health and Medical Research (INSERM), Paris, France

Approved With Reservations

Is the work clearly and accurately presented and does it cite the current literature?
Yes

Is the study design appropriate and is the work technically sound?
Partly

Are sufficient details of methods and analysis provided to allow replication by others?
Yes

If applicable, is the statistical analysis and its interpretation appropriate?
Yes

Are all the source data underlying the results available to ensure full reproducibility?
Yes

Are the conclusions drawn adequately supported by the results?
Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Microbial genomics

Reviewer Report

24 Apr 2019 | for Version 1

Kerry K. Cooper, School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ, USA

Not Approved

Is the work clearly and accurately presented and does it cite the current literature?
Partly

Is the study design appropriate and is the work technically sound?
No

Are sufficient details of methods and analysis provided to allow replication by others?
Yes

If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable

Are all the source data underlying the results available to ensure full reproducibility?
Yes

Are the conclusions drawn adequately supported by the results?
Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

I am an expert in foodborne bacterial genomics, epidemiology and pathogenesis, particularly E. coli, Salmonella, Campylobacter, and Listeria.

Alongside their report, reviewers assign a status to the article:

Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested

Approved with reservations - a number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions

1. Rohde H, Qin J, Cui Y, et al.: Open-source genomic analysis of Shiga-toxin-producing E. coli O104:H4. N Engl J Med. 2011; 365(8): 718–724.

2. Sperandio V: SdiA sensing of acyl-homoserine lactones by enterohemorrhagic E. coli (EHEC) serotype O157:H7 in the bovine rumen. Gut Microbes. 2010; 1(6): 432–435.

3. Maiden MC, Bygraves JA, Feil E, et al.: Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci U S A. 1998; 95(6): 3140–3145.

4. Méric G, Yahara K, Mageiros L, et al.: A reference pan-genome approach to comparative bacterial genomics: identification of novel epidemiological markers in pathogenic Campylobacter. PLoS One. 2014; 9(3): e92798.

5. Antonopoulos DA, Assaf R, Aziz RK, et al.: PATRIC as a unique resource for studying antimicrobial resistance. Brief Bioinform. 2017; bbx083.

6. Wattam AR, Davis JJ, Assaf R, et al.: Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center. Nucleic Acids Res. 2017; 45(D1): D535–D542.

7. Bandoy D: Supplemental Data for Pangenome Guided Pharmacophore Modelling of Enterohemorrhagic Escherichia Coli sdiA. OSF. 2019. http://www.doi.org/10.17605/OSF.IO/BNZ85

8. Seemann T: Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014; 30(14): 2068–2069.

9. Hadfield J, Croucher NJ, Goater RJ, et al.: Phandango: an interactive viewer for bacterial population genomics. Bioinformatics. 2018; 34(2): 292–293.

10. Page AJ, Cummins CA, Hunt M, et al.: Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics. 2015; 31(22): 3691–3693.

11. Letunic I, Bork P: Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 2016; 44(W1): W242–245.

12. Letunic I, Bork P: Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 2011; 39(Web Server issue): W475–478.

13. Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics. 2007; 23(1): 127–128.

14. Waterhouse A, Bertoni M, Bienert S, et al.: SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 2018; 46(W1): W296–W303.

15. Bienert S, Waterhouse A, de Beer TA, et al.: The SWISS-MODEL Repository-new features and functionality. Nucleic Acids Res. 2017; 45(D1): D313–D319.

16. Bertoni M, Kiefer F, Biasini M, et al.: Modeling protein quaternary structure of homo- and hetero-oligomers beyond binary interactions by homology. Sci Rep. 2017; 7(1): 10480.

17. Benkert P, Biasini M, Schwede T: Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics. 2011; 27(3): 343–350.

18. Guex N, Peitsch MC, Schwede T: Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: a historical perspective. Electrophoresis. 2009; 30 Suppl 1: S162–173.

19. Robins-Browne RM, Holt KE, Ingle DJ, et al.: Are Escherichia coli Pathotypes Still Relevant in the Era of Whole-Genome Sequencing? Front Cell Infect Microbiol. 2016; 6: 141.

20. Sokalingam S, Raghunathan G, Soundrarajan N, et al.: A study on the effect of surface lysine to arginine mutagenesis on protein stability and structure using green fluorescent protein. PLoS One. 2012; 7(7): e40410.
