Research Note

Pangenome guided pharmacophore modelling of enterohemorrhagic Escherichia coli sdiA

[version 1; peer review: 1 approved with reservations, 1 not approved]
PUBLISHED 09 Jan 2019

Abstract

Enterohemorrhagic Escherichia coli (EHEC) continues to be a significant public health risk. With the advent of next-generation sequencing, whole genome sequences have become a potential resource for predictive modelling of the regulatory mechanisms of pathogens, particularly quorum sensing. We used a pangenome approach to determine EHEC genome clustering, identified the synonymous and nonsynonymous mutations across EHEC sdiA, and modelled the associated amino acid changes. Across the EHEC population, nonsynonymous variants are notably absent from the ligand binding site for quorum sensing, indicating that the population-wide conservation of the sdiA ligand site could be targeted for prophylactic purposes. Applying pathotype-wide pangenomics as a guide for determining the evolution of pharmacophore sites is a potential approach in drug discovery.

Keywords

pangenome, pharmacophore, EHEC, Escherichia coli

Introduction

One of the more prominent strains of Escherichia coli is the enterohemorrhagic (EHEC) pathotype, which is associated with global outbreaks of bloody diarrhea and hemolytic uremic syndrome [1]. EHEC is a classic example of a One Health disease, in which animal health spills over into human health. Within the cattle reservoir, the sdiA gene is required by E. coli to survive the acidic rumen environment. SdiA is used by E. coli to sense acyl homoserine lactones in a quorum sensing system [2]. However, it is considered an orphan receptor because the cognate acyl homoserine lactone synthase is absent, and hence SdiA is regarded as an environmental sensor of the nearby microbial community. SdiA is stabilized by acyl homoserine lactone and acts as a transcription factor for glutamate decarboxylase, which is needed for survival in the acidic environment. Hence, blocking the ability of EHEC to survive the acidic ruminal environment is a proposed mechanism to control shedding in the cattle reservoir.

Whole genome sequencing of bacterial pathogens, particularly EHEC, is quickly transforming the workflows of epidemiological investigations. However, most bioinformatic pipelines used in clinical investigations reduce genomes to a limited number of housekeeping genes for comparison, which artificially reduces the observed diversity [3]. While wgMLST attempts to increase the number of genes compared, reliance on a single reference genome appears inadequate in light of the pangenome. Various studies have shown that a significant share of the entire universe of genes within a species is missed during variant calling if only a single reference genome is used [4]. In this study we applied a multi-scale approach, generating genome-wide clustering from a pangenome phylogeny based on gene presence absence variation (PAV). We used these genome-wide clusters as a guide in generating the pharmacophore of sdiA, as a potential design strategy to control shedding by reducing acid survival in the rumen.

We applied the concept of the pangenome, which represents the entirety of the genes present within a species, at the pathotype level. The EHEC pangenome represents the combination of genes seen in the EHEC pathotype. While the E. coli pangenome published in 2008 contained 8 genomes, we generated an updated EHEC pangenome with 153 genomes. The pangenome enables clustering of isolates by gene presence and absence. We used the genome-wide comparison to generate clusters and a genomic clade-specific pharmacophore model of sdiA. This strategy captures the pangenome-wide variation of sdiA and ensures that all conserved variants are targeted by the drug discovery pipeline, enabling a pangenome-to-pharmacophore approach.

Methods

EHEC population

Whole genome sequences with the associated EHEC metadata were downloaded from the PATRIC database 3.5.28 using a keyword search with E. coli as the organism and EHEC as the pathotype within the E. coli species [5,6]. The search returned 196 results, of which 152 are closed or draft genomes, while the remaining sequences are EHEC-associated plasmids (Underlying data: Metadata from Patric Database of EHEC E. coli pangenome [7]).

EHEC pangenome

The genomes were annotated with Prokka 1.13.3 as per the published protocol [8]. GFF files were extracted as input for the pangenome pipeline Roary 3.11.2, run with the option for not splitting paralogs (roary -s -p 32 *.gff). The resulting presence absence matrix (Underlying data: EHEC E. coli pangenome presence absence matrix [7]), together with the accessory genome phylogeny, was visualized in Phandango 1.3.0 and is shown in Figure 1B [9,10]. Each blue bar represents an individual gene and solid blue blocks represent gene clusters. The accessory genome-based phylogeny Newick file was visualized using iTOL 4.3 [11-13].

Allelic variant calling

The Snippy variant calling pipeline 4.3.5 was used to determine the synonymous and nonsynonymous mutations, using sdiA of Escherichia coli O157:H7 str. Sakai as the reference. The --contigs option was added to the standard command line (snippy --outdir --ref sdiA_sakai.gbk). The resulting individual sdiA variants were merged into the EHEC E. coli sdiA variant calling data (Underlying data [7]).
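
The distinction between synonymous and nonsynonymous mutations can be illustrated with a minimal sketch. This is not the Snippy implementation; it only assumes the standard genetic code (NCBI translation table 1), and the function name and example codons are hypothetical, chosen to encode the amino acid changes discussed in this note rather than taken from the actual sdiA sequence.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Return (kind, ref_aa, alt_aa) for a codon change.

    kind is 'synonymous' when both codons encode the same amino acid,
    and 'nonsynonymous' otherwise.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


# Hypothetical codons (not read from the sdiA reference):
# an arginine codon changed to a lysine codon, and a silent arginine change.
print(classify_substitution("AGA", "AAA"))
print(classify_substitution("CGT", "CGC"))
```

A variant caller additionally has to place each nucleotide change in the correct reading frame of the reference gene before a lookup of this kind can be applied.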

In silico sdiA protein modelling

sdiA genes were extracted from the pangenome output of Roary, and in silico protein modelling was performed using SWISS-MODEL [14-18]. SdiA protein sequences were used as targets to search for protein templates within the SWISS-MODEL library. The model was selected from the template with the highest predicted quality based on the target-template alignment.

Results and discussion

Three main genomic clades were generated using the pangenome PAV of genes (Figure 1A). Clade I includes genomes isolated from the earliest reported outbreak in 1982 as well as more recent cases. Clade II includes genomes of European (predominantly Czech and German) EHEC isolates belonging to the O26 serotype, while Clade III includes a cluster of cases associated with acute renal failure in children. Clade III is associated with a hybrid pathotype of serogroup O80 that carries, aside from Shiga toxin, an extra-intestinal virulence plasmid (pS88) and is currently emerging in France. Specific gene clusters, particularly prophages and associated genes, are found in Clade I and are noticeably absent in Clade III (Underlying data: Table 2, Figure 1B). This indicates that genome-wide gene presence and absence can be used for functional clustering that is informative of the underlying epidemiological dynamics. The acquisition of genomic islands unique to individual isolates is well defined in the pangenome gene presence absence matrix (Figure 1B). The core genome comprises 2145 genes (Table 1) and the total gene count within the EHEC pangenome is 17152, which is not far from the 22,000 genes of the total E. coli pangenome [19]. The large difference between the core gene count and the total gene count highlights the variation between isolates, which can be strain specific or individual isolate specific, as indicated by the pangenome data.
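
To make the idea of clustering genomes by gene presence and absence concrete, the sketch below computes pairwise Jaccard distances between gene sets. This is a swapped-in, simplified measure for illustration only and is not the method Roary uses to build its accessory genome tree; the genome names and gene lists are hypothetical toy data.

```python
def jaccard_distance(genes_a, genes_b):
    """Jaccard distance between two gene sets: 1 - |A & B| / |A | B|."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0  # two empty gene sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(pav):
    """Return {(genome_x, genome_y): distance} for every unordered genome pair."""
    names = sorted(pav)
    return {
        (x, y): jaccard_distance(pav[x], pav[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical toy presence/absence data: genome name -> genes present.
toy_pav = {
    "genome_1": {"sdiA", "stx1", "stx2"},
    "genome_2": {"sdiA", "stx2"},
    "genome_3": {"sdiA", "plasmid_gene"},
}

for pair, dist in pairwise_distances(toy_pav).items():
    print(pair, round(dist, 3))
```

Genomes that share most of their accessory genes end up with small distances, so a distance matrix of this kind could be passed to any standard hierarchical clustering or tree-building method.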

Figure 1A. EHEC E. coli pangenome (152 genomes) accessory genome phylogeny showing three major clades.

Figure 1B. EHEC E. coli pangenome (152 genomes) showing genomic diversity with the presence absence variation (PAV) matrix.

Table 1. EHEC E. coli pangenome (152 genomes) metrics.

Gene category      Percentage occurrence         Gene number
Core genes         99% <= strains <= 100%        2145
Soft core genes    95% <= strains < 99%          1308
Shell genes        15% <= strains < 95%          3085
Cloud genes        0% <= strains < 15%           10614
Total genes        0% <= strains <= 100%         17152
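
The occurrence thresholds in Table 1 can be expressed as a small classification function. This is a minimal sketch that assumes a gene is assigned to a category purely by the fraction of genomes in which it is present; the function name is hypothetical and the code does not reproduce Roary internals. The counts are those reported in Table 1.

```python
def classify_gene(n_present, n_genomes):
    """Assign a pangenome category from the fraction of genomes carrying a gene."""
    fraction = n_present / n_genomes
    if fraction >= 0.99:
        return "core"
    if fraction >= 0.95:
        return "soft core"
    if fraction >= 0.15:
        return "shell"
    return "cloud"


# Gene counts per category as reported in Table 1.
table_1 = {"core": 2145, "soft core": 1308, "shell": 3085, "cloud": 10614}
total_genes = sum(table_1.values())
core_share = 100 * table_1["core"] / total_genes

print(classify_gene(152, 152))  # a gene present in every genome
print(classify_gene(10, 152))   # a gene present in only a few genomes
print(total_genes, round(core_share, 1))
```

The four category counts sum to the reported total of 17152 genes, and the core genome accounts for roughly one eighth of that total.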

All of the examined genomes contain the sdiA gene, which is part of the 2145-gene core genome. Using sdiA of Escherichia coli O157:H7 str. Sakai as the reference for variant calling, we identified nonsynonymous mutations that are widely distributed across the EHEC isolates (Figure 3). The most frequent nonsynonymous mutation, 45.0% (49/109), is the conversion of arginine to lysine at position 189 of SdiA (Figure 2A). This amino acid is located within the α-6 domain, adjacent to the amino acid clusters associated with SdiA dimerization. Previous protein modelling established the role of the guanidinium group of arginine, which enables interactions in three different directions and therefore a more complex electrostatic interaction than lysine, and of the higher pKa of arginine, which can yield a more stable ionic interaction than lysine [20]. The arginine to lysine mutations are found mostly in Clade III isolates. The second-ranked nonsynonymous mutation, asparagine to serine at amino acid position 101, accounts for 17.4% (19/109); it is located adjacent to the η-4 phenylalanine that is associated with the ligand, and is distributed among the different clades (Figure 2B). The alanine to threonine change at amino acid position 140 in the β-5 domain is the third-ranked nonsynonymous mutation, with 9.2% (10/109) (Figure 2C). The fourth-ranked nonsynonymous mutation is the conversion of cysteine to serine at amino acid position 138 (Figure 2D), also within the β-5 domain. The nonsynonymous variants at positions 140 and 138 are both found within Clades II and III. None of the highly ranked nonsynonymous mutations affects the ligand interaction, indicating conservation of the sdiA motif across the geographic and temporal distribution of the population, which suggests the possibility of targeting sdiA for quorum sensing inhibition.
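
The reported percentages follow directly from the variant counts. The short sketch below recomputes them from the counts given in the text; the labels R189K, N101S and A140T are conventional shorthand introduced here for the arginine to lysine, asparagine to serine and alanine to threonine changes, and the helper name is hypothetical. The count for the position 138 variant is not stated in the text and is therefore left out.

```python
# Counts of the three top-ranked nonsynonymous sdiA variants, as stated in the text.
TOTAL_NONSYNONYMOUS = 109
variant_counts = {
    "R189K": 49,  # arginine -> lysine, position 189
    "N101S": 19,  # asparagine -> serine, position 101
    "A140T": 10,  # alanine -> threonine, position 140
}


def percent_of_total(count, total=TOTAL_NONSYNONYMOUS):
    """Share of all nonsynonymous calls, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


# Rank variants from most to least frequent and report their share.
ranked = sorted(variant_counts.items(), key=lambda item: item[1], reverse=True)
for name, count in ranked:
    print(name, count, percent_of_total(count))
```

Together these three variants account for 78 of the 109 nonsynonymous calls.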

Figure 2. sdiA nonsynonymous variants protein modelling.

Figure 3. EHEC E. coli sdiA allelic variants showing synonymous and nonsynonymous mutations.

Conclusion

While the EHEC pangenome is remarkably diverse, the allelic variants of sdiA, particularly the nonsynonymous mutants, indicate conservation of the quorum sensing domain, suggesting that targeting this structure could be effective across the different lineages of the EHEC pathotype.

Data availability

All underlying and extended data are available from Open Science Framework: Supplemental Data for Pangenome guided pharmacophore modelling of enterohemorrhagic Escherichia coli sdiA, https://doi.org/10.17605/OSF.IO/BNZ85 [7]

Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

Underlying data

Table 1 Metadata from Patric Database of EHEC E. coli pangenome

Table 2 EHEC E. coli pangenome presence absence matrix

Table 3 EHEC E. coli sdiA variant calling data

Extended data

SWISS-MODEL Homology Modelling Report available at osf.io/bnz85.

how to cite this article
Bandoy DD. Pangenome guided pharmacophore modelling of enterohemorrhagic Escherichia coli sdiA [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2019, 8:33 (https://doi.org/10.12688/f1000research.17620.1)

Open Peer Review

Reviewer Report 15 May 2019
Olivier Tenaillon, IAME (Infection Antimicrobials Modelling Evolution), UMR 1137, French Institute of Health and Medical Research (INSERM), Paris, France 
Approved with Reservations
The present manuscript presents a state-of-the-art pan genome analysis of EHEC strains and a subsequent analysis of the variation in sdiA.

The analysis of sdiA could have been completed with simple Ka/Ks analysis and compared to that ...
Tenaillon O. Reviewer Report For: Pangenome guided pharmacophore modelling of enterohemorrhagic Escherichia coli sdiA [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2019, 8:33 (https://doi.org/10.5256/f1000research.19267.r46991)
  • Author Response 16 May 2019
    DJ Darwin Bandoy, Department of Veterinary Paraclinical Sciences, University of the Philippines Los Baños, Los Baños, 4031, Philippines
    16 May 2019
    Author Response
    I thank the reviewer for the effort in doing the review. I accept all the suggestions and will add the population genetic analysis in the next version of the paper.
    Competing Interests: No competing interests were disclosed.
Reviewer Report 24 Apr 2019
Kerry K. Cooper, School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, AZ, USA 
Not Approved
I would state the biggest issue with the manuscript is the work is not technically sound, because upon examining the metadata file from Patric, numerous strains included in the EHEC pangenome were in fact not EHEC strains. Many of the isolates ... Continue reading
Cooper KK. Reviewer Report For: Pangenome guided pharmacophore modelling of enterohemorrhagic Escherichia coli sdiA [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2019, 8:33 (https://doi.org/10.5256/f1000research.19267.r47169)
  • Author Response 16 May 2019
    DJ Darwin Bandoy, Department of Veterinary Paraclinical Sciences, University of the Philippines Los Baños, Los Baños, 4031, Philippines
    16 May 2019
    Author Response
    I appreciate the work of the reviewer for going through the metadata. The inclusion of non-EHEC is necessary as outgroup comparison group and does not debunk the validity of the ... Continue reading