Keywords
Vibrio parahaemolyticus, Nanopore, Illumina, quorum-sensing, shrimp, Acute Hepatopancreatic Necrosis Disease
This article is included in the Pathogens gateway.
This article is included in the Nanopore Analysis gateway.
Vibrio parahaemolyticus, Nanopore, Illumina, quorum-sensing, shrimp, Acute Hepatopancreatic Necrosis Disease
Vibrio parahaemolyticus is a marine gram-negative Enterobacteriaceae that has been recognized as an important pathogen affecting commercially-relevant shrimp species such as the giant tiger prawn (Penaeus monodon) and the Pacific white shrimp (Litopenaeus vannamei)1–3. In recent years, V. parahaemolyticus has been linked to acute hepatopancreatic necrosis disease (AHPND)4,5. An AHPND-causing V. parahaemolyticus isolate is thought to produce a homolog of the insecticidal Photorhabdus insect-related (Pir) binary toxin that induces prolonged damage to the shrimp tissue2. AHPND preferentially affects shrimp post-larvae or juveniles with nearly 100% mortality rate within 30–35 days post-stocking6. Studies suggest that AHPND originated from China in 2009 and subsequently spread to South East Asia7. In Malaysia, AHPND was first reported in 2011 where it still persists in shrimp farms8.
The implementation of effective microbial management in aquaculture requires genomic surveillance data that can provide accurate information regarding the geographical origin, virulence factors and antibiotic resistance of a sequenced microbial strain9. In Malaysia, this remains challenging due to the paucity of available genomic resources for the relevant pathogens. To date, a majority of the microbial genomic studies in Malaysia consist of single draft genome reports mostly focusing on clinical pathogens without substantial comparative genomics10–13. However, recent years have seen a modest increase in the number of studies with more comprehensive sequencing dataset and analysis14–16. For example, Yan et al. sequenced and assembled the draft genomes of 40 Malaysian V. parahaemolyticus isolates associated with shrimp aquaculture and performed comparative genomics of more than Asian 100 V. parahaemolyticus genomes from public databases. In-silico multi-locus sequencing typing and phylogenomic analyses indicate that several Malaysian V. parahaemolyticus isolates belong to previously undescribed sequence types and genomic lineages17 and recommended further studies be undertaken.
The pirABvp genes coding for AHPND-causing binary toxins are localized on a 70-kb pVa plasmid2. Surprisingly, despite the small size of the plasmid relative to the chromosomal genome, a complete de novo assembly of the pVa plasmid using Illumina-only short reads has not been achieved to date. Recent studies suggested that this is due to the presence of repetitive elements on the pVa plasmid that are longer than the Illumina read length18. As a result, pirAB-containing contigs that were assembled from Illumina reads are generally short with limited gene content, precluding detailed analysis of the pirAB gene neighborhood and their stability in the plasmidome19. V. parahaemolyticus isolate MVP1 is one of the first pirABVp -harboring V. parahaemolyticus isolated from Malaysia to have its genome sequenced11. Isolate MVP1 was previously sequenced on an Illumina MiSeq followed by de novo genome assembly using the SPAdes assembler. While the pirABvp genes could be identified in the draft genome, the assembly was problematic as these were localized on a short contig flanked by small fragments of a transposase gene.
Recent years have seen the democratization of long-read sequencing enabled by Nanopore technology. In contrast to another popular long-read sequencing platform, PacBio, Nanopore sequencing requires minimal lab footprint and very low capital investment. Importantly, it is the first sequencing technology that allows native sequencing, thus eliminating sequencing bias associated with the activity of Taq-polymerase. Long reads have been utilized for de novo genome assembly in two distinct ways. First, long reads can be used directly in genome assembly followed by polishing with Illumina short reads to improve consensus accuracy20. An alternative approach utilizes long-reads to reorder and link contigs that were initially assembled from Illumina reads21,22. While the final assembly produced with this hybrid approach is highly accurate, the contiguity of the assembly is dependent on the quality of the initial Illumina assembly23.
Leveraging on the availability of Illumina dataset for isolate MVP1, we aimed to improve on the original genome assembly11 through the addition of 40× Nanopore read coverage. We first performed a Nanopore-led assembly using the long-read Flye assembler24 followed by polishing with Illumina reads. In addition, we generated a hybrid assembly using Unicycler assembler23. Although both methods produced a complete circular pVa plasmid sequence with intact pirABvp, complete chromosome assembly was only achieved with the Nanopore-led Flye assembly. This superior assembly provides new insight into the genomic features of V. parahaemolyticus MVP1 and will allow a greater understanding of host pathogenicity through comparative genomic analysis with selected V. parahaemolyticus isolates.
Sample collection, gDNA isolation and Illumina sequencing of V. parahaemolyticus isolate MVP1 have been previously reported11. For Nanopore sequencing, 1 μg of unfragmented gDNA was processed using the now obsolete Nanopore SQK-NSK007 library preparation kit. Sequencing was subsequently performed on an R9 flowcell attached to a MinION device for 48 hours. Base-calling of the produced raw fast5 files used Guppy v3.1 (high accuracy mode; requires registration to Oxford Nanopore Technology community site).
Illumina-only assembly and hybrid assembly incorporating high-quality (q>7) Nanopore long reads were performed using Unicycler v.0.4.723. Only contigs equal or larger than 1,000 bp were retained for subsequent analyses. A long-read only de novo assembly was also performed with Flye v2.4.2 utilizing similar Nanopore sequencing data24. Illumina polishing of the Flye assembly used the unicycler polish tool in Unicycler v.0.4.7 that performed multiple rounds of bwa-mem-based Illumina read alignment to the assembly followed by polishing with Pilon v1.2225. Genome completeness was assessed with BUSCO v3 (Gammaproteobacteria odb9) based on the whole proteome from each genome assembly that was predicted using Prodigal v 2.6.3 (default setting)26,27.
Automated annotation of the whole-genome was carried out using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP)28. The translated coding sequences generated from PGAP were subject to additional functional annotation using InterProScan. Intra-chromosomal structural variations and sequence similarity were visualized with BLAST ring image generator (BRIG) using the blastN (e-value = 0.001 cutoff). Sequencing depth was also calculated in BRIG based on the BAM alignment file generated from the bwa-mem mapping of Illumina paired-end reads to the whole-genome29. Structural modeling of the putative insecticidal toxin proteins identified in MVP1 used the online Phyre2 webserver30.
Vibrio proteins containing the InterPro signature IPR035304, corresponding to the AinS-family N-acyl-homoserine-lactone autoinducer synthase, were downloaded from the UniProt database31 on the 7th of November 2019 and clustered at 98% amino acid identity cutoff using cd-hit v4.632. The protein clusters were subsequently aligned with MAFFT v7.3133 (”--maxiterate 1000 –localpair” setting) followed by maximum likelihood tree construction using FastTree v2.134. Calculation and visualization of pairwise protein sequence divergence used SDT v1.235. In addition, to confirm the absence of the LuxI-type autoinducer synthase homolog in the Vibrio parahaemolyticus proteomes, additional domain search specifically for PF00765 (LuxI autoinducer synthase domain) was performed followed by subsequent InterProScan validation as previously described36.
A total of 231 megabases of Nanopore data contained in 30,315 reads (N50 = 10,908 bp) were produced from a single Nanopore MINION run on the discontinued R9 flowcell. The base-called Nanopore fastq file has been deposited in the SRA database under the accession code SRX6759854. An initial Unicycler assembly using previously generated Illumina reads (SRX6759853) yielded a relatively fragmented genome (198 contigs; N50 length = 54,109 bp). Supplementing the assembler with approximately 40 x coverage of Nanopore long reads led a substantial improvement in the Unicycler assembly contiguity (9 contigs; N50 length = 1,738,848 bp). Of the 9 assembled contigs, one of them was flagged as “complete” with a contig length of 70kb. On the other hand, the Flye assembly using only Nanopore reads produced three contigs all flagged as “complete and circular”. Despite generating the most contiguous assembly, the initial Flye assembly has the worst BUSCO score with 37.4% BUSCO single-copy and 45.1% fragmented BUSCO genes (Table 1). Polishing of the Flye assembly with Illumina paired-end reads restored the genome completeness to a level that is comparable to the hybrid and Illumina-only assemblies (Table 1 and Extended data: Supplemental File 1). The high percentage of fragmented genes in long-read-only assembly is usually a result of erroneous frameshifts from indel sequencing errors in the long reads20. Although higher coverage of long read may lead to an increase the consensus accuracy; its accuracy will unlikely match that of an Illumina-polished assembly due to Nanopore-specific systematic errors. As a result, polishing of long-read assembly using Illumina short reads can be considered as a cost-effective but less convenient approach for producing highly accurate and contiguous microbial genome assembly.
The alignment of Illumina reads to the final chromosomal assembly revealed substantial lower sequencing depths in genomic regions with low GC-content (<30%) (Figure 1). Some of these regions are unique to MVP1 and may risk being overlooked in the Illumina-only assembly at lower sequencing coverage. For example, less than 5x sequencing depth was observed in the region 3,240 to 3,280 kb that contain various putative carbohydrate metabolism genes that are unique to isolate MVP1 in this current genomic comparison. The incorporation of PCR during library preparation is known to reduce the representation of high-AT genomic regions due to the polymerase amplification bias towards GC-balanced fragments37. Reduced Illumina sequencing depth in high-AT genomic regions affecting biological interpretation has also been reported in several crustacean mitogenomic libraries as both the phylogenetically informative control region and 16S rRNA genes are relatively AT-rich38,39. Depending on the amount of DNA available for sequencing, a few modifications to the Illumina library preparation step can be considered, such as PCR-free preparation for sample with high starting DNA (> 100 ng) and reduced PCR cycle coupled with higher (>100× genome coverage) sequencing depth for samples with low starting DNA.
Left, V. parahaemolyticus MVP1 chromosome 1 compared against five other V. parahaemolyticus chromosome 1. Right, V. parahaemolyticus MVP1 chromosome 2 compared against five other V. parahaemolyticus chromosome 2. The innermost rings show GC content while the second innermost rings show Illumina sequencing depth along the genome (blue). The remaining rings indicate BLAST comparisons of other V. parahaemolyticus genomes against MVP genome assembly. Arrows indicate genes of interest and were labeled with the predicted protein names.
In the Illumina-only assembly, the pVa plasmid was assembled into three contigs (Contig24: 63.4 kb, Contig164: 3.4kb; Contig198: 1 kb). Contig198 contains a 921-bp gene that codes for an intact IS5 family transposase and could be aligned equally well to two genomic regions flanking the pirABVp genes in the complete plasmid assembly (Figure 2). This observation can be explained by the inability of Illumina reads to fully span the repeats thus causing them to be collapsed into a single contig. As a result, the localization of pirABVp on the pVa plasmid cannot be confirmed from an Illumina-only assembly. In contrast the entire pirABVp genes and their flanking transposase genes were fully covered by a number of Nanopore long reads enabling the complete assembly of the pVa plasmid in addition to supporting the localization of pirABVp genes on the pVa plasmid. Nanopore read depth compared to Illumina, is relatively even across the aligned genomic region, which is consistent with the absence of PCR bias in the Nanopore library preparation.
Direction of arrow in the annotation indicates transcription orientation. Blue and red arrows in the Nanopore read alignment indicate forward and reverse strands, respectively.
The minor structural variations previously reported in pirABVp -containing region suggest that these genes are not stably maintained and are prone to transposition mediated by the flanking IS5 family transposase genes40. Interestingly, the localization of this IS5 family transposase gene is not specific to just the pVa plasmid. Local nucleotide similarity search of this transposase gene against the complete MVP genome revealed three additional perfect hits (100% query coverage and 100% nucleotide identity) in each of the chromosomal genome (Figure 1). The localization of these additional IS5 family transposase in both chromosome 1 and chromosome 2 further raises the possibility of the IS5 transposase-flanked pirABVp genes being integrated into the chromosomal region41,42.
Chromosome 2 of Isolate MPV1 consists of a 160 kb genomic region that is mostly absent in five out of six pVa plasmid-harboring V. parahaemolyticus isolates included in this comparative genomic analysis. Functional annotation of the genes located in this region revealed two gene clusters that may further contribute to host pathogenicity in addition to the binary Pir-like toxins on the pVa plasmid. The first gene cluster located from 1,822,499 to 1,836,895 bp in chromosome 2 consists of three relatively large genes (Gene Locus Tag: BSR23_26145, BSR23_26150 and BSR23_26155) coding for another type of putative insecticidal toxins. Functional annotation using InterProScan revealed the presence of protein domains commonly associated with the toxin-complex (Tc) toxins (Figure 3). In addition, Phyre2 protein modeling also indicated their high structural homology to their respective homologous toxin components (Extended data: Supplemental Files 2–4). In Xenorhabdus nematophilus, the three toxin components formed a native toxin complex and the ingestion of this complex led to growth inhibition of two different insect larvae, Helicoverpa zea and Heliothis virescens43. In addition, further tests showed that the native toxin complex is capable of binding to solubilized gut membranes of H. zea larvae in addition to inducing pore formation in black lipid membranes43. Given the functional resemblance of such toxins to the Pir-like binary toxins, the presence of all three genes coding for the complete set of insecticidal Tc toxins in V. parahaemolyticus may contribute to mortality without AHPND lesions in shrimps, as previously reported44.
Interestingly, isolate MVP1 also harbor a fuc operon that is not commonly present in members of this species (Extended data: Supplemental Table 1). The presence of this operon in MVP1 translates into the genomic potential for the internalization of fucose for incorporation into the capsular polysaccharides or bacterial glycoproteins. While it is also possible that the fuc operon may contribute to improved utilization of fucose as an energy source, Williams et al. did not observe a greater capacity to utilize fucose in an EMS-causing V. parahaemolyticus isolate 13-028/A345.
NCBI annotation of MVP1 proteome revealed the presence of only the LuxM/OpaM-type AHL synthase. An in-house HMMsearch for the LuxI autoinducer synthase domain (PFAM signature: PF00765) in the MVP1 proteome also did not reveal significant hits, indicating that MVP1 employs a single system for the biosynthesis of AHL signal (Extended Data: Supplemental File 5). Additional UniProt query search of “PF00765 AND Vibrio parahaemolyticus” on 14th November 2019, showed only one positive hit across all currently available V. parahaemolyticus proteomes. However, further investigation of the positive hit, AAY51_09590, revealed that this belonged to a Citrobacter spp. previously misclassified as V. parahaemolyticus46. In contrast, V. harveyi was shown to utilize at least two quorum-sensing systems e.g. LuxI that synthesizes 3-oxo-C6-HSL and the AinS that synthesizes C8-HSL47. Despite being classified as members of the same family for sharing the similar protein domain signature, AinS exhibited a substantially lower pairwise amino acid divergence (~ 30%) in comparison to LuxM and OpaM. By rooting the LuxM/OpaM phylogenetic tree with the AinS from Aliivibrio fischeri as the outgroup, we observed a strongly supported monophyletic cluster consisting of the functionally validated LuxM and OpaM in addition to their homologs from other Vibrio spp (Figure 4A). The pairwise amino acid identity among members in this LuxM/OpaM clade averages around 50% (Figure 4B), consistent with the relatively short branch length among members of the clade.
(A) Maximum likelihood tree depicting the evolutionary relationships of AinS-like family protein clusters among Vibrio spp. Number after the “=” sign in each tip label indicates the number of proteins represented by the cluster. Node were colored according to the SH-like local support values and branch lengths indicate number of substitutions per site. (B) Pairwise identity matrix of the identified AinS-like family proteins in Vibrio spp.
In Indian white shrimps (Fenneropenaeus indicus), co-injection of a pathogenic V. parahaemolyticus strain DHAP1 and purified recombinant AHL lactonase, AiiA, reduced Vibrio viable counts and biofilm development in the shrimp intestine, suggesting the role of AHL-mediated quorum-sensing system in host colonization48. While the presence of intact opaM in several sequenced V. parahaemolyticus isolates supports their ability to accumulate AHL and engage in quorum-sensing activity, this is best validated using a simple biosensor or LC/MS approach that can not only verify the AHL producing phenotype but also provide insight into the structural diversity of AHL signals49–51. Furthermore, elucidating the role of opaM in mediating host colonization and pathogenesis among AHPND-causing V. parahaemolyticus via transposon mutagenesis will be instructive.
Using approximately 40× genome coverage of Nanopore long reads, we produced the first chromosome-scale genome assembly of a Malaysian pirABVp-bearing V. parahaemolyticus isolate. Although the genome completeness of the initial Flye assembly was relatively poor, polishing of the genome assembly with Illumina reads improved its completeness to a level that is comparable to a high-quality microbial genome assembly. Structural variations identified from genomic comparisons provide new insights into the genomic features of V. parahaemolyticus MVP1 that may be associated with host colonization and pathogenicity.
Vibrio parahaemolyticus strain MVP1 chromosome 1, complete sequence, Accession number CP043421: https://www.ncbi.nlm.nih.gov/nuccore/CP043421
Vibrio parahaemolyticus strain MVP1 chromosome 2, complete sequence, Accession number CP043422: https://www.ncbi.nlm.nih.gov/nuccore/CP043422
Vibrio parahaemolyticus strain MVP1 plasmid pVa, complete sequence, Accession number CP043423: https://www.ncbi.nlm.nih.gov/nuccore/CP043423
Raw Illumina reads and basecalled Nanopore Reads have also been deposited in NCBI Sequence Read Archive under the BioProject PRJNA355061.
Zenodo: Nanopore long reads enable the first complete genome assembly of a Malaysian Vibrio parahaemolyticus isolate bearing the pVa plasmid associated with acute hepatopancreatic necrosis disease, http://doi.org/10.5281/zenodo.356848552.
This project contains the following extended data:
Supplemental File 1: Main genome assemblies (Unpolished Flye assembly, Polished Flye assembly, Unicycler Hybrid Assembly and Unicycler Illumina-only assembly) generated in this study for comparison and their BUSCO output.
Supplemental File 2: Phyre2 protein modeling output of the putative MVP1 TcdA toxin
Supplemental File 3: Phyre2 protein modeling output of the putative MVP1 TcdB toxin
Supplemental File 4: Phyre2 protein modeling output of the putative MVP1 TccC toxin
Supplemental File 5: InterProScan output of the NCBI-predicted MVP1 proteome.
Supplemental Table 1: NCBI BlastN output using the fuc genes of Vibrio parahaemolyticus MVP1 as the query to search against the Vibrio reference WGS database as of 21 Oct 2019.
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).
Views | Downloads | |
---|---|---|
F1000Research | - | - |
PubMed Central
Data from PMC are received and updated monthly.
|
- | - |
Is the work clearly and accurately presented and does it cite the current literature?
Yes
Is the study design appropriate and is the work technically sound?
No
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
Not applicable
Are all the source data underlying the results available to ensure full reproducibility?
No source data required
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Is the work clearly and accurately presented and does it cite the current literature?
Partly
Is the study design appropriate and is the work technically sound?
Partly
Are sufficient details of methods and analysis provided to allow replication by others?
Yes
If applicable, is the statistical analysis and its interpretation appropriate?
I cannot comment. A qualified statistician is required.
Are all the source data underlying the results available to ensure full reproducibility?
Yes
Are the conclusions drawn adequately supported by the results?
Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: I am a microbiologist working on vibrio pathogenesis.
Alongside their report, reviewers assign a status to the article:
Invited Reviewers | ||
---|---|---|
1 | 2 | |
Version 1 16 Dec 19 |
read | read |
Provide sufficient details of any financial or non-financial competing interests to enable users to assess whether your comments might lead a reasonable person to question your impartiality. Consider the following examples, but note that this is not an exhaustive list:
Sign up for content alerts and receive a weekly or monthly email with all newly published articles
Already registered? Sign in
The email address should be the one you originally registered with F1000.
You registered with F1000 via Google, so we cannot reset your password.
To sign in, please click here.
If you still need help with your Google account password, please click here.
You registered with F1000 via Facebook, so we cannot reset your password.
To sign in, please click here.
If you still need help with your Facebook account password, please click here.
If your email address is registered with us, we will email you instructions to reset your password.
If you think you should have received this email but it has not arrived, please check your spam filters and/or contact for further assistance.
Comments on this article Comments (0)