Software Tool Article

NastyBugs: A simple method for extracting antimicrobial resistance information from metagenomes

[version 1; peer review: 2 approved with reservations]
PUBLISHED 08 Nov 2017

This article is included in the Pathogens gateway.

This article is included in the Antimicrobial Resistance collection.

This article is included in the Hackathons collection.

Abstract

Multidrug-resistant bacteria are becoming a major threat to global public health. While there are many possible causes for this, few adequate solutions have been found so far. One major cause is the lack of clinical tools for selecting an antibiotic efficiently and reliably. NastyBugs is a new program that can identify the type of antimicrobial resistance most likely present in a metagenomic sample, which will allow both smarter drug selection by clinicians and faster research in an academic environment.

Keywords

Metagenome, microbiome, antibiotic, resistance, bacterial, genomic signature, MDR, workflow

Introduction

Antimicrobial resistance (AMR) of bacterial pathogens is a growing public health threat around the world. Most concerning are multidrug-resistant (MDR) bacteria, which have become more prevalent in recent decades1. Well-known examples of these pathogens include methicillin-resistant Staphylococcus aureus (MRSA), vancomycin-resistant S. aureus, extended-spectrum beta-lactamase (ESBL)-producing bacteria, and vancomycin-resistant Enterococcus. MRSA is prevalent in surgical and maternity hospitals and in nursing homes, where it is a common cause of hospital-acquired infections with high morbidity and mortality2. The current method for determining whether a patient has an MDR infection relies on culturing a patient-derived sample in the presence of different antibiotics3. This slow process can take days to weeks, which can delay treatment with the correct antibiotic and put the patient in danger.

There are several known mechanisms by which bacteria acquire AMR. A common one is horizontal gene transfer (HGT), which includes plasmid-, phage-, transposon-, and integron-mediated resistance4. Another major mechanism is single-nucleotide polymorphisms (SNPs) in chromosomal genes, which can alter antibiotic binding sites5.

High-throughput whole genome sequencing (WGS) of microbiomes is a state-of-the-art method for studying complex microbial communities, such as the human gut. WGS produces large raw datasets, which must be processed quickly and efficiently if they are to guide clinicians toward the most effective treatment strategy for a given patient. However, there is still a lack of simple, clinically applicable bioinformatics methods that provide a fast, reusable, reproducible, and scalable pipeline for locating AMR genomic signatures in large metagenomic datasets from the NCBI Sequence Read Archive (SRA) and other public sources. Such pipelines could also support crowdsourced analysis, for example by undergraduate students. Determining effective strategies for antibiotic use is the keystone of modern antibacterial therapy and prevention6.

In the last few years, a variety of papers and tools have been developed for AMR detection in both complete genomes and metagenomes. Existing detection methods for AMR genomic signatures include ResFinder7, PointFinder8, SSTAR9, DeepARG10, ARIBA11 and ResCap12. Another approach is the Galaxy-based pipeline Amr++. All detection methods depend on collections of known AMR genomic signatures, which are either searched for directly or used to build models for detecting novel AMR genes/loci. One of the most up-to-date manually curated databases is the Comprehensive Antibiotic Resistance Database (CARD)13; others include ResFinder7, ARG-ANNOT14, and MegaRES15. Although some of these tools provide user-friendly web interfaces and accept both FASTA and FASTQ input, they do not take advantage of the command line. Moreover, these solutions are not universal: e.g., ResFinder searches only for HGT-mediated resistance, whereas its successor PointFinder looks only for AMR caused by chromosomal point mutations. Other drawbacks of existing solutions include an inability to handle big datasets or multiple raw sequence files, slow speed, and poor handling of metagenomic data.

The primary objective of this project is to design a reliable system for rapid diagnosis and prompt treatment of patients with MDR infections. At the heart of the system is a reusable, reproducible, scalable, and interoperable workflow for locating AMR genomic signatures in SRA shotgun sequencing datasets, including metagenomes. To simplify this task, we used only RefSeq reference genomes of bacterial pathogens important for public health, but the pipeline can be extended to include databases for other microbes, viruses, and fungi. The result, NastyBugs, is a new program that can identify the type of antimicrobial resistance most likely present in a metagenomic sample, which will allow both smarter drug selection by clinicians and faster research in an academic environment. NastyBugs is a framework created during the National Center for Biotechnology Information (NCBI) Hackathon in August 2017.

Methods

The detailed workflow for extracting antimicrobial resistance gene signatures is shown in Figure 1.


Figure 1. An overview of the NastyBugs pipeline.

BLAST databases for downstream analysis were built from the latest versions of three sources: 1) the RefSeq human genome assembly GRCh37/UCSC hg19 (RefSeq accession no. GCF_000001405.37); 2) RefSeq bacterial genomes; and 3) CARD13. For comparison purposes, CARD was split into two databases, one of genes and one of SNPs.
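As a rough illustration, the databases above could be built with `makeblastdb` from the BLAST+ suite, one nucleotide database per FASTA file. This is a sketch under assumptions: the FASTA file names and database names are hypothetical placeholders, and the NastyBugs repository may prepare its databases differently.

```python
import shlex
import subprocess
from typing import Dict, List

# Hypothetical FASTA inputs, one per database (names are placeholders)
SOURCES: Dict[str, str] = {
    "human_grch37": "human_grch37.fna",
    "refseq_bacteria": "refseq_bacteria.fna",
    "card_genes": "card_genes.fasta",
    "card_snps": "card_snps.fasta",
}


def makeblastdb_cmd(fasta_path: str, db_name: str) -> List[str]:
    """Return the makeblastdb command for a nucleotide database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]


def build_all(dry_run: bool = True) -> None:
    for db_name, fasta in SOURCES.items():
        cmd = makeblastdb_cmd(fasta, db_name)
        if dry_run:
            print(shlex.join(cmd))  # show the command without running it
        else:
            subprocess.run(cmd, check=True)  # requires BLAST+ on PATH


if __name__ == "__main__":
    build_all(dry_run=True)
```

Keeping the CARD genes and SNPs as separate databases lets the two mapping rates be reported independently, as in the use case below.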

The analysis then consists of three steps: 1) Host (human) reads removal; 2) Antimicrobial resistance signature identification; and 3) Bacterial identification and characterization. Steps 2 and 3 were performed in parallel. The input is the SRA run accession (ERR or SRR) of the metagenome of interest; alternatively, FASTQ files from local storage can be used.
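A pipeline accepting both kinds of input needs to tell them apart. The sketch below assumes inputs are either SRR/ERR run accessions or file names with common FASTQ extensions; the function name and the suffix list are hypothetical, not taken from the NastyBugs code.

```python
import re
from typing import Tuple

# Run accessions named in the text: SRR (NCBI) or ERR (EBI/ENA) followed by digits
RUN_ACCESSION = re.compile(r"^(SRR|ERR)\d+$")
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def classify_input(value: str) -> Tuple[str, str]:
    """Classify one pipeline input as ("sra", accession) or ("fastq", path)."""
    value = value.strip()
    if RUN_ACCESSION.match(value):
        return ("sra", value)
    if value.lower().endswith(FASTQ_SUFFIXES):
        return ("fastq", value)
    raise ValueError(f"not an SRR/ERR accession or FASTQ file: {value!r}")


if __name__ == "__main__":
    for item in ["ERR1600439", "SRR5239736", "reads/sample_R1.fastq.gz"]:
        print(item, "->", classify_input(item))
```

An accession can then be passed straight to Magic-BLAST, which can read SRA runs directly, while a local path is passed as a FASTQ query.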

Host reads removal

Using STAR16 or Magic-BLAST, all reads that mapped to the human genome GRCh37/hg19 were removed, and the unmapped non-human reads (considered bacterial) were collected with SAMtools for further analysis.
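The SAMtools step amounts to keeping records whose SAM FLAG has the "unmapped" bit (0x4) set, roughly what `samtools view -f 4` followed by `samtools fastq` does. The pure-Python sketch below illustrates the logic only; the function names are hypothetical, paired-end mates are treated as independent reads, and the placeholder quality for missing qualities is an assumption.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED = 0x4
REVERSE = 0x10
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for primary records with FLAG bit 0x4 set."""
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & SECONDARY_OR_SUPPLEMENTARY or not flag & UNMAPPED:
            continue
        if flag & REVERSE:  # restore the original read orientation
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        if qual == "*":  # assumption: fill missing qualities with a placeholder
            qual = "I" * len(seq)
        yield name, seq, qual


def to_fastq(records: Iterable[Tuple[str, str, str]]) -> str:
    """Format (name, sequence, quality) tuples as FASTQ text."""
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)
```

In practice the SAM text from the human-genome mapping would be streamed through `unmapped_reads` and the result written with `to_fastq`; SAMtools does the same job much faster on large runs.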

Antimicrobial resistance signature identification

Adapters, linkers, and barcodes were removed with FASTX Clipper and FASTX Trimmer. The non-human reads were then mapped again with Magic-BLAST, this time against the bacterial genes and SNPs in CARD, to identify genes and SNPs that can confer antimicrobial resistance in the bacterial population. The mapped reads were sorted and summed to obtain read abundance.
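The text does not specify exactly how read abundance was summed. One plausible reading, sketched here under that assumption, is to count primary alignments per CARD reference (SAM column 3, RNAME) and sort references by count; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

UNMAPPED = 0x4
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800


def reads_per_reference(sam_lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count primary mapped reads per reference, most abundant first."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue
        flag, ref = int(fields[1]), fields[2]
        # ignore unmapped, secondary and supplementary records
        if flag & (UNMAPPED | SECONDARY_OR_SUPPLEMENTARY) or ref == "*":
            continue
        counts[ref] += 1
    # sort by descending count, then by reference name for a stable order
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Counting only primary records means a read that aligns to several CARD entries contributes once, to its best-scoring reference.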

Bacterial identification and characterization

Bacterial species and their abundances were identified in parallel, again with Magic-BLAST, but this time against the database of NCBI RefSeq reference bacterial genomes. The resulting list of species was visualized with Krona17.
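Krona's `ktImportText` reads tab-separated lines of the form count, then one column per taxonomic level. The sketch below assumes species counts have already been aggregated as (genus, species) pairs from the RefSeq hits; that mapping step and the function name are hypothetical.

```python
from typing import Dict, Tuple


def krona_text(species_counts: Dict[Tuple[str, str], int]) -> str:
    """Format (genus, species) read counts as ktImportText input.

    Each line is: count<TAB>genus<TAB>species
    """
    lines = [
        f"{count}\t{genus}\t{species}"
        for (genus, species), count in sorted(species_counts.items())
        if count > 0  # drop species with no assigned reads
    ]
    return "\n".join(lines) + "\n"
```

Writing this string to a file and running `ktImportText species.txt -o species.krona.html` would then produce the interactive chart.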

Output format

The output is a tab-delimited file with five columns: 1) RefSeq accession number; 2) genus; 3) resistance gene; 4) ARO (Antibiotic Resistance Ontology) accession number; and 5) score (number of mapped reads per 1 kb). These data can be used to construct a scatter plot showing the relative abundance of antimicrobial resistance and the corresponding bacterial species in the analysed metagenomic sample.
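The score normalizes read counts by reference length: score = mapped reads / (reference length in bp / 1000). A sketch of writing the five columns follows, assuming the length used is that of the CARD reference sequence and that the file has no header row; the class, field, and function names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ResistanceHit:
    refseq_accession: str
    genus: str
    resistance_gene: str
    aro_accession: str
    mapped_reads: int
    gene_length_bp: int  # assumed: length of the CARD reference sequence


def score_per_kb(mapped_reads: int, gene_length_bp: int) -> float:
    """Mapped reads per 1 kb of reference length."""
    if gene_length_bp <= 0:
        raise ValueError("gene length must be positive")
    return mapped_reads / (gene_length_bp / 1000)


def format_report(hits: Iterable[ResistanceHit]) -> str:
    """Render hits as five tab-separated columns (no header row)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for h in hits:
        writer.writerow([
            h.refseq_accession,
            h.genus,
            h.resistance_gene,
            h.aro_accession,
            f"{score_per_kb(h.mapped_reads, h.gene_length_bp):.2f}",
        ])
    return buf.getvalue()
```

For example, 50 reads on a 2,000 bp gene give a score of 25.00; normalizing by length keeps long genes from dominating the scatter plot simply because they attract more reads.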

Dependencies

The documented workflow comprises a script and the tools it calls, containerized with Docker.

We used the following dependencies: 1) Magic-BLAST v. 1.3, a new tool for mapping large next-generation RNA or DNA sequencing runs against a whole genome; 2) SAMtools v. 1.3.1, a popular suite of programs for interacting with high-throughput sequencing (HTS) data; 3) FASTX-Toolkit v. 0.0.13, a collection of command-line tools for preprocessing short-read FASTA/FASTQ files; 4) STAR v. 2.5.3a, an RNA-seq aligner; and 5) Krona v. 2.7, a tool for visualizing metagenomic composition as interactive pie charts.
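Since the workflow ships its tools in Docker containers, a dependency check matters mainly for a local, non-containerized run. The sketch below looks up each tool's usual executable name on PATH; the name mapping is an assumption about a typical installation, and the helper is hypothetical.

```python
import shutil
from typing import Dict, Optional

# Assumed executable names for the dependencies listed above
REQUIRED_TOOLS: Dict[str, str] = {
    "Magic-BLAST": "magicblast",
    "SAMtools": "samtools",
    "FASTX Clipper": "fastx_clipper",
    "FASTX Trimmer": "fastx_trimmer",
    "STAR": "STAR",
    "Krona": "ktImportText",
}


def missing_tools(path: Optional[str] = None) -> Dict[str, str]:
    """Return {dependency: executable} for tools not found on PATH (or on `path`)."""
    return {
        name: exe
        for name, exe in REQUIRED_TOOLS.items()
        if shutil.which(exe, path=path) is None
    }


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing: " + ", ".join(f"{n} ({e})" for n, e in missing.items()))
    else:
        print("all dependencies found")
```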

Use case

To validate our pipeline, we used two human gut metagenomic datasets, SRA acc. no. ERR1600439 and SRR5239736. The SRR5239736 sample was used to compare our results with those obtained by ResCap12. For metagenome sample SRR5239736, 24% of reads mapped to the CARD gene database and 1.6% of reads mapped to the CARD SNP database.
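Mapping percentages such as the 24% and 1.6% above are fractions of reads with an alignment. A sketch of computing such a fraction from SAM output follows, assuming unaligned reads are kept in the SAM file (Magic-BLAST's `-no_unaligned` option would drop them) and counting each primary record, including each mate of a pair, as one read. The function name is hypothetical.

```python
from typing import Iterable


def mapped_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of primary SAM records that are mapped (FLAG bit 0x4 unset)."""
    total = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        if flag & (0x100 | 0x800):  # skip secondary/supplementary records
            continue
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0
```

Multiplying the result by 100 gives the percentage reported per database.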

Magic-BLAST, a new program from the BLAST family, can provide a faster and more accurate way to align reads to reference sequences. In a quick comparison, Magic-BLAST mapped SRA reads to the human genome at least 10 times faster than STAR. For that reason, we chose not to use STAR in the pipeline.

Conclusion and next steps

The results showed that antimicrobial resistance signatures were identified efficiently in the studied samples. However, the workflow can still be improved. Planned improvements include: 1) optimization of the pipeline; 2) additional large-scale validation on different metagenomic samples; 3) more informative presentation of results; 4) adding information about proteins involved in AMR; and 5) prediction of novel resistance genes using hidden Markov models.

Moreover, implementation of machine learning analysis may provide additional capabilities.

The pipeline can efficiently identify antimicrobial resistance genes, which can in turn be used as features for downstream machine learning analysis. One useful application of machine learning in antimicrobial resistance is predicting the appropriate antimicrobial therapy for a critically ill patient. For these patients, the time taken to administer an appropriate antibiotic is inversely correlated with patient outcomes18. Whole genome sequencing of microbial isolates, followed by identification of antimicrobial resistance genes and machine learning prediction, offers an attractive solution to this problem. A previous study in this area applied a simple rules-based approach and a logistic regression model19. More sophisticated, non-linear supervised machine learning methods, such as random forests, gradient boosting, and artificial neural networks, may play a key role in producing accurate predictions for clinical use. Artificial neural networks such as convolutional and recurrent neural networks are particularly interesting because they may be able to extract novel features.
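Turning pipeline output into model features could start with a binary sample-by-gene matrix. The sketch below assumes each sample's detected resistance genes are available as a set; the function name is hypothetical, and per-kb scores could replace the 0/1 values.

```python
from typing import Dict, List, Set, Tuple


def presence_matrix(
    samples: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a binary sample-by-gene matrix from per-sample sets of detected AMR genes."""
    sample_ids = sorted(samples)
    genes = sorted(set().union(*samples.values()))  # union of all detected genes
    matrix = [[1 if g in samples[s] else 0 for g in genes] for s in sample_ids]
    return sample_ids, genes, matrix
```

For example, samples `{"s1": {"tetQ", "ermB"}, "s2": {"tetQ"}}` (gene names illustrative) give columns `["ermB", "tetQ"]` and rows `[[1, 1], [0, 1]]`, a layout that random forest or gradient boosting libraries accept directly.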

Data and software availability

The code for the pipeline is publicly available on GitHub: https://github.com/NCBI-Hackathons/MetagenomicAntibioticResistance.

Archived source code as at time of publication20: http://doi.org/10.5281/zenodo.1020266

License: MIT

How to cite this article
Tsang H, Moss M, Fedewa G et al. NastyBugs: A simple method for extracting antimicrobial resistance information from metagenomes [version 1; peer review: 2 approved with reservations]. F1000Research 2017, 6:1971 (https://doi.org/10.12688/f1000research.12781.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses:
Approved: the paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: a number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 08 Dec 2017
Tom J. B. de Man, Centers for Disease Control and Prevention,  Atlanta, GA, USA 
Approved with Reservations
The authors describe a new tool that is able to detect antimicrobial resistance (AMR) genes and identify bacterial species composition directly from metagenomic datasets. This pipeline removes the need to culture bacterial specimens and perform PCR for AMR gene detection ...
How to cite this report
de Man TJB. Reviewer Report For: NastyBugs: A simple method for extracting antimicrobial resistance information from metagenomes [version 1; peer review: 2 approved with reservations]. F1000Research 2017, 6:1971 (https://doi.org/10.5256/f1000research.13848.r27751)
Reviewer Report 20 Nov 2017
Torsten Seemann, University of Melbourne, Melbourne, Vic, Australia 
Approved with Reservations
"A Software Tool Article should include the rationale for the development of the tool and details of the code used for its construction. The article should provide examples of suitable input data sets and include an example of the output ..."
How to cite this report
Seemann T. Reviewer Report For: NastyBugs: A simple method for extracting antimicrobial resistance information from metagenomes [version 1; peer review: 2 approved with reservations]. F1000Research 2017, 6:1971 (https://doi.org/10.5256/f1000research.13848.r27750)