Research Article
Revised

Re-analysis of metagenomic sequences from acute flaccid myelitis patients reveals alternatives to enterovirus D68 infection

[version 2; peer review: 2 approved]
PUBLISHED 13 Jul 2015

This article is included in the Preclinical Reproducibility and Robustness gateway.

Abstract

Metagenomic sequence data can be used to detect the presence of infectious viruses and bacteria, but normal microbial flora make this process challenging. We re-analyzed metagenomic RNA sequence data collected during a recent outbreak of acute flaccid myelitis (AFM), caused in some cases by infection with enterovirus D68. We found that among the patients whose symptoms were previously attributed to enterovirus D68, one patient had clear evidence of infection with Haemophilus influenzae, and a second patient had a severe Staphylococcus aureus infection caused by a methicillin-resistant strain. Neither of these bacteria was identified in the original study. These observations may have relevance in cases that present with flaccid paralysis because bacterial infections, co-infections or post-infection immune responses may trigger pathogenic processes that may present as poliomyelitis-like syndromes and may mimic AFM. A separate finding was that large numbers of human sequences were present in each of the publicly released samples, although the original study reported that human sequences had been removed before deposition.

Keywords

microbiome, metagenomics, neurological infections, computational biology, next-generation sequencing, sequence alignment

Revised Amendments from Version 1

We have revised our manuscript to make some small text amendments, in response to Charles Chiu's comments.


Background

Metagenomic shotgun sequencing, in which DNA or RNA is extracted from a tissue sample and then sequenced, has the potential to detect a wide range of infections. Deep whole-genome shotgun (WGS) sequencing can detect bacteria, viruses, and eukaryotic pathogens with equal effectiveness, as long as the infectious agent is similar to a species that has been previously sequenced. Sequencing databases already contain thousands of known species, and as this number grows, the sensitivity of WGS will grow as well.

In 2014, a large outbreak of infection with enterovirus D68 was associated with both severe respiratory illness and acute paralysis, which the U.S. Centers for Disease Control and Prevention (CDC) named acute flaccid myelitis (AFM)1. Samples collected from 48 patients were sequenced and shown to form a novel strain, Clade B1, based on phylogenetic analysis of 180 complete enterovirus D68 sequences2. The same study conducted metagenomic sequencing of cerebrospinal fluid (CSF) and/or nasopharyngeal (NP) swabs from 22 of these patients and found enterovirus D68 in some NP samples that were positive based on PCR testing.

The identification of species from a WGS sample is a challenging problem that has spurred the development of multiple new computational methods3–5. Because of the large size of next-generation sequencing data sets, these methods need to be very fast, but in the context of clinical diagnosis, they also need to be accurate. We downloaded the 31 next-generation sequencing (NGS) samples from the Greninger et al.2 study (NCBI accession SRP055445) and re-analyzed them using a computational pipeline based on the recently developed Kraken metagenomic analysis software4, a very fast and sensitive system that can be customized to use a database containing any species whose sequences are available.

Alternative infectious diagnoses in two subjects

Among the 22 subjects for whom NGS data were available, we found at least two that had far greater numbers of sequences (reads) from a bacterial pathogen than from enterovirus D68. Neither subject had been reported by Greninger et al.2 as having a bacterial infection.

In one subject, US/CA/09-871, reported by Greninger et al.2 as positive for enterovirus D68 through PCR and metagenomic NGS, we found in the NP swab sample an overwhelming presence of bacterial sequences from Haemophilus influenzae, a known cause of meningitis and neurological complications that was a common infection prior to the development of an effective vaccine.

Specifically, we identified 2,389,621 reads from H. influenzae in this subject, with the closest similarity to strain R2846. These reads comprise 93% of all microbial reads identified at the species level in the sample. Greninger et al.2 reported 2,742 reads matching enterovirus D68 (in their Supplementary Table 4) but did not report finding any H. influenzae reads from this sample. Our analysis found 1,330 reads matching enterovirus D68.
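The species-level percentage quoted above depends on which reads are left in the denominator; the Methods state that human and vector reads were excluded before percentages were computed. A minimal sketch of that bookkeeping follows. The function name, the label strings, and the toy counts are illustrative assumptions and are not taken from the study's pipeline.

```python
from collections import Counter

def species_fractions(read_labels, excluded=("Homo sapiens", "vector")):
    """Return each species' share of reads after dropping excluded labels.

    read_labels: iterable of species-level labels, one per classified read.
    Reads without a species-level label are assumed to be left out by the caller.
    """
    counts = Counter(label for label in read_labels if label not in excluded)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: n / total for species, n in counts.items()}

# Toy input with made-up counts (not the study's data).
labels = (["Homo sapiens"] * 50 + ["vector"] * 5
          + ["Haemophilus influenzae"] * 9 + ["Enterovirus D68"] * 1)
fractions = species_fractions(labels)
print(fractions)
```

In this toy input the human and vector reads outnumber the microbial ones, yet they do not affect the reported shares because they are removed before the total is taken.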

To confirm the identity of these reads, we aligned them separately to the complete genome of H. influenzae R2846, and we found that the reads completely covered the genome. Dividing the genome into 100 kilobase windows, depth of coverage varied from 266–828 reads/100Kbp, with far deeper coverage as expected at the 16S ribosomal RNA genes.
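The windowed summary described above can be sketched as follows. This is only an illustration: assigning each read to a window by its 0-based alignment start is an assumption, since the text does not say how reads spanning a window boundary were counted, and the coordinates below are invented.

```python
from statistics import mean, median

def window_read_counts(read_starts, genome_length, window=100_000):
    """Count reads whose alignment start falls in each fixed-size window."""
    n_windows = (genome_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in read_starts:  # 0-based start coordinates
        if 0 <= pos < genome_length:
            counts[pos // window] += 1
    return counts

# Toy example: a 250 kbp "genome" and six made-up alignment starts.
starts = [10, 99_999, 100_000, 150_000, 249_999, 120_000]
counts = window_read_counts(starts, 250_000)
print(counts)
print(mean(counts), median(counts))
```

Reporting both the mean and the median per-window count is useful when a few windows (for example those containing rRNA genes) have far more reads than the rest, because the median is less affected by such peaks.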

The enterovirus D68 isolated from patient US/CA/09-871 differed from the others in that it appeared in 2009, well before the 2014 outbreak, and in that it grouped with Clade C, which is phylogenetically distinct from the Clade B1 associated with AFM. This patient was reported by Greninger et al.2 as having respiratory illness but not AFM. The sequence evidence here suggests that the patient might have had complications from H. influenzae-associated infection, although no clinical or CSF data were available for our re-analysis.

In a second subject, US/CA/12-5837, we found a strikingly large number of reads from Staphylococcus aureus in the NP swabs. The two separate NGS files associated with this subject contained 6,858,453 and 1,343,806 reads, comprising 70% and 84% (respectively) of all non-human reads identified at the species level in each sample. The closest match was S. aureus subsp. aureus MRSA252, a methicillin-resistant strain. The coverage was deep enough, approximately 40X, that it would be possible to assemble this genome separately from the reads here (Figure 1). Greninger et al.2 reported 2,790 reads from enterovirus D68 in this subject (our analysis found 1,641) but did not report any from S. aureus.


Figure 1. Depth of read coverage of the S. aureus MRSA252 genome using reads identified in the NGS sample from subject US/CA/12-5837.

High peaks correspond to 16S rRNA genes. Red line: median coverage; blue line: mean coverage.

Patient US/CA/12-5837 was sampled in 2012, two years before the outbreak of AFM, although this patient was described in Greninger et al.2 as positive for enterovirus D68 based on clinical PCR testing and metagenomic sequencing. This patient is reported to be one of the first patients with enterovirus-D68-positive AFM2, but the sequence evidence indicates a severe S. aureus infection that might explain at least some of the patient’s symptoms. S. aureus has been implicated in neurological complications such as myelitis6 and meningitis7 by mechanisms that involve not only direct invasion into the central nervous system (CNS), but also immunopathogenic responses triggered by superantigens that can target the CNS8. At a minimum, S. aureus infection was overlooked by the previous analysis. Although the potential role of bacterial infection in the neurological disease that affected these two subjects is difficult to assess because of the lack of clinical and CSF information, its involvement as a pathogenic co-factor should be evaluated.

Human reads included in database submission

The metagenomics data (NCBI accession SRP055445) released by Greninger et al.2 comprise 43 files which cover 22 of the 48 subjects from their study (in their Supplementary Table 1); the study did not conduct NGS for all subjects. Our metagenomics pipeline identifies human reads at the same time that it searches for pathogens; therefore we scanned the data for human as well as microbial content. Greninger et al.2 reported that all human sequences had been removed from these files. We found, however, that all samples contained large numbers of human reads, ranging from a low of 18,215 to a high of 6,159,868. These comprised as few as 0.5% to as many as 95.6% of the reads in each sample, as shown in Table 1.

Table 1. Human reads found in metagenomic NGS samples from which human sequences were supposed to have been removed.

Shown is the number of reads in each sample that clearly match the human genome and do not match any microbial species. AFM: acute flaccid myelitis; NP: nasopharyngeal swab; CSF: cerebrospinal fluid.

Isolate        Run ID      Source   Number of human reads  % human
US/CA/12-5641  SRR1919640  NP                   6,159,868     85.4
US/CA/12-5641  SRR1919641  NP                   1,427,490     90.8
US/CA/12-5806  SRR1919642  NP                     164,876     89.8
US/CA/12-5806  SRR1919643  CSF                    202,677     95.5
US/CA/12-5807  SRR1919644  NP                     160,719     94.1
US/CA/12-5807  SRR1919645  CSF                    383,094     24.2
US/CA/12-5809  SRR1919646  NP                      65,635     95.4
US/CA/12-5809  SRR1919647  NP                     456,228     70.4
US/CA/12-5837  SRR1919648  NP                   4,662,958     20.2
US/CA/12-5837  SRR1919649  NP                   1,251,672     28.6
US/CA/14-5999  SRR1919650  CSF                  3,046,664     89.9
US/CA/14-5999  SRR1919651  NP                   1,407,842     71.0
US/CA/14-5999  SRR1919933  NP                     174,140     68.5
US/CA/14-6000  SRR1919652  CSF                    746,831     91.1
US/CA/14-6000  SRR1919653  NP                     164,638      0.6
US/CA/14-6000  SRR1919934  NP                      19,469      0.5
US/CA/14-6007  SRR1919654  CSF                    352,391     85.4
US/CA/14-6010  SRR1919655  CSF                    426,172     93.2
US/CA/14-6010  SRR1919656  NP                   1,194,587     38.8
US/CA/14-6010  SRR1919935  NP                     144,391     36.7
US/CA/14-6013  SRR1919657  NP                     544,276     87.4
US/CA/14-6013  SRR1919658  NP                   1,636,067     83.9
US/CA/14-6013  SRR1919936  NP                     213,180     79.8
US/CA/14-6067  SRR1919659  CSF                    567,263      3.9
US/CA/14-6067  SRR1919937  CSF                     66,076      2.3
US/CA/14-6070  SRR1919660  CSF                    578,579      4.3
US/CA/14-6070  SRR1919938  CSF                     88,153      3.2
US/CA/14-6102  SRR1919661  CSF                    791,143     82.4
US/CA/14-6102  SRR1919939  CSF                     92,723     78.2
US/CO/13-60    SRR1919662  CSF                    519,456     95.7
US/CO/13-60    SRR1919940  CSF                     79,477     93.4
US/CO/14-86    SRR1919663  CSF                    155,058     38.4
US/CO/14-86    SRR1919941  CSF                     18,215     26.5
US/CO/14-88    SRR1919664  NP                     453,411      3.8
US/CO/14-88    SRR1919942  CSF                     39,899      2.7
US/CO/14-93    SRR1919665  CSF                    758,650     96.6
US/CO/14-93    SRR1919943  CSF                    123,250     95.3
US/CO/14-94    SRR1919666  NP                     835,689     96.1
US/CO/14-94    SRR1919944  NP                     131,998     95.2
US/CO/14-95    SRR1919667  CSF                    352,679      2.8
US/CA/11-1767  SRR1919639  Culture              1,030,900     33.7
US/CA/10-786   SRR1919638  NP                     130,044      0.5
US/CA/09-871   SRR1919637  CSF                    384,285     11.0
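Because Table 1 gives both a human read count and a percentage for each run, the approximate total number of reads in a run can be back-calculated. The sketch below assumes that "% human" is the human read count divided by all reads in the run, as the surrounding text implies; the function name is hypothetical, and since the percentages are rounded to one decimal place the results are only estimates.

```python
def implied_total_reads(human_reads, percent_human):
    """Estimate total reads in a run from its human read count and % human."""
    return round(human_reads / (percent_human / 100.0))

# Three rows copied from Table 1: (run ID, human reads, % human).
rows = [
    ("SRR1919640", 6_159_868, 85.4),
    ("SRR1919941", 18_215, 26.5),
    ("SRR1919934", 19_469, 0.5),
]
for run_id, human, pct in rows:
    print(run_id, implied_total_reads(human, pct))
```

The estimate is least reliable for runs with a very small human percentage, where rounding to one decimal place is a large relative error.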

The inclusion of human sequence data in the files deposited at NCBI was likely a result of a computational method (SURPI5) that was insufficiently sensitive. Although the exact cause cannot be determined here, it is well known that sequence alignment algorithms often trade speed for sensitivity; e.g., by allowing fewer mismatches, an aligner can process reads at a much higher rate, at the cost of missing some alignments. It is less clear why the very large numbers of matches to two bacteria were missed; for both these bacteria, complete genomes from multiple strains are available in GenBank. We used both the Kraken system4 and the Bowtie2 aligner9 to ensure both sensitivity and speed in our analysis.
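The speed-for-sensitivity trade-off mentioned above can be illustrated with a toy matcher. This is not how SURPI, Kraken, or Bowtie2 work internally (real aligners use indexes rather than a linear scan); it is only a sketch, with invented sequences, showing that a stricter mismatch limit lets a candidate position be abandoned sooner but misses reads that differ slightly from the reference.

```python
def find_hits(read, reference, max_mismatches):
    """Return start positions where `read` matches `reference`
    with at most `max_mismatches` substitutions (no indels)."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        mismatches = 0
        for a, b in zip(read, reference[start:start + len(read)]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # give up on this position early
        else:
            # The inner loop finished without exceeding the limit.
            hits.append(start)
    return hits

reference = "ACGTACGTTAGCCGATAGGCTA"
read = "TAGCCGTTAG"  # differs from reference[8:18] at one base

print(find_hits(read, reference, max_mismatches=0))
print(find_hits(read, reference, max_mismatches=1))
```

With no mismatches allowed the read is not placed anywhere; allowing a single mismatch recovers its position, at the cost of examining more bases per candidate before giving up.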

Release of sequence data is highly valuable, if not essential, for reproducibility and validation of sequencing-based studies. Failure to filter human reads from a sample is not uncommon; a recent study10 found that Human Microbiome Project samples, from which human DNA was supposed to have been removed, contain up to 95% human sequence. This suggests that future efforts to deposit microbiome data need to employ more sensitive computational screens in order to avoid the unintentional release of human sequence data.

Methods

Sequences were extracted from SRP055445 and each file was separately run through the Kraken program version 0.10.6-beta (https://github.com/DerrickWood/kraken)4, which identifies species by comparison with a database of all 31-bp sequences in all species. The database included the human genome (version GRCh38.p2), all complete bacterial and viral genomes, selected fungal pathogens, and known laboratory vector sequences from the NCBI UniVec database (http://www.ncbi.nlm.nih.gov/tools/vecscreen/univec). Percentages of bacterial and viral reads in each sample were re-computed after excluding human and vector sequences. Reads matching more than one species were classified at the genus level or above. Reads from H. influenzae and S. aureus were re-aligned using Bowtie2 (version 2.2.5)9, a very fast and sensitive program for alignment of NGS reads to a reference genome, with the --local option. Bowtie2 was also used to re-align all reads from US/CA/12-5837 and US/CA/09-871 to the sequences of multiple enterovirus D68 strains (GenBank accessions JX101846.1, AY426531.1, KM851231.1, KM892500.1, KM892501.1, KM881710.2, KP745751.1, KP745755.1, KP745757.1, KP745760.1, KP745764.1, KP745766.1, and KP745767.1). We report the highest number of reads matching any one of these strains.
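The k-mer lookup described above, including the rule that reads matching more than one species are placed at the genus level or above, can be sketched with a toy classifier. This is a simplified illustration and not Kraken's actual algorithm: it labels a read with the lowest common ancestor (LCA) of every taxon its k-mers hit, uses 5-bp k-mers instead of 31-bp ones, ignores reverse complements, and relies on a hypothetical two-species taxonomy with made-up sequences.

```python
K = 5  # toy k-mer length; the Methods describe 31-bp sequences

# Hypothetical mini-taxonomy: child -> parent (the root has parent None).
PARENT = {
    "root": None,
    "Haemophilus": "root",
    "H. influenzae": "Haemophilus",
    "H. parainfluenzae": "Haemophilus",
}

def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def lca(a, b):
    """Lowest common ancestor of two taxa in PARENT."""
    ancestors_of_a = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors_of_a:
            return taxon
    return "root"

def kmers(seq, k=K):
    """Yield every overlapping k-mer of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def build_db(genomes):
    """Map each k-mer to the LCA of all taxa whose genome contains it."""
    db = {}
    for taxon, seq in genomes.items():
        for km in kmers(seq):
            db[km] = lca(db[km], taxon) if km in db else taxon
    return db

def classify(read, db):
    """Label a read with the LCA of the taxa hit by its k-mers (None if no hits)."""
    label = None
    for km in kmers(read):
        if km in db:
            label = db[km] if label is None else lca(label, db[km])
    return label

# Made-up sequences, for illustration only.
genomes = {
    "H. influenzae": "ACGTTGCAAGGCT",
    "H. parainfluenzae": "ACGTTGCATTCGA",
}
db = build_db(genomes)
print(classify("AAGGCT", db))    # k-mers found in only one species
print(classify("ACGTTGCA", db))  # k-mers shared by both species
```

A read whose k-mers occur in only one genome keeps its species label, while a read drawn from the region shared by both genomes is pushed up to the genus.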

Comments on this article (5)

Version 2
VERSION 2 PUBLISHED 13 Jul 2015
Revised
  • Reader Comment 23 Jul 2015
    Charles Chiu, University of California, San Francisco, USA
    23 Jul 2015
    Reader Comment
    We would like to clarify the tables in the Supplementary Appendix of our paper:

    Table S3: summary table of NGS read counts (bacterial, viral, other from both CSF and NP)
    Table S4: ... Continue reading
  • Author Response 13 Jul 2015
    Steven Salzberg, Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, 21205, USA
    13 Jul 2015
    Author Response
    We have made two small changes to correct two minor points raised by Chiu, Greninger, and Naccache in their comments. First we reworded a sentence about sample US/CA/09-871 to clarify ... Continue reading
Version 1
VERSION 1 PUBLISHED 02 Jul 2015
Discussion is closed on this version, please comment on the latest version above.
  • Reader Comment 06 Jul 2015
    Charles Chiu, University of California, San Francisco, USA
    06 Jul 2015
    Reader Comment
    We thank Drs. Salzberg and Pardo for their response.  Here are our replies addressing their new comments (in bulleted underline):
    • “They don’t disagree with our finding, but say that they already
    ... Continue reading
  • Author Response 06 Jul 2015
    Steven Salzberg, Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, 21205, USA
    06 Jul 2015
    Author Response
    We thank Drs. Greninger, Naccache, and Chiu for their response, which makes some valid points that we will comment on further here.  But first we wish to acknowledge that our ... Continue reading
  • Reader Comment 03 Jul 2015
    Charles Chiu, University of California, San Francisco, USA
    03 Jul 2015
    Reader Comment
    This manuscript raises two main criticisms of our paper in their re-analysis.  Here we directly address the 2 main points:
     
    1. The authors claim that bacterial reads were seen in the nasopharyngeal
    ... Continue reading
how to cite this article
Breitwieser FP, Pardo CA and Salzberg SL. Re-analysis of metagenomic sequences from acute flaccid myelitis patients reveals alternatives to enterovirus D68 infection [version 2; peer review: 2 approved]. F1000Research 2015, 4:180 (https://doi.org/10.12688/f1000research.6743.2)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 12 Aug 2015
Yoav Gilad, Department of Human Genetics, University of Chicago, Chicago, IL, USA 
Approved
The analysis is described in detail and the findings are clear. I found the comments of the authors of the original paper of interest, and the responses of Breitwieser et al. appropriate. I do think that it would be fair to ... Continue reading
HOW TO CITE THIS REPORT
Gilad Y. Reviewer Report For: Re-analysis of metagenomic sequences from acute flaccid myelitis patients reveals alternatives to enterovirus D68 infection [version 2; peer review: 2 approved]. F1000Research 2015, 4:180 (https://doi.org/10.5256/f1000research.7307.r9476)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Reviewer Report 15 Jul 2015
David J. Lipman, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA 
Approved
This paper provides a straightforward reanalysis of the metagenomic data presented in Greninger et. al. and  does not contradict the basic interpretation of the results in that paper.  As noted in the exchange of comments on this paper by Greninger ... Continue reading
HOW TO CITE THIS REPORT
Lipman DJ. Reviewer Report For: Re-analysis of metagenomic sequences from acute flaccid myelitis patients reveals alternatives to enterovirus D68 infection [version 2; peer review: 2 approved]. F1000Research 2015, 4:180 (https://doi.org/10.5256/f1000research.7307.r9479)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
