Alternative miRNAs? Human sequences misidentified as plant miRNAs in plant studies and in human plasma

Background: A 2017 study reported that “Plant miRNAs found in human circulating system provide evidences of cross kingdom RNAi”. Analysis of two human blood plasma sequencing datasets was said to provide evidence for uptake of plant miRNAs into human plasma. The results were also purportedly inconsistent with contamination. Methods: Sequences from public datasets and miRNA databases were compared with results downloaded from the website of the reporting journal. Results: Only one putative plant miRNA (“peu-MIR2910”) mapped consistently above background, and this sequence is found with 100% identity in a human rRNA. Several other rarer but consistently mapped putative plant miRNAs also have 100% or near 100% matches to human transcripts or genomic sequences, and some do not appear to map to plant genomes at all. Conclusions: Reanalysis of public data suggests that dietary plant xenomiR uptake is not supported, but instead confirms previous findings that detection of rare plant miRNAs in mammalian sequencing datasets is artifactual. Some putative plant miRNAs, including MIR2910 and MIR2911, may represent human sequence contamination or other artifacts in plant studies, emphasizing the need for rigorous controls and data filtering strategies when assessing possible xenomiRNAs.


Introduction
Reports of plant or other dietary miRNAs, or xenomiRs, entering mammalian circulation through the diet 1-4 generated initial excitement for the xenomiR transfer hypothesis, yet negative results of replication and reproduction studies have cast doubt on xenomiR transfer as a general mechanism 5-11 . A prominent claim of xenomiR function 1 has also failed rigorous reproduction 7 , unmasked as the result of an uncontrolled variable in the original experiment. Analyses of public datasets have revealed that studies of xenomiRs and other foreign-origin nucleic acids are fraught with artifacts: combinations of contamination, amplification or sequencing errors, permissive analysis pathways, and batch effects 8,10,12-16 . A particularly comprehensive study recently found that foreign miRNAs in human biofluids and tissues do not match human food consumption, are marked by batch effects, and are thus most parsimoniously explained as artifacts 13 . Studies of organisms with no exposure to plants have also found evidence of the same types of apparent plant contamination that plague some measurements of human samples 8,17 . Liu et al. 18 mapped sequencing data from two studies of human plasma and other samples 19,20 to various plant genomes using a 2010 plant miRNA database, PMRD 21 , concluding that previous reports of dietary xenomiR transfer are supported. In this brief report, these results are examined critically.

Methods
Plant mapping results from Liu et al. 18 (total mapped counts) were downloaded from the BMC Genomics website. Accession numbers of sequencing datasets were checked against the publications of Ninomiya et al. 19 and Yuan et al. 20 , as well as the Sequence Read Archive (SRA). Data were sorted and analyzed in Microsoft Excel for Mac 2011, Version 14.7.1. Plant miRNA sequences were obtained from miRBase 22 . Since certain plant sequences have been removed from miRBase after being identified as ncRNA degradation artifacts, the plant microRNA database (PMRD) 21 was also consulted; however, repeated attempts to access the site were unsuccessful, so information was retrieved instead from miRMaid 23 or miRNEST 2.0 24 . Supplementary File 1 contains the relevant count data from the Ninomiya et al. 19 and Yuan et al. 20 studies.
An earlier version of this article can be found on bioRxiv (https://doi.org/10.1101/120634).
... and MIR2911 mapped in none of the plasma samples. The single putative plant miRNA that mapped above background levels in this study was, again, peu-MIR2910 (Table 2).

Data evaluation
Lowering the threshold: still only a handful of possible xenomiRs
Since only one plant miRNA appeared to map consistently above background, the inclusion threshold of Yuan et al. was relaxed to include all miRNAs with three or more mapped reads (Liu et al. data) in 10% or more of the samples from either study. These are rather permissive criteria but may at least screen out some false positives due to amplification and sequencing errors. All samples from the Ninomiya study were included, despite the fact that most were not plasma. Eleven miRNAs satisfied these criteria for the Yuan et al. data (Table 1). (One low-mapping miRNA was excluded because its sequence could not be found in miRBase 22,28 , miRMaid 23 , miRNEST 2.0 24 or indeed through any searches attempted.) Ten satisfied the criteria for the Ninomiya study (Table 2), including one sequence that was part of another (compare ath-MIRf10046-akr and ath-MIRf10045-ak, Table 3). However, if only the plasma samples from the latter study are considered, three miRNAs remain (Table 2). In total, 15 putative miRNAs satisfied the permissive inclusion criteria, including five (Yuan only), four (Ninomiya only), and six (both) (Table 3).
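The relaxed inclusion rule (three or more mapped reads in 10% or more of the samples of a study) can be written compactly. The following is a minimal sketch, assuming that the mapped counts for one study are held in a dictionary keyed by miRNA name with one count per sample. The function names and the toy counts are hypothetical and are not taken from the Liu et al. data; the original analysis was carried out in Excel.

```python
def passes_threshold(counts, min_reads=3, min_fraction=0.10):
    """True if at least min_fraction of samples have >= min_reads mapped reads."""
    if not counts:
        return False
    hits = sum(1 for c in counts if c >= min_reads)
    return hits / len(counts) >= min_fraction


def select_candidates(study_counts, min_reads=3, min_fraction=0.10):
    """Return the sorted names of miRNAs passing the rule within one study."""
    return sorted(
        name
        for name, counts in study_counts.items()
        if passes_threshold(counts, min_reads, min_fraction)
    )


# Made-up counts for ten samples; names are placeholders, not real miRNAs.
toy_study = {
    "mir_A": [120, 95, 0, 210, 33, 8, 0, 41, 77, 19],  # abundant in most samples
    "mir_B": [0, 0, 3, 0, 0, 0, 0, 0, 0, 0],           # 3 reads in exactly 1 of 10
    "mir_C": [1, 2, 0, 0, 1, 0, 2, 0, 0, 1],           # never reaches 3 reads
}

candidates = select_candidates(toy_study)
```

In this toy case `mir_B` passes on the strength of a single sample with three reads, which illustrates how permissive the criteria are. Applying the rule to "either study" amounts to running `select_candidates` on each study separately and taking the union of the results.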
To miR or not to miR
As miRNA discovery, validation, and annotation have advanced, numerous reported miRNAs have been reclassified as degradation fragments of other noncoding RNAs (ncRNAs). A classic example is MIR2911, a plant rRNA degradation fragment that has been misidentified as a microRNA. Interestingly, only 2 of the 15 miRNAs identified as plant miRNAs in this study are annotated in miRBase. Although some of these sequences may represent rare or unusually structured miRNAs, several are part of non-miRNA ncRNAs or other sequences that seem unlikely, at least at first glance, to give rise to microRNAs. Among the apparently misidentified miRNAs is MIR2910, the most abundant plant miRNA identified by Liu et al. The MIR2910 sequence, UAGUUGGUGGAGCGAUUUGUC, is found in the highly conserved and expressed large subunit (LSU) rRNA of plants, and has been specifically removed from miRBase as a non-miRNA. Even the two identified miRNAs that remain in miRBase, MIR2916 and MIR894, are not above question. A 20-nucleotide stretch of MIR2916 maps to rRNA, while the full MIR894 sequence appears to be found in a variety of plant transcripts.
Human sequences in the plant database and vice-versa
Curiously, several sequences did not map to the species to which they were ascribed by the PMRD 21 . Unfortunately, the PMRD could not be accessed directly during this study; however, other databases appear to provide access to its contents. Specifically, ptc-MIRf12412-akr and ptc-MIRf12524-akr did not map to Populus or to other plants. The poplar tree is also not a common dietary staple of human populations. In contrast, both sequences mapped with 100% identity and coverage to numerous human sequences (Table 3). ptc-MIRf10804-akr had numerous 100% identity human matches, plus a 1-mismatch alignment to the human miR-3929 precursor. Other miRNAs, including MIR2911, also displayed some lesser degree of matching to human transcripts or the genome. Strikingly, the putative MIR2910 sequence is not only a fragment of plant rRNA; it has a 100% coverage, 100% identity match in the human 18S rRNA (see NR_003286.2 in GenBank; Table 3). These matches of putative plant RNAs with human sequences are difficult to reconcile with the statement of Liu et al. that BLAST of putative plant miRNAs "resulted in zero alignment hit" 18 , suggesting that the BLAST procedure may have been performed incorrectly.
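A 100% coverage, 100% identity match of a short sequence is equivalent to finding it as an exact substring of the reference on either strand, which can be checked without an aligner. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the reference is a synthetic string built around the MIR2910 sequence purely for illustration. It is not the NR_003286.2 record, which would have to be retrieved from GenBank, and it does not reproduce the BLAST searches used for Table 3.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_to_dna(seq):
    """Convert an RNA-alphabet sequence (U) to the DNA alphabet (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA sequence."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_exact(query_rna, reference_dna):
    """Return (strand, 0-based start) for every exact occurrence of the query."""
    ref = reference_dna.upper()
    forward = rna_to_dna(query_rna)
    hits = []
    for strand, query in (("+", forward), ("-", reverse_complement(forward))):
        start = ref.find(query)
        while start != -1:
            hits.append((strand, start))
            start = ref.find(query, start + 1)
    return hits


# MIR2910 sequence as quoted in the text, written without the line-break hyphen.
MIR2910 = "UAGUUGGUGGAGCGAUUUGUC"

# Synthetic reference for illustration only: made-up flanks around the query.
toy_reference = "ACGTACGT" + rna_to_dna(MIR2910) + "GGCCTTAA"

hits = find_exact(MIR2910, toy_reference)
```

Running the same check against both a host reference and the plant genome to which a database entry is ascribed would flag entries, such as those described above, that are found in human sequences but not in the named plant.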

Discussion
In mammalian studies, mapping of MIR2910 and other dubious plant miRNAs is best explained as mapping of human degradome fragments to plant RNAs that are in some cases genuine sequences but not miRNAs, and in other cases, human sequences that have contaminated plant RNA samples and databases. Re-analysis of the results of Liu et al. 18 thus echoes the recent findings of Kang, Bang-Berthelsen, and colleagues 13 , as well as previous negative findings surrounding dietary xenomiRs, summarized above. A stringent data analysis procedure, such as filtering all reads against the ingesting organism genome/transcriptome with one or two mismatches, then requiring perfect matches of remaining reads against plant or other foreign organisms, would engender higher confidence that "foreign" RNAs are not simply amplification or sequencing artifacts. Indeed, pre-mapping to the ingesting organism's genome may not be sufficient; as shown 13 , the largest number of xenomiRs in some human studies are from rodents, likely because of proximity in research laboratories. Therefore, it may be best to screen against mammalian sequences in general, and perhaps also against widespread microbe contaminants. Of course, even the most stringent analysis procedures cannot distinguish a physical contaminant from a "real" read; therefore strict process controls are also needed to assess possible contamination. In general, such controls have not been done in existing studies.
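The two-stage screen suggested above (discard reads that match the ingesting organism within one or two mismatches, then require perfect matches to the foreign reference) can be sketched as follows. This is a brute-force illustration under stated assumptions: the function names and all sequences are hypothetical toy data, only the forward strand is checked, and insertions or deletions are ignored. A real analysis would rely on an established short-read aligner rather than this loop.

```python
def within_mismatches(a, b, max_mismatches):
    """True if equal-length strings a and b differ at <= max_mismatches positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def matches_reference(read, reference, max_mismatches):
    """True if the read matches any same-length window of the reference."""
    n = len(read)
    return any(
        within_mismatches(read, reference[i:i + n], max_mismatches)
        for i in range(len(reference) - n + 1)
    )


def screen_reads(reads, host_refs, foreign_refs, host_mismatches=2):
    """Keep reads that miss every host reference and exactly match a foreign one."""
    kept = []
    for read in reads:
        if any(matches_reference(read, h, host_mismatches) for h in host_refs):
            continue  # explained by the host, allowing for sequencing errors
        if any(matches_reference(read, f, 0) for f in foreign_refs):
            kept.append(read)
    return kept


# Toy sequences, invented for illustration.
host_refs = ["GGGAAACCCTTTGGGAAACCCTTT"]
foreign_refs = ["ACGTTGCAACGTTGCATTAGGCAT", "GGGAAACCCTTAGGG"]
reads = [
    "AAACCCTTTGGG",  # exact host match
    "AAACCCTTAGGG",  # one mismatch from host, exact match to a foreign entry
    "TTGCAACGTTGC",  # exact foreign match, unlike host
    "CATCATCATCAT",  # matches nothing
]

strict = screen_reads(reads, host_refs, foreign_refs, host_mismatches=2)
naive = screen_reads(reads, host_refs, foreign_refs, host_mismatches=0)
```

In the toy data, the second read survives when only exact host matches are removed but is discarded once one or two mismatches are tolerated, which is the situation the stringent procedure is meant to catch. Screening against additional mammalian or microbial references, as proposed above, corresponds to extending `host_refs`.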
This report underlines the danger in assuming that xenomiRs in mammalian material originate from the diet. When the species and roles are reversed (for example, with the finding of human sequences in a list of poplar tree miRNAs), few analysts would conclude that poplar trees consume humans. The simplest explanation is that the sequenced plant material was contaminated with human nucleic acid. In the same way, the extremely low-level, variable, and batch-effect-prone concentrations of several plant sequences in human plasma and tissue could be due to uptake from the diet, albeit at levels far too low to affect physiologic processes. However, artifact remains the simplest explanation.

Data availability
Data used in these analyses were downloaded from the supplementary materials of Liu et al. 18.

Competing interests
No competing interests were disclosed.

Grant information
The author(s) declared that no grants were involved in supporting this work.