Introduction
Functionally important non-coding RNAs (ncRNAs) are now better understood thanks to progress in high-throughput technologies. The discovery of microRNAs (miRNAs)1, a major class of ncRNAs, has further facilitated the molecular aspects of biomedical research.
MicroRNAs are a large group of small endogenous single-stranded non-coding RNAs (17–22 nt long) found in eukaryotic cells. They post-transcriptionally regulate the expression of specific mRNAs through degradation, translational inhibition, or destabilization of the targets (transcripts of protein-coding genes)2. Esquela-Kerscher and Slack have reported that miRNAs are involved in the regulation of almost every biological process, such as apoptosis and stress response3. Ma et al. demonstrated that a miR-29a regulatory circuitry plays an important role in epididymal development and function4. Additionally, the tissue specificity of miRNAs has been shown to provide better clues to their fundamental roles in normal physiology5.
Dysregulation of miRNAs, together with their ability to regulate repertoires of genes (and to coordinate multiple biological pathways), has been linked to several diseases6,7. One example is chronic lymphocytic leukemia, in which miRNA genes (miR15 and miR16) are missing or down-regulated in about 68% of cases8. Thus, uncovering the relations between miRNAs and diseases as well as genes/proteins is crucial for understanding miRNA regulatory mechanisms with a view to diagnosis and therapy9,10.
Several databases, prediction algorithms and tools are available, providing insight into miRNA-disease and miRNA-mRNA associations. Although the detailed target recognition mechanism is still elusive, several algorithms attempt to predict miRNA targets. However, a limited precision of 0.50 and recall of 0.12 have been reported when these algorithms were evaluated against proteomics-supported miRNA targets11. Although these resources provide insight into miRNA-associated relationships, the majority of relations remain scattered as unstructured text in scientific publications12. Figure 1 shows the growth of publications in MEDLINE and also depicts the normalized growth of publications that reference the keyword “microRNA”.

Figure 1. Growth of miRNA-related publications in comparison with the growth of MEDLINE.
The dotted line points out the relative increase of miRNA-related publications per year in comparison to the growth of MEDLINE (as of 31 December, 2013).
Some databases, such as miR2Disease and PhenomiR, store relations manually extracted from the literature. The miR2Disease database13 contains information about miRNA-disease relationships with 3273 entries (as of the last update on March 14, 2011). PhenomiR14 is a database of miRNA-related phenotypes extracted from published experiments. It consists of 675 unique miRNAs, 145 diseases, and 98 bioprocesses from 365 articles (Version 2.0, last updated in February 2011). TarBase11 hosts more than 6500 experimentally validated miRNA targets extracted from the literature.
However, manual retrieval of relevant articles and extraction of relation mentions from them is labor-intensive. One solution is to use text-mining techniques. So far, the vast majority of research in this direction has focused on the extraction of protein-protein interactions15. In contrast, miRNA relation extraction is still in its infancy. The shift of focus towards the identification of miRNA relations is slowly taking hold with the rise of systems approaches to investigating complex diseases. The manually curated database miRTarBase16 incorporates such text-mining techniques to retrieve miRNA-related articles. Recently, the miRCancer database has been constructed using a rule-based approach to extract miRNA-cancer associations from text17. As of June 14, 2014, this database contains 2271 associations between 38562 miRNAs and 161 human cancers from 1478 articles.
Related work
Text-mining technologies are established for a variety of applications. For instance, the BioCreative competition18,19 and BioNLP Shared Task20–22 series have been conducted to benchmark text mining techniques for gene mention identification, protein-protein relation extraction and event extraction, among others.
To our knowledge, only limited work has been carried out in the area of miRNA-related text mining. Murray et al. extracted miRNA-gene associations from the PubMed database using semantic search techniques23. For their analysis, experimentally derived datasets were examined, combined with network analysis and ontological enrichment. Regular expressions were used to detect miRNA mentions. The authors optimized the approach to reach 100% accuracy and recall for detecting miRNA mentions as listed in miRBase. Relations were identified based on a manually curated rule set. The authors extracted 1165 associations between 270 miRNAs and 581 genes from the whole of MEDLINE.
The freely available miRSel12 database integrates automatically extracted miRNA-target relationships from PubMed abstracts. A set of regular expressions is used for miRNA recognition that matches all miRBase synonyms and generic occurrences. The authors reach a recall of 0.96 and precision of 1.0 on 50 manually annotated abstracts for miRNA mention identification. Further, the relations between miRNA and genes were extracted at sentence level employing a rule-based approach. They evaluated on 89 sentences from 50 abstracts resulting in a recall of 0.90 and precision of 0.65. Currently, it hosts 3690 miRNA-gene interactions11.
In contrast to previous text-mining approaches, which focus purely on miRNA-gene relations, we extend the information extraction approach to additionally retrieve miRNA-disease relations. Furthermore, we evaluate our approach on a larger corpus to achieve robustness. We differentiate between actual miRNA mentions (referred to as Specific miRNAs) and co-referencing miRNAs (Non-Specific miRNAs). We evaluated three different relation extraction approaches, namely co-occurrence, tri-occurrence and machine learning-based methods.
To support further research, our corpora are made publicly available in an established XML format as proposed by Pyysalo et al.24, together with the regular expressions used for miRNA named entity recognition. In addition, our dictionaries for trigger term detection and general miRNA mention identification are made available. To our knowledge, the annotated corpora and the information extraction resources are the most comprehensive developed so far.
Methods
Named entities annotation
Mentions of miRNAs consisting of keywords (case-insensitive and not containing any suffixed numerical identifier) such as “Micro-RNAs” or “miRs” are annotated as Non-Specific miRNA. Names of particular miRNAs suffixed with numerical identifiers, such as miRNA-101, are labeled as Specific miRNA. Numerical identifiers (separated by delimiters such as “,”, “/”, and “and”) occurring as part of specific miRNA mentions are annotated as a single entity. Box 1 depicts the annotation of specific miRNA mentions (including an example of part mentions). In addition, Disease, Gene/Protein, Species, and Relation Trigger are annotated. The detailed guideline for annotating specific miRNA mentions is available as a supplementary file.
Box 1. Example of miRNAs annotations.
Here “-181b”, and “-181c” are the part mentions annotated as a single entity along with “miR-181a” in box. A non-specific miRNA mention is shown in italics.
Interesting results were obtained from . These set of brain-enriched miRNAs are down-regulated in glioblastoma. However, , and are strongly up-regulated.
Mentions of disease names, disease abbreviations, signs, deficiencies, physiological dysfunction, disease symptoms, disorders, abnormalities, or organ damage are annotated as Disease. Possessive terms such as “Diabetic patients” are not marked. Mentions referring to proteins/genes, whether single-word (e.g. “trypsin”), multi-word, gene symbols (e.g. “SMN”), or complex names (including hyphens, slashes, Greek letters, Roman or Arabic numerals), are annotated as Gene/Protein. Only organisms that have published miRNA sequences and annotations in the miRBase database are labeled as Species. Any verb, noun, verb phrase, or noun phrase associating a miRNA mention with a labeled disease or gene/protein term is annotated as Relation Trigger.
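The distinction between Specific and Non-Specific miRNA mentions can be illustrated with a small sketch. It assumes two simplified, hypothetical patterns (`_SPECIFIC` and `_NON_SPECIFIC`) that only look at the surface form of a mention; they are not the annotation guideline and not the patterns used for recognition in this work.

```python
import re

# Hypothetical, simplified patterns for illustration only; they are NOT the
# annotation guideline or the recognition patterns of the paper.
_SPECIFIC = re.compile(
    r"^(?:[a-z]{3}-)?(?:micro-?rna|mirna|mir|let|lin)[- ]?\d+[a-z]?",
    re.IGNORECASE,
)
_NON_SPECIFIC = re.compile(r"^(?:micro-?rnas?|mirnas?|mirs?)$", re.IGNORECASE)


def classify_mirna_mention(mention: str) -> str:
    """Return the annotation class suggested by the surface form of a mention."""
    text = mention.strip()
    if _SPECIFIC.match(text):  # keyword followed by a numerical identifier
        return "Specific miRNA"
    if _NON_SPECIFIC.match(text):  # bare keyword without identifier
        return "Non-Specific miRNA"
    return "Other"


labels = {
    mention: classify_mirna_mention(mention)
    for mention in ["Micro-RNAs", "miRs", "miRNA-101", "hsa-let-7a", "trypsin"]
}
```

Here `labels` maps the two keyword mentions to Non-Specific miRNA, the two identifier-bearing names to Specific miRNA, and the gene name to neither class.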
Relations annotation
We restrict the relationship extraction to sentence level and four different interacting entity pairs: Specific miRNA-Disease (SpMiR-D), Specific miRNA-Gene/Protein (SpMiR-GP), Non-Specific miRNA-Disease (NonSpMiR-D), and Non-Specific miRNA-Gene/Protein (NonSpMiR-GP). A relevant triple, i.e. an interacting pair co-occurring with a Relation Trigger, is defined to form a relation and can belong to one of the four relation classes mentioned above.
The annotation has been performed using Knowtator25 integrated within the Protégé framework26.
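The relation definition above can be sketched as a small data structure. The class names `Entity` and `RelationTriple` and the example sentence content are hypothetical and chosen for illustration; they do not reflect the Knowtator schema or the published XML format.

```python
from dataclasses import dataclass

# Mapping from (miRNA class, partner class) to the four relation classes
# named in the text.
RELATION_CLASSES = {
    ("Specific miRNA", "Disease"): "SpMiR-D",
    ("Specific miRNA", "Gene/Protein"): "SpMiR-GP",
    ("Non-Specific miRNA", "Disease"): "NonSpMiR-D",
    ("Non-Specific miRNA", "Gene/Protein"): "NonSpMiR-GP",
}


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # one of the annotation classes, e.g. "Disease"


@dataclass(frozen=True)
class RelationTriple:
    """An interacting pair plus the Relation Trigger found in the same sentence."""
    mirna: Entity
    partner: Entity
    trigger: Entity

    @property
    def relation_class(self) -> str:
        return RELATION_CLASSES[(self.mirna.label, self.partner.label)]


# Constructed example, not taken from the corpus.
triple = RelationTriple(
    mirna=Entity("miR-21", "Specific miRNA"),
    partner=Entity("PTEN", "Gene/Protein"),
    trigger=Entity("targets", "Relation Trigger"),
)
```

With this layout `triple.relation_class` evaluates to `SpMiR-GP`, and any pair outside the four classes raises a `KeyError` rather than being silently accepted.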
Corpus selection, annotation and properties
We develop a new corpus based on MEDLINE, annotated with miRNA mentions and relations. Out of 27001 abstracts retrieved using the keyword “miRNA”, 201 are randomly selected as training and 100 as test corpus. Two annotators have been involved in the annotation. The first annotator annotated the training corpus iteratively to develop guidelines and built the consensus annotation. The second annotator followed these guidelines and annotated the same corpus. Table 1 provides the inter-annotator agreement (measured as F1 and Cohen’s κ) for the test corpus. An example annotation is shown in Box 1.
Table 1. Inter-annotator agreement scores for the test corpus.
| Annotation Class | F1 | κ |
|---|---|---|
| Non-specific MiRNAs | 0.9985 | 0.996 |
| Specific MiRNAs | 0.9545 | 0.916 |
| Genes/Proteins | 0.8343 | 0.752 |
| Diseases | 0.8270 | 0.853 |
| Species | 0.9329 | 0.875 |
| Relation Triggers | 0.8441 | 0.798 |
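For readers unfamiliar with Cohen's κ, the following sketch shows how it can be computed from two annotators' labels. The text does not state the unit at which agreement was measured, so per-token labels are an assumption here, and the label sequences are invented for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same, non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented token-level labels ("O" marks tokens outside any entity).
annotator_1 = ["miRNA", "O", "O", "Disease", "O", "miRNA"]
annotator_2 = ["miRNA", "O", "Disease", "Disease", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 4 of 6 tokens (observed agreement 0.667) while chance agreement is 13/36, so κ = 11/23 ≈ 0.48, clearly lower than the raw agreement.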
Table 2 shows the number of annotated concepts in the training and test corpora for each entity class and the count for manually extracted relations (triplets), categorized for different interacting entity pairs. Table 3 provides the overall statistics of the published corpora (additional information about the corpus is given in the README supplementary file).
Table 2. Manually annotated entities statistics.
Counts of manually annotated entities in the training and the test corpora as well as annotated sentences describing relations.
| Annotation Class | Training corpus | Test corpus |
|---|---|---|
| Non-specific MiRNAs | 1170 | 336 |
| Specific MiRNAs | 529 | 376 |
| Genes/Proteins | 734 | 324 |
| Diseases | 1522 | 640 |
| Species | 546 | 182 |
| Relation Triggers | 1335 | 625 |
| SpMiR-D | 175 | 138 |
| SpMiR-GP | 235 | 146 |
| NonSpMiR-D | 132 | 61 |
| NonSpMiR-GP | 89 | 15 |
Table 3. Statistics of the published miRNA corpora.
| Occurrences in the corpus | Training | Test |
|---|---|---|
| Sentences | 1877 | 788 |
| Entities | 5828 | 2480 |
| Entity pairs | 1996 | 867 |
| Positive entity pairs | 497 | 312 |
| Negative entity pairs | 1499 | 555 |
Named entity recognition
For identification of specific miRNA mentions in text (cf. Table 4), we developed regular expression patterns using manual annotations of miRNA mentions as the basis. Similarly, a dictionary has been generated for general miRNA recognition. The regular expression patterns are represented in the format defined by Oualline27. In this work, several aliases are defined (cf. Table 5) to be used in the final regular expression patterns for specific miRNA identification, given in Table 4. Detected entities are resolved to a unique miRNA name and disambiguated (e.g. hsa-microRNA-21 to hsa-mir-21 and microRNA 101 to mir-101). Unique miRNA terms are mapped to human miRBase database identifiers through the mirMaid RESTful web service. For names for which we do not retrieve any database identifier, we fall back to another organism mention found in the abstract (if any), using the NCBI taxonomy dictionary (see below); otherwise we retain the unique name.
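The resolution of detected mentions to a unique name can be sketched as follows. This is a minimal illustration that reproduces only the two examples given above; the pattern `_MIRNA_NAME` and the function `normalize_mirna` are hypothetical, and the subsequent lookup of miRBase identifiers through the web service is not covered.

```python
import re

# Hypothetical pattern: optional 3-letter organism prefix, a miRNA keyword,
# optional separators, then the identifier.
_MIRNA_NAME = re.compile(
    r"^(?P<prefix>[a-z]{3}-)?(?:micro-?rna|mirna|mir)[\s-]*"
    r"(?P<ident>\d+[a-z]?(?:-\d+)?(?:-[35]p)?)$",
    re.IGNORECASE,
)


def normalize_mirna(mention: str) -> str:
    """Map spelling variants of a specific miRNA mention to one unique name."""
    text = mention.strip()
    match = _MIRNA_NAME.match(text)
    if match is None:
        return text  # unknown form: retain the name as written
    prefix = (match.group("prefix") or "").lower()
    return f"{prefix}mir-{match.group('ident').lower()}"


normalized = [normalize_mirna(m) for m in ["hsa-microRNA-21", "microRNA 101", "miR-181a"]]
```

The lowercase `mir-` form follows the examples in the text; a production system would also need to handle let-7 and lin-4 style names, which this sketch leaves untouched.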
Table 4. Regular expression patterns used for miRNAs identification.
Aliases used to form the final regular expression, see Table 5, are highlighted in bold.
| Regular expression patterns | Example of identified text |
|---|---|
| (?Pref+(?Lin,?Let)) | lin-4, hsa-let-7a-1 |
| (?Pref+(?miRNA,?Onco)(?S?S*?Tail)(?Sep?Tail)*) | hsa-mir-21/22, Oncomir-17∼92 |
| (?Pref+(?miRNA,?Onco)S*(?D(?Z([/]?Z)*)+) ([\,]S*? (?Pref+(?miRNA,?Onco)S*(?D(?Z([/]?Z)*)+)*) | miR-17b, -1a |
Table 5. MiRNAs regex aliases.
Aliases used in regular expression patterns for miRNAs identification (highlighted in bold).
| Description | Alias | Regular Expression Pattern |
|---|---|---|
| Digit sequences | D | (\d?\d*) |
| Admissible hyphens with a trailing space | Z | ([\- ]?[\- ]*) |
| Admissible hyphens with a leading space | S | ([ \-]) |
| 3-letter prefix for human | Pref | ([hH][sS][aA]\-) |
| Non-specific miRNA mentions | miRNA | ([mM][iI]([cC][rR][oO])+[rR]([nN][aA]s+)+) |
| Let-7 miRNA mention | Let | ([lL][eE][tT]S*[7]?\l+) |
| Lin-4 miRNA mention | Lin | ([lL][iI][nN]S*[4]?\l+) |
| Oncomir miRNA mention | Onco | ([oO][nN][cC][oO][mM][iI][rR]) |
| Admissible tilde and word boundaries | Cluster | (∼[\b]-[\b]-*) |
| Admissible hyphen, “and”, and comma separators | Sep | (S*((and?S,\,)?S*)+) |
| Admissible combination of upper and lower case alphabets | UL | (?\l?\l+,?\u?\u+) |
| Admissible alpha-numerical identifiers in specific miRNA mentions | AN | (UL((/, *and *?D+)?UL)+) |
| Admissible alpha-numerical identifiers in oncomir mentions | Tail | (?D(?AN?Cluster+,\-?D?AN+)+) |
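To make the idea of part mentions concrete, the sketch below recognises a specific miRNA mention together with its trailing identifiers and expands it into individual names. It is written in Python `re` syntax and is a deliberately simplified assumption, not a translation of the Vim-style patterns in Tables 4 and 5; the names `SPECIFIC_MIRNA` and `find_specific_mirnas` are hypothetical, and let, lin and oncomir mentions are not handled.

```python
import re

_IDENT = r"\d+[a-z]?"
# A part mention: ", -181b", ", and -181c", "and -22" or "/22".
_PART = rf"(?:\s*(?:,\s*and|,|and)\s*-{_IDENT}|\s*/\s*-?{_IDENT})"
SPECIFIC_MIRNA = re.compile(
    rf"\b(?P<stem>(?:[a-z]{{3}}-)?(?:micro-?rna|mirna|mir))[- ]?{_IDENT}{_PART}*",
    re.IGNORECASE,
)


def find_specific_mirnas(sentence: str):
    """Return (matched span text, expanded miRNA names) for each mention."""
    found = []
    for match in SPECIFIC_MIRNA.finditer(sentence):
        stem = match.group("stem")
        # Every identifier after the stem belongs to the same single entity.
        identifiers = re.findall(_IDENT, match.group(0)[len(stem):], re.IGNORECASE)
        found.append((match.group(0), [f"{stem}-{ident}" for ident in identifiers]))
    return found


# Constructed sentences using identifiers that appear in the text and tables.
hits_a = find_specific_mirnas("Expression of miR-181a, -181b, and -181c was reduced.")
hits_b = find_specific_mirnas("hsa-mir-21/22 is up-regulated.")
```

The whole span `miR-181a, -181b, and -181c` is returned as one entity, in line with the annotation rule, and is expanded to `miR-181a`, `miR-181b` and `miR-181c` for later normalization.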
We detect Species with a dictionary-based approach. The built dictionary consists of all the concepts from the NCBI taxonomy corresponding to only those organisms mentioned in miRBase.
Similarly, for the identification of Disease and Gene/Protein mentions in text we adopted a dictionary-based approach. To detect Disease, we apply three dictionaries: MeSH, MedDRA28 and Allie. For Gene/Protein, a dictionary29 based on SwissProt, EntrezGene, and HGNC is included. Gene synonyms that could potentially be tagged as miRNAs are removed to overcome redundancy. For example, the gene encoding the microRNA hsa-mir-21 is named miR-21, miRNA21 and hsa-mir-21. Likewise, the gene symbol of MIR16 (membrane interacting protein of RGS16) is MIR16, which can also represent a miRNA mention.
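The removal of miRNA-like gene synonyms might look like the following sketch. The dictionary layout, the entry identifiers (`GENE:A` and so on) and the pattern `_LOOKS_LIKE_MIRNA` are assumptions made for illustration; the actual dictionary format used with ProMiner is not described here.

```python
import re

# Hypothetical pattern for synonyms that would also be read as a miRNA name.
_LOOKS_LIKE_MIRNA = re.compile(
    r"^(?:[a-z]{3}-)?(?:micro-?rna|mirna|mir)[- ]?\d+[a-z]?(?:-\d+)?$",
    re.IGNORECASE,
)


def drop_mirna_like_synonyms(gene_dictionary):
    """Remove synonyms that look like miRNA names; drop entries left empty."""
    cleaned = {}
    for entry_id, synonyms in gene_dictionary.items():
        kept = [s for s in synonyms if not _LOOKS_LIKE_MIRNA.match(s)]
        if kept:
            cleaned[entry_id] = kept
    return cleaned


# Toy dictionary with made-up entry identifiers.
gene_dictionary = {
    "GENE:A": ["MIR16", "membrane interacting protein of RGS16"],
    "GENE:B": ["miR-21", "miRNA21", "hsa-mir-21"],
    "GENE:C": ["SMN"],
}
cleaned = drop_mirna_like_synonyms(gene_dictionary)
```

After filtering, the first entry keeps only its long name, the miRNA-encoding gene disappears from the gene dictionary entirely, and unrelated symbols such as `SMN` are unaffected.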
The relation trigger dictionary comprises all interaction terms from the training corpus, together with additional spelling variants (manually added to the list, also made freely available).
For all named entity recognition performed, the dictionary-based system ProMiner29 is used.
Relation extraction
We consider three approaches for addressing automatic extraction of interacting entity pairs from free text, described in the following.
The co-occurrence approach serves as a baseline. Assuming all interactions to be present in isolated sentences, this approach is complete but may be limited in precision. Reducing the number of false positives can be achieved by filtering with the dictionary of relation triggers. The rationale behind this filter is that the interaction is more likely to be described if such a term is present (we refer to that as tri-occurrence).
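The two baselines can be sketched in a few lines. The input format, a list of `(text, label)` tuples per sentence, is an assumption for illustration, and the example sentences are constructed.

```python
from itertools import product

MIRNA_LABELS = {"Specific miRNA", "Non-Specific miRNA"}
PARTNER_LABELS = {"Disease", "Gene/Protein"}


def co_occurrence(sentence_entities):
    """All (miRNA, partner) pairs that appear in the same sentence."""
    mirnas = [e for e in sentence_entities if e[1] in MIRNA_LABELS]
    partners = [e for e in sentence_entities if e[1] in PARTNER_LABELS]
    return list(product(mirnas, partners))


def tri_occurrence(sentence_entities):
    """Co-occurring pairs, kept only if the sentence also has a relation trigger."""
    has_trigger = any(label == "Relation Trigger" for _, label in sentence_entities)
    return co_occurrence(sentence_entities) if has_trigger else []


# Constructed sentences, each given as its list of recognised entities.
sentence_1 = [("miR-21", "Specific miRNA"), ("targets", "Relation Trigger"), ("PTEN", "Gene/Protein")]
sentence_2 = [("miRNAs", "Non-Specific miRNA"), ("glioblastoma", "Disease")]

pairs_co = [co_occurrence(s) for s in (sentence_1, sentence_2)]
pairs_tri = [tri_occurrence(s) for s in (sentence_1, sentence_2)]
```

Both sentences yield one candidate pair under co-occurrence, but only the first survives the trigger filter, which is how tri-occurrence trades a small amount of extra bookkeeping for fewer false positives.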
To increase the precision, we use a machine learning-based approach formulating the relation detection as a binary classification problem: each instance (consisting of a pair of entities) is classified either as not-containing a relation or belonging to one of the four-relation classes. Our system uses lexical and dependency parsing features. We evaluate linear support vector machines (SVM)30 as implemented in the LibSVM library, as well as LibLINEAR, a specialized implementation for processing large data sets31, and naive Bayes classifiers32. For more details, we refer to Bobić et al.33.
Lexical features capture characteristics of the tokens around the inspected pair of entities. The sentence text can roughly be divided into three parts: text between the entities, text before the entities, and text after the entities. Stemming34 and entity blinding are performed to improve generalization. Features are based on bag-of-words and on bi-, tri-, and quadri-grams. This feature setting follows Yu et al. and Yang et al.35,36. The presence of relation triggers is also taken into account, using the previously described manually generated list.
In addition to lexical features, deep parsing provides insight into the entire grammatical structure of the sentence37. We analyze the vertices v (tokens of the sentence) in the dependency tree from a lexical (text of the token) and a syntactic (POS tag) perspective. Edges e in the tree carry the grammatical relations between the vertices. Relevant information is usually extracted from the dependency parse tree following the shortest dependency path hypothesis38. Lexical and syntactic e-walks and v-walks on the shortest path are created as alternating sequences of vertices and edges of length 3. We capture information about the common ancestor vertex and additionally check whether the ancestor node represents a verb form (e.g. the POS tag could be VB, VBZ, or VBD). Finally, the length of the shortest path (number of edges) between the entities is used as a numerical feature.
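A simplified sketch of two of these feature groups is given below: n-gram features over the three sentence parts with entity blinding, and the shortest dependency path between the entities. Several points are assumptions made to keep the example self-contained: entities are single tokens, lowercasing stands in for stemming, the blinded placeholders are kept at the edges of the "between" window, and the dependency edges are written by hand rather than produced by a parser. The e-walk and v-walk features are not shown.

```python
from collections import deque


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with underscores."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_features(tokens, first, second, max_n=4):
    """N-gram features for the text before, between and after an entity pair."""
    blinded = [t.lower() for t in tokens]  # lowercasing stands in for stemming
    blinded[first], blinded[second] = "ENTITY_A", "ENTITY_B"  # entity blinding
    parts = {
        "before": blinded[:first],
        "between": blinded[first:second + 1],  # placeholders kept at the edges
        "after": blinded[second + 1:],
    }
    features = set()
    for name, part in parts.items():
        for n in range(1, max_n + 1):
            features.update(f"{name}:{n}:{gram}" for gram in ngrams(part, n))
    return features


def shortest_path(edges, start, goal):
    """Breadth-first shortest path between two token indices.

    `edges` holds (head index, dependent index, relation) triples; the
    dependency tree is traversed as an undirected graph.
    """
    neighbours = {}
    for head, dependent, _relation in edges:
        neighbours.setdefault(head, []).append(dependent)
        neighbours.setdefault(dependent, []).append(head)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


tokens = ["miR-21", "directly", "targets", "PTEN", "in", "glioblastoma"]
# Hand-written dependency edges for the example sentence (not parser output).
edges = [(2, 0, "nsubj"), (2, 1, "advmod"), (2, 3, "dobj"), (2, 5, "prep_in")]
features = lexical_features(tokens, first=0, second=3)
path = shortest_path(edges, start=0, goal=3)
path_length = len(path) - 1  # number of edges, used as a numerical feature
```

For the example pair the shortest path runs through the verb "targets" (token index 2), giving a path length of 2, and the blinded 4-gram `ENTITY_A_directly_targets_ENTITY_B` no longer depends on which miRNA or gene was mentioned.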
Results and discussion
Dataset 1. Manually annotated miRNA-disease and miRNA-gene interaction corpora.
See README.txt in the zip file for precise details about the corpus.
In the following, we present results for named entity recognition and relation extraction. This section concludes with two use-case analyses.
Performance evaluation of named entity recognition
Among the 201 abstracts in the training corpus, 82% contain general miRNA mentions, whereas 45% contain specific miRNA mentions. Table 6 reports the results for miRNA entity recognition. Non-specific miRNA recognition is close to perfect. Specific miRNA mention recognition has an F1 measure of 0.94.
Table 6. Evaluation results for miRNA entity classes.
Here only complete match results are presented. The performance of named entity recognition is evaluated using recall (R), precision (P) and F1 score.
| Entity Class | Training Corpus R | Training Corpus P | Training Corpus F1 | Test Corpus R | Test Corpus P | Test Corpus F1 |
|---|---|---|---|---|---|---|
| Non-specific MiRNAs | 1.000 | 0.995 | 0.997 | 1.000 | 0.997 | 0.999 |
| Specific MiRNAs | 0.921 | 0.928 | 0.924 | 0.936 | 0.934 | 0.935 |
| Relation Triggers | 0.864 | 0.885 | 0.874 | 0.790 | 0.842 | 0.815 |
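The complete-match evaluation behind Table 6 can be sketched as a set comparison of annotated spans. Representing an annotation as a `(start, end, label)` tuple is an assumption for illustration, and the spans below are invented.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def precision_recall_f1(gold, predicted):
    """Complete-match evaluation: a prediction counts only if span and label agree."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Invented character spans; the second prediction misses the boundary by one.
gold = [(0, 6, "Specific miRNA"), (20, 24, "Specific miRNA"), (40, 46, "Non-Specific miRNA")]
predicted = [(0, 6, "Specific miRNA"), (20, 23, "Specific miRNA"), (40, 46, "Non-Specific miRNA")]
precision, recall, f1 = precision_recall_f1(gold, predicted)

# Consistency check against Table 6 (Specific miRNAs, test corpus).
table_f1 = round(f1_score(0.934, 0.936), 3)
```

In the toy example the boundary error costs both a true positive and a correct prediction, so precision, recall and F1 are all 2/3; `table_f1` reproduces the reported 0.935 from the tabulated precision and recall.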
For disease mention recognition, the combined dictionaries, based on three established resources, resulted in F1 scores of 0.79 and 0.69 for the training and test corpus, respectively. When boundary matches are counted, the F1 score for the same class is 0.88, which points to the possibility of detecting similar text strings to improve recall. The genes/proteins dictionary achieved F1 scores of 0.84 and 0.85 on the training and test corpus, respectively.
Most abstracts in the test corpus are associated to human (71), followed by mouse (16), and rat (8). Pig has 2 abstracts, zebrafish, HIV-1, HSV-1, and Caenorhabditis elegans 1 each. The evaluation of the relation trigger dictionary (cf. Table 6) suggests that it covers a substantial part of the vocabulary with recall of 0.86 for the training and 0.79 for the test corpus.
Relation extraction
Only 11.5% of miRNA-related associations extend beyond the sentence level; our work therefore focuses on relations at sentence level. Sentences in which co-occurring entity pairs do not participate in any relation are tagged as false. A comparison of the different relation extraction approaches is shown in Figure 2. The co-occurrence based approach leads to 100% recall for relation extraction. The recall is not diminished by the tri-occurrence approach, while the precision increases by between 4pp (percentage points) and 17pp compared with the co-occurrence based approach, reducing false positives (cf. Figure 2). However, the precision remains below 60%. Using machine learning-based classification, precision increases to up to 76% for specific miRNA-gene relations. The F1 measure is not substantially different, but a trade-off between precision and recall can be observed. This is true for all three methods (LibLINEAR, SVM, and Naive Bayes). Most relation extraction approaches depend on the performance of named entity recognition. The impact of error propagation from automated entity recognizers is evaluated by applying the tri-occurrence method to the automatically annotated training and test corpus, here termed “NERTri”. Compared with the results on the gold standard entity annotation, a drop in F1 of 13pp for NonSpMiR-D, 7pp for NonSpMiR-GP, 22pp for SpMiR-D, and 30pp for SpMiR-GP is observed for the test corpus.

Figure 2. Comparison of different relation extraction approaches.
On the x-axis, different entity pair relations are represented as SpMiR-D for Specific miRNA-Disease, SpMiR-GP for Specific miRNA-Gene/Protein, NonSpMiR-D for Non-Specific miRNA-Disease, and NonSpMiR-GP for Non-Specific miRNA-Gene/Protein.
Use case analysis
For the impact analysis of the proposed approach, we compare the extracted information with two databases, namely miR2Disease and miRSel. We focus on relations and articles concerning Alzheimer’s disease.
Alzheimer’s disease (AD) is ranked sixth among causes of death in major developed countries39. It not only affects individuals but also incurs a high cost to society. Recently, miRNAs have been shown to be closely associated with AD pathophysiology40. The growing need to identify new therapeutic targets for AD, after major setbacks due to failed drugs, motivates looking in this direction. In silico methods, such as the one proposed in this work, can aid in building miRNA-regulatory networks specific to AD for further analysis, such as identifying mechanisms, sub-networks, and key targets.
Extracting miRNA-Alzheimer’s disease relations from full MEDLINE
The database miR2Disease is queried to return all miRNA-disease relations occurring in Alzheimer’s disease. We compare that dataset with all miRNA-disease relations from MEDLINE applying our tri-occurrence approach, retrieving 41 abstracts with 159 relations. Obtained triplets have been manually curated to remove 51 false positives. The results are summarized in Table 7. The miR2Disease database returns 28 evidences from 9 articles. Among these, only 14 evidences are present in abstracts. Moreover, 16 evidences are extracted from one full text document41. Only two evidences are identified at abstract level among these 16 evidences. Overall, 26 miRNAs identified by miR2Disease are in relation with Alzheimer’s disease. Therefore, our text-based extraction proposes approximately three times more relations than the database provides.
Table 7. miR2Disease database comparison.
MiRNA-Alzheimer’s disease relation retrieved from MEDLINE and in miR2Disease database.
| | miR2Disease | NERTri | True Positives in NERTri | NERTri and miR2Disease Overlap |
|---|---|---|---|---|
| Publications | 9 | 41 | 36 | 8 |
| Relations | 28 | 159 | 108 | 11 |
| Evidences (abstracts) | 14 | 159 | 108 | 10 |
| Unique miRNAs | 26 | 46 | 40 | 16 |
The analysis of the 17 false negative relations, which are in the database but not found by our approach, shows that most of them could be found only in full text and that the automatic system misses four miRNA-Alzheimer’s disease relations from abstracts. Manual inspection reveals that in three of these four missing evidences the disease name is not mentioned in the sentence (the relation occurs at co-reference level).
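The counts in this comparison fit together as simple arithmetic, shown here as a short check. All input numbers are taken from the text and Table 7; the precision value is derived from them and is not reported explicitly in the source.

```python
# Numbers reported for the miRNA-Alzheimer's disease use case.
extracted_relations = 159   # relations returned by NERTri
false_positives = 51        # removed by manual curation
database_relations = 28     # relations listed in miR2Disease
overlap = 11                # relations found by both

true_positives = extracted_relations - false_positives
curated_precision = true_positives / extracted_relations
false_negatives = database_relations - overlap
```

This gives 108 true positives (matching Table 7), 17 database relations missed by the automatic approach (matching the false negative count above), and a derived precision of roughly 0.68 for the uncurated tri-occurrence output on this query.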
Extraction of miRNA-gene relations for Alzheimer’s disease from full MEDLINE
We apply our relation detection NERTri to 100 abstracts from PubMed retrieved using the query “alzheimer disease” [MeSH Terms] OR (“alzheimer disease”[All Fields] OR “alzheimer”[All Fields]) AND (“micrornas”[MeSH Terms] OR “micrornas”[All Fields] OR “microrna”[All Fields]) AND (“2001/01/01”[PDAT]:“2013/7/4”[PDAT]). Manual inspection leads to 184 miRNA-gene relations (Table 8) in 39 abstracts.
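The text does not state how the query was submitted to PubMed. As one possible way to reproduce it programmatically, the sketch below builds a request URL for the NCBI E-utilities ESearch endpoint; the choice of E-utilities, the `retmax` value of 100 and the JSON return mode are assumptions, and no network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The PubMed query from the text, with straight quotes.
QUERY = (
    '"alzheimer disease"[MeSH Terms] OR ("alzheimer disease"[All Fields] '
    'OR "alzheimer"[All Fields]) AND ("micrornas"[MeSH Terms] '
    'OR "micrornas"[All Fields] OR "microrna"[All Fields]) '
    'AND ("2001/01/01"[PDAT]:"2013/7/4"[PDAT])'
)


def build_esearch_url(term: str, retmax: int = 100) -> str:
    """Return an ESearch URL for a PubMed query (the request is not sent)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_esearch_url(QUERY)
decoded = parse_qs(urlsplit(url).query)  # round-trip check of the encoding
```

Because PubMed is updated continuously, running the same query today will return more records than at the time of the study, even with the date restriction, if records have since been added or re-indexed.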
Table 8. miRSel database comparison.
Comparison of miRNA-gene relations retrieval for Alzheimer’s disease in MEDLINE.
| Approach | Articles | Relations |
|---|---|---|
| PubMed Query(“Alzheimer AND miRNA”) | 100 | NA |
| PubMed Query with relations at sentence level | 37 | NA |
| PubMed Query ∩ NERTri | 28 | 140 |
| PubMed Query ∩ miRSel | 12 | 56 |
| NERTri | 39 | 184 |
| NERTri ∩ miRSel | 14 | 22 |
The relations found are compared with the content of miRSel. Among the 37 abstracts from the PubMed query, miRSel contains only 12 abstracts with 56 miRNA-gene relations (cf. Table 8). False negatives of our approach with respect to miRSel could not be identified directly, as the database is neither downloadable nor searchable for disease-specific relations. However, a low intersection between miRSel and NERTri can be observed.
In summary, our approach provides AD related gene-microRNA relations from PubMed which have not been available in the database before.
Overall, the results are promising when compared with the miR2Disease and miRSel databases and indicate that we can extend the databases to a large extent with new relations. Such an approach makes it much easier to keep databases up to date. Nevertheless, full-text processing would most certainly increase the recall of automatic processing.
Conclusion
In this work, we proposed approaches for identifying named entities of the classes diseases and genes/proteins, and relations of those entities with miRNAs, from biomedical literature. Distinguishing between two types of miRNA mentions has enabled us to achieve better recall and precision in document retrieval and relation identification. Three different relation extraction approaches are compared, showing that the tri-occurrence based approach is a reliable first choice. The tri-occurrence based approach is comparable to a machine learning-based method but considerably faster. In comparison with two well-established databases, we have shown that additional useful information can be extracted from MEDLINE using our proposed methods.
To the best of our knowledge, this is the first work in which manually annotated corpora containing information about miRNAs and miRNA relations are published. Moreover, the corpora and methods provided represent a useful basis and tools for extracting information about miRNA associations from the literature. This work serves as an important benchmark for current and future approaches to the automatic identification of miRNA relations. It provides the basis for building a knowledge-based approach to model regulatory networks for the identification of deregulated miRNAs and genes/proteins.
Data availability
Corpora availability: http://www.scai.fraunhofer.de/mirnacorpora.html
Archived corpora at time of publication: F1000Research: Dataset 1. Manually annotated miRNA-disease and miRNA-gene interaction corpora, 10.5256/f1000research.4591.d3463942
Author contributions
SB, RK, JF, and MHA conceived and designed the overall research strategy. SB carried out all the development work and performed the analysis. She is the major contributor to manuscript preparation and the principal annotator. TB developed the machine learning-based workflow for relation extraction, transformed the corpora into the standard format, and contributed to manuscript writing. JF supported the use-case analysis and paper writing. MHA is the scientific supervisor for this work. RK contributed to critical discussions, analysed the results and was a major contributor to correcting and writing the manuscript. All authors read and approved the final version of the manuscript.
Competing interests
No competing interests were disclosed.
Grant information
Shweta Bagewadi was supported by University of Bonn. Tamara Bobić was partially funded by the Bonn-Aachen International Center for Information Technology (B-IT) Research School during her contribution to this work at Fraunhofer SCAI.
Acknowledgements
We would like to thank Heinz-Theo Mevissen for all the support during implementation of the dictionaries and regular expressions in ProMiner. We acknowledge Anandhi Iyappan for her contribution as the second annotator. We are also grateful to Harsha Gurulingappa for all his support and fruitful discussions during this work. We would like to thank Ashutosh Malhotra for proof reading the manuscript.
References
- 1. Lee RC, Feinbaum RL, Ambros V: The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell. 1993; 75(5): 843–54.
- 2. Bartel DP: MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004; 116(2): 281–297.
- 3. Esquela-Kerscher A, Slack FJ: Oncomirs - microRNAs with a role in cancer. Nat Rev Cancer. 2006; 6(4): 259–69.
- 4. Ma W, Hu S, Yao G, et al.: An androgen receptor-microrna-29a regulatory circuitry in mouse epididymis. J Biol Chem. 2013; 288(41): 29369–81.
- 5. Babak T, Zhang W, Morris Q, et al.: Probing microRNAs with microarrays: tissue specificity and functional inference. RNA. 2004; 10(11): 1813–1819.
- 6. Bottoni A, Zatelli MC, Ferracin M, et al.: Identification of differentially expressed microRNAs by microarray: a possible role for microRNA genes in pituitary adenomas. J Cell Physiol. 2007; 210(2): 370–377.
- 7. Wu X, Song Y: Preferential regulation of miRNA targets by environmental chemicals in the human genome. BMC Genomics. 2011; 12(1): 244.
- 8. Calin GA, Dumitru CD, Shimizu M, et al.: Frequent deletions and downregulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A. 2002; 99(24): 15524–9.
- 9. Banno K, Yanokura M, Iida M, et al.: Application of microRNA in diagnosis and treatment of ovarian cancer. BioMed Res Int. 2014; 2014: 232817.
- 10. Bartel DP: MicroRNAs: target recognition and regulatory functions. Cell. 2009; 136(2): 215–33.
- 11. Vergoulis T, Vlachos IS, Alexiou P, et al.: TarBase 6.0: capturing the exponential growth of miRNA targets with experimental support. Nucleic Acids Res. 2011; 40(Database issue): D222–229.
- 12. Naeem H, Küffner R, Csaba G, et al.: miRSel: automated extraction of associations between microRNAs and genes from the biomedical literature. BMC Bioinformatics. 2010; 11: 135.
- 13. Jiang Q, Wang Y, Hao Y, et al.: miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009; 37(Database issue): D98–104.
- 14. Ruepp A, Kowarsch A, Schmidl D, et al.: PhenomiR: a knowledgebase for microRNA expression in diseases and biological processes. Genome Biol. 2010; 11(1): R6.
- 15. Czarnecki J, Nobeli I, Smith A, et al.: A text-mining system for extracting metabolic reactions from full-text articles. BMC Bioinformatics. 2012; 13(1): 172.
- 16. Hsu SD, Lin FM, Wu WY, et al.: miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res. 2011; 39(Database issue): D163–9.
- 17. Xie B, Ding Q, Han H, et al.: miRCancer: a microRNA-cancer association database constructed by text mining on literature. Bioinformatics. 2013; 29(5): 639–44.
- 18. Smith L, Tanabe LK, nee Ando RJ, et al.: Overview of BioCreative II gene mention recognition. Genome Biol. 2008; 9(Suppl 2): S2.
- 19. Arighi CN, Lu Z, Krallinger M, et al.: Overview of the BioCreative III Workshop. BMC Bioinformatics. 2011; 12(Suppl 8): S1.
- 20. Nedellec C, Bossy R, Kim JD, et al.: Proceedings of the BioNLP Shared Task 2013 Workshop. Association for Computational Linguistics, Sofia, Bulgaria, 2013.
- 21. Tsujii J, Kim JD, Pyysalo S: Proceedings of BioNLP Shared Task 2011 Workshop. Association for Computational Linguistics, Portland, Oregon, USA, 2011.
- 22. Tsujii J: Proceedings of the BioNLP 2009 Workshop Companion Volume for Shared Task. Association for Computational Linguistics, Boulder, Colorado, 2009.
- 23. Murray BS, Choe SE, Woods M, et al.: An in silico analysis of microRNAs: mining the miRNAome. Mol Biosyst. 2010; 6(10): 1853–62.
- 24. Pyysalo S, Airola A, Heimonen J, et al.: Comparative analysis of five protein-protein interaction corpora. BMC Bioinformatics. 2008; 9(Suppl 3): S6.
- 25. Ogren PV: Knowtator: A Protégé plug-in for annotated corpus construction. In Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume: demonstrations. New York, Association for Computational Linguistics. 2006; 273–275.
- 26. Gennari JH, Musen MA, Fergerson RW, et al.: The evolution of Protégé: an environment for knowledge-based systems development. Int J Hum Comput Stud. 2003; 58(1): 89–123.
- 27. Oualline S: Vi iMproved. New Riders Publishing, Thousand Oaks, CA, USA, 2001.
- 28. Brown EG, Wood L, Wood S: The medical dictionary for regulatory activities (MedDRA). Drug Saf. 1999; 20(2): 109–17.
| Publisher Full Text
- 29.
Fluck J, Mevissen HT, Oster M, et al.:
ProMiner: Recognition of Human Gene and Protein Names using regularly updated Dictionaries. In Proceedings of the Second BioCreative Challenge Evaluation Workshop, Madrid, Spain. 2007; 149–151. Reference Source
- 30.
Cortes C, Vapnik V:
Support-vector networks.
Mach Learn.
1995; 20(3): 273–297. Publisher Full Text
- 31.
Fan RE, Chang KW, Hsieh CJ, et al.:
LIBLINEAR: A Library for Large Linear Classification.
J Mach Learn Res.
2008; 9: 1871–1874. Reference Source
- 32.
John GH, Langley P:
Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh conference on Uncertainty in artificial intelligence, UAI’95, San Francisco, CA, USA, Morgan Kaufmann Publishers Inc. 1995; 338–345. Reference Source
- 33.
Bobić T, Klinger R, Thomas P, et al.:
Improving distantly supervised extraction of drug-drug and protein-protein interactions. In Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP, Avignon, France, Association for Computational Linguistics. 2012; 35–43. Reference Source
- 34.
Porter M:
An algorithm for suffix stripping.
Program.
1980; 14(3): 130–137. Publisher Full Text
- 35.
Yu H, Qian L, Zhou G, et al.:
Extracting protein-protein interaction from biomedical text using additional shallow parsing information. In Biomedical Engineering and Informatics, 2009. BMEI ’09. 2nd International Conference on, 2009; 1–5. Publisher Full Text
- 36.
Yang Z, Lin H, Li Y:
BioPPISVMExtractor: a protein-protein interaction extractor for biomedical literature using SVM and rich feature sets.
J Biomed Inform.
2010; 43(1): 88–96. PubMed Abstract
| Publisher Full Text
- 37.
De Marneffe MC, Manning CD:
Stanford typed dependencies manual. 2010. Reference Source
- 38.
Bunescu RC, Mooney RJ:
A shortest path dependency kernel for relation extraction. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, Association for Computational Linguistics. HLT ’05, Stroudsburg, PA, USA, 2005; 724–731. Publisher Full Text
- 39.
Thies W, Bleiler L, Alzheimer’s Association:
2011 Alzheimer’s disease facts and figures.
Alzheimers Dement.
2011; 7(2): 208–244. PubMed Abstract
| Publisher Full Text
- 40.
Cheng L, Quek C, Sun X, et al.:
Deep-sequencing of microRNA associated with Alzheimer’s disease in biological fluids: From biomarker discovery to diagnostic practice.
Front Genet.
2013; 4: 150.
- 41.
Hébert SS, Horré K, Nicolaï L, et al.:
Loss of microRNA cluster miR-29a/b-1 in sporadic Alzheimer’s disease correlates with increased BACE1/beta-secretase expression.
Proc Natl Acad Sci U S A.
2008; 105(17): 6415–6420. PubMed Abstract
| Publisher Full Text
| Free Full Text
- 42.
Bagewadi S, Bobić T, Hofmann-Apitius M, et al.:
Dataset 1. Manually annotated miRNA-disease and miRNA-gene interaction corpora.
F1000Research.
Data Source