Keywords
R, databases, genomics, genetic variants, genome annotation, data mining
This article is included in the RPackage gateway.
R, databases, genomics, genetic variants, genome annotation, data mining
This new version considered interesting comments of the reviewers regarding applicability of the haploR and comparison to its analogues as well as correction some missed points during the first version, attending most of the comments raised by the reviewers.
Major changes in this version 2 are:
- Altered the Abstract and Introduction sections.
- Updated a ‘Methods’ section: only the basic examples are kept; other examples were moved to haploR-vignette (see Supplementary File S1).
- Altered a 'Conclusion and Future Work' section: we emphasised the advantages of haploR and provided clarifications regarding adding the Regulatory Elements Database.
This version 2 also includes an updated haploR-vignette as Supplementary File S1.
See the authors' detailed response to the review by Garrett M. Dancik
See the authors' detailed response to the review by Claudia Vitolo, Estibaliz Gascon and Fatima Pillosu
See the authors' detailed response to the review by Stephanie M. Gogarten
Genome wide association studies (GWAS) have produced a significant amount of data. To better understand the biological mechanisms involved in complex trait regulations, web-based tools, such as HaploReg1 and RegulomeDB2, were proposed. These tools offer a link of detected genetic variants to additional post-GWAS information about linkage disequilibrium (LD), expression quantitative trait loci (eQTL), allele frequencies (AF), protein functions, and chromatin states (for annotated single-nucleotide polymorphisms (SNP)). These tools are all web-based and require the user to do the following: open a web page, manually enter information, and obtain the results. The user needs to advise that in a number of situations, extra precautions must be made. Two examples of this would be saving the results in different file formats (TXT, CSV, XLSX, etc.,) or taking advantage of their highly-optimized search engines from custom scripts. Among a plethora of annotation packages on Bioconductor (www.bioconductor.org) and CRAN (www.cran-project.org), myvariant3, biomaRt4, rentrez5 can retrieve information about annotated SNPs. However, even rich outputs of these packages lack information about LD, eQTL, AF and haplotype blocks. We present an R package, haploR, which allows querying HaploReg and RegulomeDB web-based tools from R environment. The package connects to the web site, queries the database, and downloads results into a data frame. HaploR can easily be included in bioinformatics pipelines, which will facilitate search for SNP -phenotype associations.
We present an R package, haploR, which allows querying HaploReg and RegulomeDB web-based tools from R environment. The package connects to the web site, queries the database and downloads results into a data frame. haploR can easily be included in bioinformatics pipelines, which will facilitate search for SNP - phenotype associations.
haploR relies on HTTP methods POST and GET to query and download the content of web pages. Functions queryHaploreg(...) and queryRegulome(...) are designed to query the HaploReg (http://archive.broadinstitute.org/mammals/haploreg/haploreg.php) and RegulomeDB (http://www.regulomedb.org/), respectively. The structure of the retrieved data is described on the package website and corresponding vignette.
The package is cross-platform (Windows, macOS and Linux), without any specific computer hardware requirements. A standard computer with the most-recent version of R will handle most applications of the haploR package. Installation instructions and a list of prerequisites are provided on the package web page.
To query HaploReg, the user needs to call queryHaploreg(query, file, study, ...). This function can accept three different inputs: (1) a vector of SNPs (query); (2) a text file (file); or (3) a study (study) that can be obtained from HaploReg using getHaploregStudyList(). Parameters of these functions are directly linked to options provided at the HaploReg web page and described in the package user manual. Examples below show usage of a vector of SNPs. For other examples please refer to the package vignette.
library(haploR)
x <- queryHaploreg(query=c("rs10048158","rs4791078"))
Here parameter query represents a vector of SNPs identified with rs-IDs.
The RegulomeDB project also allows exploration of properties of SNPs and presents results in different formats: (1) plain text (vector of rs-ID) (2) BED and (3) GFF formats. The function queryRegulome(query, ...) is used to query the RegulomeDB:
x <- queryRegulome(query=c("rs4791078","rs10048158"))
Here the query is a vector of rs-IDs. The output is similar to that used in the queryHaploreg function in terms of the type of information retrieved, but specific to the RegulomeDB output. For detailed format explanations refer to the RegulomeDB web site.
haploR can be easily included to bioinformatics pipeline to streamline the process and reduce the analysis time. Its advantages over the original databases include: shorter retrieval time, the ability to present results in a user-friendly form (allowing for a more streamlined workflow,) and convenient use of needed information in reports, presentations and publications. We plan to add other tools, such as Regulatory Elements (http://dnase.genome.duke.edu/index.php), which provides the data from DNaseI hypersensitivity and microarray experiments performed in 6. Understanding the factors modulating gene expression and protein yield across individuals can be beneficial. Cell types may help discover novel mechanisms of genetic associations.
Tool available from: https://cran.r-project.org/package=haploR
Source code available from: https://github.com/izhbannikov/haploR
Archived source as at time of publication: https://cran.r-project.org/src/contrib/haploR_1.4.4.tar.gz, doi: https://doi.org/10.5281/zenodo.570956
License: GPL-3
The example script and output files for the package are available at: https://doi.org/10.5281/zenodo.570960
IYZ developed the package, evaluation/validation tests and wrote the manuscript. KA, SU and AIY contributed to the development of the package and revised manuscript. All authors read and approved the final manuscript.
This work was supported by the National Institute on Aging of the National Institutes of Health (NIA/NIH) under Award Numbers P01AG043352, R01AG046860, and P30AG034424. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIA/NIH.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
haploR-vignette. Using haploR, an R package for querying HaploReg and RegulomeDB. This file includes a description of post-GWAS analysis and the unique contribution of the haploR to it. It also includes an example of a typical analysis workflow using haploR. There is also a description of the post-GWAS web databases (HaploReg, RegulomeDB) used in the package with comprehensive examples of usage. This file also describes the data structures used in haploR.
Views | Downloads | |
---|---|---|
F1000Research | - | - |
PubMed Central
Data from PMC are received and updated monthly.
|
- | - |
Competing Interests: No competing interests were disclosed.
Competing Interests: No competing interests were disclosed.
Competing Interests: No competing interests were disclosed.
Competing Interests: No competing interests were disclosed.
Competing Interests: No competing interests were disclosed.
Competing Interests: No competing interests were disclosed.
Alongside their report, reviewers assign a status to the article:
Invited Reviewers | |||
---|---|---|---|
1 | 2 | 3 | |
Version 2 (revision) 15 May 17 |
read | read | read |
Version 1 01 Feb 17 |
read | read | read |
Provide sufficient details of any financial or non-financial competing interests to enable users to assess whether your comments might lead a reasonable person to question your impartiality. Consider the following examples, but note that this is not an exhaustive list:
Sign up for content alerts and receive a weekly or monthly email with all newly published articles
Already registered? Sign in
The email address should be the one you originally registered with F1000.
You registered with F1000 via Google, so we cannot reset your password.
To sign in, please click here.
If you still need help with your Google account password, please click here.
You registered with F1000 via Facebook, so we cannot reset your password.
To sign in, please click here.
If you still need help with your Facebook account password, please click here.
If your email address is registered with us, we will email you instructions to reset your password.
If you think you should have received this email but it has not arrived, please check your spam filters and/or contact for further assistance.
... Continue reading While the value of tools that allow for the more ready accession of existing databases is apparent, I have difficulty understanding precisely how the use of haploR might benefit me.
Part of this relates to the vagueness of the language that has been used in the writing presented, and part of this relates to the lack of clear examples.
I suggest that the authors consult an editor to address issues pertaining to the use of the English language. I also suggest that the authors provide a concise example of a scenario in which their software will be of benefit.
Part of this relates to the vagueness of the language that has been used in the writing presented, and part of this relates to the lack of clear examples.
I suggest that the authors consult an editor to address issues pertaining to the use of the English language. I also suggest that the authors provide a concise example of a scenario in which their software will be of benefit.