Keywords
R, databases, genomics, genetic variants, genome annotation, data mining
This article is included in the RPackage gateway.
Genomic experiments, including genome-wide association studies (GWAS), have produced and continue to produce large amounts of data. To better understand the biological mechanisms involved in regulating complex traits, this information requires further analysis. Large projects, such as ENCODE [1], are devoted to bringing together accumulated knowledge about the functional and regulatory elements that control cell function. These projects manage such data to facilitate collaboration between researchers working on the genetics of complex traits.
Several web-based tools, such as HaploReg [2] and RegulomeDB [3], link detected genetic variants to additional post-GWAS information. For annotated genetic variants, this includes linkage disequilibrium (LD), expression quantitative trait loci (eQTL), allele frequencies, protein functions, chromatin states, etc. Because these tools are web-based, the user must open a web page, enter information manually and accept the results in whatever format the site provides.
In a number of situations, a user needs additional flexibility when working with such tools, for example to save the results in different file formats for further use. This flexibility can be provided by the programming languages commonly used in bioinformatics and computational biology, including R, Python and Perl. Among them, R is one of the most widely used, since it is free and offers a large set of packages to facilitate bioinformatics analysis.
We present an R package, haploR, for querying the HaploReg and RegulomeDB web-based tools. The package connects to the corresponding web site, queries the database and returns the results as a data frame or a file. It can easily be included in bioinformatics pipelines, which in turn facilitates rapid discovery of single nucleotide polymorphism (SNP)/gene-phenotype associations.
The R package haploR relies on the HTTP methods POST and GET to query, download and parse the content of web pages. The functions queryHaploreg(...) and queryRegulome(...) obtain data from HaploReg (http://archive.broadinstitute.org/mammals/haploreg/haploreg.php) and RegulomeDB (http://www.regulomedb.org/), respectively.
To query HaploReg and download the results, the user calls the queryHaploreg(query, file, study, ...) function. This function accepts one of three inputs: (1) a vector of SNPs (query); (2) a text file (file); or (3) a study (study). The other parameters map directly to the query options on the HaploReg web page and are described in the package user manual. The output of this function is a table with column names identical to those used in HaploReg. The examples below show how each of these options is used.
library(haploR)
queryHaploreg(query=c("rs10048158","rs4791078"))
Here, the parameter query is a vector of rs-IDs.
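Since the result is returned as a table, it can be written to disk in whichever format a downstream step expects. The following is a minimal sketch using base R only; save_results is a hypothetical helper written for this illustration and is not part of haploR, and the output file names are assumptions.

```r
# Hypothetical helper (not part of haploR): save a result table
# as both a CSV file and an RDS file sharing a common path prefix.
save_results <- function(results, prefix) {
  csv_path <- paste0(prefix, ".csv")
  rds_path <- paste0(prefix, ".rds")
  # CSV is convenient for spreadsheets and non-R tools
  write.csv(as.data.frame(results), csv_path, row.names = FALSE)
  # RDS preserves the R object exactly as returned
  saveRDS(results, rds_path)
  invisible(c(csv = csv_path, rds = rds_path))
}

# Usage (requires the haploR package and network access):
# library(haploR)
# results <- queryHaploreg(query = c("rs10048158", "rs4791078"))
# save_results(results, file.path(tempdir(), "haploreg_results"))
```

The CSV copy can be opened outside R, while the RDS copy can be reloaded with readRDS() without re-running the query.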
SNPs can also be stored in a text file, one SNP per line. In this case, the user calls queryHaploreg as follows:
queryHaploreg(file=system.file("extdata/snps.txt", package="haploR"))
Here, file is the path to the file with SNPs.
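A user's own input file can be prepared in the same one-SNP-per-line layout. The sketch below writes such a file with base R; the file name my_snps.txt is an assumption made for illustration, and the final query is commented out because it needs haploR and network access.

```r
# Write rs-IDs to a plain-text file, one SNP per line.
snps <- c("rs10048158", "rs4791078")
snp_file <- file.path(tempdir(), "my_snps.txt")  # hypothetical file name
writeLines(snps, snp_file)

# Sanity check: every line read back should look like an rs-ID.
ids <- readLines(snp_file)
stopifnot(all(grepl("^rs[0-9]+$", ids)))

# Query HaploReg with the file (requires haploR and network access):
# library(haploR)
# results <- queryHaploreg(file = snp_file)
```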
HaploReg also offers an option to use data from previously conducted studies. To use this option, the user should first obtain the list of studies and then pass a particular study as a parameter:
#Get a list of studies
studies <- getStudyList()
#Query HaploReg
queryHaploreg(study=studies[[2]])
Other options, such as a source for epigenomes, mammalian conservation algorithm, and others are also available; see the package’s user manual (https://cran.r-project.org/web/packages/haploR/haploR.pdf) and vignette (https://cran.r-project.org/web/packages/haploR/vignettes/haplor-vignette.html) for correct use.
The RegulomeDB project also allows exploration of the properties of SNPs and presents results in different formats: (1) plain text, (2) BED and (3) GFF. The function queryRegulome(query, format) is used to query RegulomeDB:
queryRegulome(query=c("rs4791078","rs10048158"), format="full")
Here, query is a vector of rs-IDs and format is an output format provided by the RegulomeDB web site. The output of this function is similar to that of queryHaploreg, but its columns correspond to the RegulomeDB output.
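Because both query functions depend on a remote web site, a pipeline may want to handle a failed request without stopping. One possible approach, sketched here with base R's tryCatch, is a small wrapper that returns NULL when the call raises an error; safe_query is a hypothetical helper for this illustration and is not part of haploR.

```r
# Hypothetical wrapper (not part of haploR): call a query function and
# return NULL instead of stopping if the call raises an error.
# The first argument is named .f so that named arguments such as
# query= or format= are passed through to the wrapped function.
safe_query <- function(.f, ...) {
  tryCatch(
    .f(...),
    error = function(e) {
      message("Query failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage (requires the haploR package and network access):
# library(haploR)
# res <- safe_query(queryRegulome,
#                   query = c("rs4791078", "rs10048158"),
#                   format = "full")
# if (is.null(res)) message("No RegulomeDB results; skipping this step.")
```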
Here, we present a new package, haploR, which currently allows querying the web tools HaploReg and RegulomeDB. We plan to add other web-based tools, such as Regulatory Elements DB (http://dnase.genome.duke.edu/index.php), which provides data from the DNaseI-hypersensitivity and Affymetrix microarray experiments performed in [4].
Tool available from: https://cran.r-project.org/package=haploR
Source code available from: https://github.com/izhbannikov/haploR
Archived source as at time of publication: https://doi.org/10.5281/zenodo.2599965; https://cran.r-project.org/src/contrib/haploR_1.4.1.tar.gz
License: GPL-2 | GPL-3
The example script and output files for the package are available at: https://doi.org/10.5281/zenodo.2600396
IYZ developed the package, performed evaluation/validation tests and wrote the manuscript. KA and AIY contributed to the development of the package, revised the manuscript and provided comments that helped to finalize it. All authors read and approved the final manuscript.
This work was supported by the National Institute on Aging of the National Institutes of Health (NIA/NIH) under Award Numbers P01AG043352, R01AG046860, and P30AG034424. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIA/NIH.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: No competing interests were disclosed.
While the value of tools that allow for the more ready accession of existing databases is apparent, I have difficulty understanding precisely how the use of haploR might benefit me.
Part of this relates to the vagueness of the language that has been used in the writing presented, and part of this relates to the lack of clear examples.
I suggest that the authors consult an editor to address issues pertaining to the use of the English language. I also suggest that the authors provide a concise example of a scenario in which their software will be of benefit.