Software Tool Article

Pharmosome: an integrative and collective database for exploration and analysis of single nucleotide polymorphisms associated with disease

[version 1; peer review: 2 approved with reservations]
PUBLISHED 10 Jan 2020

Abstract

Current single nucleotide polymorphism (SNP) databases are limited to a narrow set of SNPs, which has led to a lack of interactivity between different databases, limited tools to analyze and manipulate the already existing data, and complexity in the graphical user interface. Here we introduce Pharmosome, a web-based, user-friendly and collective database for more than 30,000 human disease-related SNPs, with dynamic pipelines to explore SNPs associated with disease development, drug response and the pathways shared between different genes related to these SNPs. Pharmosome implements several tools to design primers to detect SNPs in large genomes and facilitates analysis of different SNPs to determine relationships between them by aligning sequences, constructing phylogenetic trees, and providing consensus sequences illustrating the connections between SNPs. Pharmosome was written in the Python programming language using the Django web framework in combination with HTML, CSS, and JavaScript to receive user inputs, and process and export the sorted result to the interface. Pharmosome is available from: https://pharmosome.herokuapp.com/.

Keywords

SNP, Disease, Python, Bioinformatics

Introduction

As the genomic data produced by sequencing technologies and the Human Genome Project1 have expanded and been deciphered, much has been learned about features such as exons, introns, domains, coding sequences and non-coding sequences. Mutations have attracted tremendous attention because of their impact on gene expression, in particular single nucleotide polymorphisms (SNPs)2, which also act as molecular markers for associated traits. Additionally, the presence of a specific SNP can indicate which disease is likely to develop and which drugs might treat it, which is considered the ultimate goal of pharmacogenomics.

Pharmacogenetics, or its inclusive version pharmacogenomics (since it covers proteomic, genomic, epigenomic and transcriptomic effects on disease and drug response), has produced a vast body of research since 19973, due to its key importance in personalized medicine through investigating how far genetic variations (e.g. SNPs) are involved in disease development and determining drug targets. Thereby the safety and efficacy of an individualized drug therapy can be improved. Since the relationships between molecular data pertaining to patients and their disease phenotype are complex and difficult to determine manually, scientists have begun to develop and enrich the bioinformatics knowledge base with more sophisticated and accurate molecular tools to detect genetic variations. For example, SNPector is a recent tool developed by the authors to detect SNP effects on drug response and disease development4. Such tools allow interpretation of how these tiny variations may cause direct errors; e.g. X-SNP is a common SNP type that gives rise to premature termination codons that halt gene expression5. Some SNP types that alter the protein production process cause disease6, while other regulatory SNPs7 may disrupt pathways, causing cascading errors that lead to the collapse of a whole pathway and thereby to disease development.

Findings about SNPs play a central role in providing recommendations for practicing physicians. In addition, a wide range of research fields have arisen out of pharmacogenomics, such as vaccinomics, which studies aberrant immune responses to vaccines based on genetic makeup. For example, a specific SNP in the TLR3 gene was found to be responsible for the reduction of humoral immune responses and cell-mediated immunity to the measles vaccine8. Nutrigenomics is the science of gene–nutrient interactions, which involves research methods and clinical implementation to detect and treat nutrient-related diseases. One of the best known examples of nutrigenomics is lactase persistence, in which the gene encoding lactase is expressed past weaning. Lactase persistence in Europeans is caused by a polymorphism called “C-13910T” in the lactase phlorizin hydrolase gene promoter9, and the lack of C-13910T in many adults can lead to severe gastrointestinal discomfort and diarrhea after ingesting milk, due to the inability to metabolize lactose10.

Here we introduce Pharmosome, a web-based, user-friendly and collective database for more than 30,000 human disease-related SNPs, with dynamic pipelines to explore SNPs associated with disease development, drug response and the pathways shared between different genes related to these SNPs. Pharmosome implements several tools to design primers to detect SNPs in large genomes and facilitates analysis of different SNPs to determine relationships between them by aligning sequences, constructing phylogenetic trees, and providing consensus sequences illustrating the connections between SNPs.

Methods

Implementation

We collected SNP-related data (e.g. SNP ID, annotation, pathway and phenotypes) from PharmGKB11, NCBI12,13, Ensembl14, DiseaseEnhancer15, GeneCards16 and Reactome17 in tab-delimited format in order to link the available information from these sources. The collected data were categorized into four main sub-databases: SNP, Gene, Chemical and Disease. The Python 3+ programming language was used to read, select, filter and sort the data and to link the Python scripts with the HTML and JavaScript code.
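The linking step can be illustrated with a minimal sketch. It assumes a single tab-delimited extract with hypothetical column names (`snp_id`, `gene`, `disease`, `chemical`) and placeholder values; the actual Pharmosome file layouts are not described here, so this is an illustration of the idea rather than the authors' code.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-delimited extract; column names and values are placeholders.
SAMPLE_TSV = (
    "snp_id\tgene\tdisease\tchemical\n"
    "rs0000001\tGENE_A\tDisease X\tdrug_1\n"
    "rs0000002\tGENE_A\tDisease Y\tdrug_2\n"
    "rs0000003\tGENE_B\tDisease X\tdrug_1\n"
)


def load_records(handle):
    """Read tab-delimited rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def build_indexes(records):
    """Index the same rows under each of the four sub-database keys."""
    fields = ("snp_id", "gene", "disease", "chemical")
    indexes = {field: defaultdict(list) for field in fields}
    for row in records:
        for field in fields:
            indexes[field][row[field]].append(row)
    return indexes


records = load_records(io.StringIO(SAMPLE_TSV))
indexes = build_indexes(records)

# Look up every SNP recorded for one gene, then every gene linked to one disease.
snps_for_gene = [row["snp_id"] for row in indexes["gene"]["GENE_A"]]
genes_for_disease = sorted({row["gene"] for row in indexes["disease"]["Disease X"]})
```

Indexing the same rows under several keys is one simple way to let a query that starts from a SNP, gene, disease or chemical reach the other three kinds of information.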

Data collection. About 50% of the data was downloaded from PharmGKB, which is considered the most widely used database for SNP annotation. These data comprise the associated phenotypes and the clinical perspectives. Considering storage space limitations, the remaining data are imported on demand by a set of Python functions we built to access the application programming interfaces (APIs) of other databases, including GeneCards, DiseaseEnhancer, Ensembl and Reactome. A user can thereby retrieve specific data using preset IDs and export this information to the HTML interface. The number of data entries collected by Pharmosome is shown in Table 1.

Table 1. Number of data entries collected from each database used by Pharmosome.

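Type of data | NCBI | Ensembl | Reactome | PharmGKB | DiseaseEnhancer | GeneCards | KEGG | DrugBank
Gene | 18,975 | 0 | 0 | 27,000 | 1,059 | 13,500 | 20,109 | 0
Transcript | 0 | 381,060 | 0 | 0 | 0 | 0 | 0 | 0
Pathway | 0 | 0 | 2,256 | 0 | 0 | 0 | 0 | 0
Disease | 0 | 0 | 0 | 3,546 | 1,059 | 0 | 2,287 | 0
Chemical | 0 | 0 | 0 | 3,393 | 0 | 0 | 0 | 11,926
SNP | 440,000,000 | 0 | 0 | 10,780 | 0 | 0 | 0 | 0
Annotation | 10,845 | 0 | 2,256 | 3,393 | 0 | 0 | 20,109 | 0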
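
The on-demand import described under Data collection could look roughly like the following sketch. It uses the public Ensembl REST variation endpoint as one example source; the class name `OnDemandStore`, the in-memory cache and the injectable `fetch` argument are assumptions made for illustration and are not taken from the Pharmosome source.

```python
import json
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def variation_url(snp_id, species="human"):
    """Build the Ensembl REST URL for one variant, requesting JSON."""
    return f"{ENSEMBL_REST}/variation/{species}/{snp_id}?content-type=application/json"


def http_get_json(url, timeout=30):
    """Fetch a URL and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


class OnDemandStore:
    """Fetch a record only when it is first requested, then reuse it.

    Hypothetical helper: `fetch` can be swapped for a stub in tests.
    """

    def __init__(self, fetch=http_get_json):
        self._fetch = fetch
        self._cache = {}

    def get_variation(self, snp_id):
        if snp_id not in self._cache:
            self._cache[snp_id] = self._fetch(variation_url(snp_id))
        return self._cache[snp_id]


# Usage (performs a real HTTP request, so it is left commented out):
# store = OnDemandStore()
# record = store.get_variation("rs75527207")
```

Keeping only the requested records in memory reflects the storage constraint mentioned above: nothing is downloaded until a user asks for a specific ID.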

Data fetching and exporting. We constructed a Python module containing 12 functions. These functions connect in different ways to import the data from the databases above, either from tab-delimited files or through APIs. User-requested information is extracted and exported to the Pharmosome web interface. The Django web framework was used to build the Pharmosome web interface with HTML. We used Django to build functions that receive the user input from the Pharmosome interface, process requests using a Python script, and finally export the result to the interface.

return render(request, 'WebInterface.html', context={})
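
The receive, process and export steps that lead up to the `render` call above can be sketched without Django itself. In this illustration `build_context` and `lookup` are hypothetical names, and the form field name `query` in the closing comment is likewise an assumption; only the `render(request, 'WebInterface.html', context=...)` call comes from the text.

```python
def build_context(query, lookup):
    """Turn raw user input into the context dict a template would receive.

    `lookup` is any callable that maps a cleaned search term to a list of
    result records (hypothetical; stands in for the data-fetching functions).
    """
    cleaned = query.strip()
    if not cleaned:
        return {"query": cleaned, "results": [], "error": "Please enter a search term."}
    results = lookup(cleaned)
    error = None if results else "No matching records found."
    return {"query": cleaned, "results": results, "error": error}


# Stand-in lookup over a tiny in-memory table with placeholder values.
TABLE = {"rs0000001": [{"gene": "GENE_A", "disease": "Disease X"}]}
context = build_context("  rs0000001 ", lambda term: TABLE.get(term, []))

# Inside a Django view the result would be handed to the template, e.g.:
#     return render(request, 'WebInterface.html',
#                   context=build_context(request.POST.get('query', ''), lookup))
```

Separating the context-building logic from the view keeps it testable without a configured Django project.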

Operation

Google Chrome is the recommended browser for the Pharmosome web interface (https://pharmosome.herokuapp.com/), but other internet browsers can also be used.

Functions of Pharmosome

SNP sub-database. In the SNP sub-database, users can enter the ID of an SNP (e.g. rs141033578) and receive output data about its related gene, chromosomal location, gene bands, a summary of the normal function of the gene, the gene part responsible for enhancing disease occurrence, the pathway of the defective gene (if available), gene transcripts and different splicing variants. Pharmosome also provides information that can be used to retrieve data from recent studies (specifically, the SNP reference and alternative nucleotides and an explanation of how the SNP contributes to disease development and drug response). For example, an SNP in GSTP1 that causes the replacement of isoleucine with valine at amino acid position 105 of the protein, which is known to substantially diminish enzyme activity, was associated with overall survival in 107 patients with metastatic colorectal cancer who received 5-FU/oxaliplatin combination chemotherapy18.

Disease sub-database. The Disease sub-database allows users to search for disease data collected from the KEGG Disease and PharmGKB databases by entering the disease name. The search result is a list of genes responsible for the disease, the ID of the SNP occurring in each gene, a description of clinical annotations related to the gene and the gene-specific chemical used in the treatment. The sub-database can be used, for example, to identify the SNP, gene and chemical related to coronary artery disease associated with a high frequency of the PlA2 polymorphism of the gene encoding glycoprotein IIIa, a polymorphism that has been associated with a high prevalence of premature myocardial infarction19,20.

Chemical sub-database. The Chemical sub-database has data from four sources: KEGG, PharmGKB, DrugBank and ChemSpider. After users enter a chemical name, the output is a list of the chemical name, trade and generic names, structure, description and pharmacodynamics. For example, by inputting bortezomib (a proteasome inhibitor), users receive output data relating to the clinical success of bortezomib, which established the ubiquitin (Ub)-proteasome system as a key therapeutic target in multiple myeloma21,22.

Gene sub-database. The Gene sub-database links different sources (NCBI GenBank, Ensembl, Reactome and DiseaseEnhancer). In particular, the DiseaseEnhancer dataset represents a new approach that determines the gene part that is responsible for enhancing the occurrence of disease. NCBI provides information about gene name, a summary of gene function and chromosomal location. Reactome displays the pathway in which genes are involved and gives an overview description. The Gene sub-database can be used to identify the pathway, splicing variants, disease-enhancing region and genomic and proteomic expression profile, or simply to get general information. An example of this would be looking at the SLCO1B1 gene, for which various groups have tested whether polymorphisms in SLCO1B1 affect pharmacokinetics and the effects of drugs in humans23.

SNP collector. The SNP collector is a tool within Pharmosome that is designed to find all SNPs that are located on a given gene, related to a given disease, or associated with a given chemical compound. Users can choose between different options to collect SNPs together with their clinical annotations and the chemicals related to each SNP. For example, it could be used to find the SNP at position 118 of the mu opioid receptor gene, the variant of which was reported to be about three-fold stronger than the wild type in its interaction with β-endorphin. Other regulatory SNPs in this region of the gene can be linked to other phenotypes24.
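
A minimal sketch of this kind of collection is shown below. The record layout, the placeholder values and the function name `collect_snps` are assumptions for illustration; the sketch also allows criteria to be combined, which may go beyond what the web tool offers.

```python
# Placeholder records; field names and values are illustrative only.
RECORDS = [
    {"snp_id": "rs0000001", "gene": "GENE_A", "chromosome": "7",
     "disease": "Disease X", "chemical": "drug_1"},
    {"snp_id": "rs0000002", "gene": "GENE_A", "chromosome": "7",
     "disease": "Disease Y", "chemical": "drug_2"},
    {"snp_id": "rs0000003", "gene": "GENE_B", "chromosome": "11",
     "disease": "Disease X", "chemical": "drug_1"},
]


def collect_snps(records, gene=None, chromosome=None, disease=None, chemical=None):
    """Return sorted SNP IDs whose records match every criterion supplied.

    Matching is case-insensitive; at least one criterion is required.
    """
    criteria = {"gene": gene, "chromosome": chromosome,
                "disease": disease, "chemical": chemical}
    active = {field: value.lower() for field, value in criteria.items()
              if value is not None}
    if not active:
        raise ValueError("provide at least one of gene, chromosome, disease, chemical")
    matches = [
        row for row in records
        if all(row[field].lower() == value for field, value in active.items())
    ]
    return sorted({row["snp_id"] for row in matches})


by_gene = collect_snps(RECORDS, gene="gene_a")
by_disease_and_chemical = collect_snps(RECORDS, disease="Disease X", chemical="drug_1")
```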

Pick primer. Detecting an SNP in a DNA sample taken from a patient depends on designing an appropriate primer. Primers should be compatible with the sequences flanking the SNP. As the presence of an SNP in the genome may result in disease and affect the choice of drug, there is a need to detect the presence of the SNP, e.g. for early disease diagnosis. The pick primer tool within Pharmosome designs primers to detect an SNP in the genome by retrieving the SNP sequence record from the NCBI database, locating the SNP position and designing primers 50 bases before, after and within the SNP sequence.
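
The flanking-window idea can be sketched as follows. This is a deliberately naive illustration: the function name `flanking_primers`, the 20-base primer length and the choice of taking primers from the outer ends of each 50-base window are assumptions, and real primer design would also check melting temperature, GC content and secondary structure.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_primers(sequence, snp_index, flank=50, primer_len=20):
    """Pick naive primers from the windows `flank` bases either side of the SNP.

    `snp_index` is the 0-based position of the SNP in `sequence`.
    The forward primer is the start of the upstream window; the reverse
    primer is the reverse complement of the end of the downstream window.
    """
    if primer_len > flank:
        raise ValueError("primer_len must not exceed flank")
    if snp_index - flank < 0 or snp_index + flank >= len(sequence):
        raise ValueError("sequence too short for the requested flank")
    upstream = sequence[snp_index - flank:snp_index]
    downstream = sequence[snp_index + 1:snp_index + 1 + flank]
    forward = upstream[:primer_len]
    reverse = reverse_complement(downstream[-primer_len:])
    return forward, reverse


# Synthetic 101-base record with the SNP ("G") at index 50.
record = "ACGT" * 12 + "AC" + "G" + "TTTTT" * 9 + "GGCCA"
forward_primer, reverse_primer = flanking_primers(record, snp_index=50)
```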

SNP phylogeny. As discussed in previous sections, there is always some relationship between different SNPs due to the complicated interaction network between different genes. In order to determine how these SNPs are related to each other, the SNP phylogeny tool constructs a phylogenetic tree that illustrates the relationships between different SNPs25 by downloading each SNP with its flanking sequences and performing multiple sequence alignment to determine how closely each sequence is related to the others. This function could be used to clarify connections in studies such as that of Thompson et al., which showed an association of 43 SNPs in 16 genes with the response to the drug atorvastatin26.
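
How aligned flanking sequences can be turned into a tree and a consensus sequence is sketched below. The text does not state which tree-building method Pharmosome uses, so UPGMA clustering on simple p-distances is substituted here purely as an illustration; the sketch also assumes the sequences are already aligned to equal length and returns the tree topology only, without branch lengths.

```python
from collections import Counter
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma_newick(aligned):
    """Cluster named, aligned sequences with UPGMA; return a Newick topology."""
    sizes = {name: 1 for name in aligned}  # cluster label -> number of leaves
    dist = {
        frozenset(pair): p_distance(aligned[pair[0]], aligned[pair[1]])
        for pair in combinations(aligned, 2)
    }
    while len(sizes) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # Distance to every other cluster is the size-weighted average.
        for other in sizes:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


def consensus(aligned_seqs):
    """Most frequent base in each alignment column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned_seqs))


# Toy aligned sequences with hypothetical names.
ALIGNED = {"s1": "AAAA", "s2": "AAAT", "s3": "GGGG"}
tree = upgma_newick(ALIGNED)
majority = consensus(["AAAA", "AAAT", "AGAT"])
```

The resulting Newick string can be passed to any tree viewer; sequences that differ at fewer positions are joined first, which is the sense in which the tree shows how closely the SNP regions are related.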

Workflow

Figure 1 shows the flow of information to meet the needs of users.


Figure 1. Pharmosome workflow illustrating the processing in the background of the database and the relationships and links between different interface webpages.

Use case

Pharmosome deploys seven sub-databases and tools. Our aim in building Pharmosome was to make it as easy to use as possible. We designed each tool and sub-database to receive the user input with the minimum required parameters (as shown in Table 2). Users enter the target input they require and the Pharmosome interface automatically redirects to another page that shows the output results. Figures 2 to 5 show output on the Pharmosome web interface.

Table 2. Use cases for Pharmosome detailing the input required in the web interface and a summary of the output of Pharmosome.

Function | Input required | Input example | Output description
SNP sub-database | SNP ID | rs75527207 | SNP-related disease and drug response annotation
Disease sub-database | Disease Name | Heart Failure | Gene involved in disease, chemical used in treatment, and SNP causing the disease (Figure 2)
Chemical sub-database | Chemical Name | Ivacaftor | Description of target disease (Figure 3)
Gene sub-database | Gene Symbol | CFTR | Gene annotation (Figure 4)
SNP collector | Gene, Chromosome, Disease, or Chemical | CFTR, 11, Heart Failure, or Ivacaftor | List of SNPs (Figure 5)
Pick primer | SNP ID | rs75527207 | Forward and reverse primer
SNP phylogeny | SNP List | rs113993960 rs2853741 rs1045642 rs3857532 rs10817464 rs16969968 | Illustration showing the phylogenetic tree of the input SNPs

Figure 2. Disease sub-database output on Pharmosome web interface after input of “Heart failure”.

The results consist of a list of drop-down menus. Each menu describes a disease annotation and the gene associated with the disease.


Figure 3. Chemical sub-database output on Pharmosome web interface after input of “Ivacaftor”.

The results consist of a list of drop-down menus. Each menu describes drug/chemical annotations.


Figure 4. Gene sub-database output on Pharmosome web interface after input of “CFTR”.

The results consist of a navigation bar, and each button expands to a different annotation.


Figure 5. SNP collector output on Pharmosome web interface after input of “CFTR”.

The results consist of a list of SNPs located on this gene.

Summary

In this study, we introduce Pharmosome, an integrative and collective database for exploring and analysing human SNPs and the associated disease and drug response. Our tool deploys various functions to determine the relationships between different SNPs, to construct the consensus sequence between different SNPs and to determine the pathways shared between different genes. Pharmosome also includes sub-databases to simplify, link and display data about gene functions, pathways, transcriptomes of genes, different splicing variants, clinical annotation, chemical structures and annotations of chemicals involved in the disease. The returned data are informative and easy to navigate. Pharmosome was written in Python 3.5, HTML and CSS, using Django (a Python web framework) to link the Python scripts with the other languages.

Software availability

Pharmosome web interface: https://pharmosome.herokuapp.com/

Source code available from: https://github.com/peterhabib/Pharmosome_Web

Archived source code as at time of publication: http://doi.org/10.5281/zenodo.3583191 (reference 27).

License: MIT

how to cite this article
Habib PT, Alsamman AM, Hassanein SE et al. Pharmosome: an integrative and collective database for exploration and analysis of single nucleotide polymorphisms associated with disease [version 1; peer review: 2 approved with reservations]. F1000Research 2020, 9:14 (https://doi.org/10.12688/f1000research.21773.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 18 Nov 2021
Mulin Jun Li, Department of Epidemiology and Biostatistics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin, 300070, China 
Approved with Reservations
In this manuscript, the authors developed Pharmosome, a web-based, user-friendly and collective annotation database for exploring and analyzing SNPs associated with disease development, drug response and the pathways shared between different genes related to these SNPs. Pharmosome is a friendly ...
HOW TO CITE THIS REPORT
Li MJ. Reviewer Report For: Pharmosome: an integrative and collective database for exploration and analysis of single nucleotide polymorphisms associated with disease [version 1; peer review: 2 approved with reservations]. F1000Research 2020, 9:14 (https://doi.org/10.5256/f1000research.24001.r98627)
Reviewer Report 19 Aug 2021
Fuyi Li, Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Vic, Australia 
Approved with Reservations
This study developed a novel web-based database, Pharmosome, for human disease-related SNPs. The web page of the database is well-designed. The manuscript is well written and easy to follow.

I have several comments and suggestions:
    ...
    HOW TO CITE THIS REPORT
    Li F. Reviewer Report For: Pharmosome: an integrative and collective database for exploration and analysis of single nucleotide polymorphisms associated with disease [version 1; peer review: 2 approved with reservations]. F1000Research 2020, 9:14 (https://doi.org/10.5256/f1000research.24001.r90928)