Software Tool Article

Building pathway graphs from BioPAX data in R

[version 1; peer review: 1 approved, 3 approved with reservations]
PUBLISHED 28 Sep 2016

This article is included in the Bioinformatics gateway.

This article is included in the RPackage gateway.

Abstract

Biological pathways are increasingly available in the BioPAX format, which uses an RDF model for data storage. The information in this data model can be retrieved in the scripting language R using the package rBiopaxParser, which converts the BioPAX format into one readable in R. The package also has a function to build a regulatory network from the pathway information. Here we describe an extension of this function. The new function also includes non-regulatory interactions in the pathway and thus allows more complete extraction of the pathway information. This function will be available as part of the rBiopaxParser distribution from Bioconductor.

Keywords

rBiopaxParser, R, pathways, BioPAX

Introduction

Biological pathways represent signaling and/or metabolic events involving protein and non-protein molecules. They are increasingly used in gene and protein expression studies to provide an aggregate score for gene sets encoding defined biological events [1]. Several pathway databases, curated or not, have adopted the BioPAX [RRID:SCR_009881] (Biological Pathway Exchange) language as a standard for pathway representation using the RDF (Resource Description Framework) data model [2].

BioPAX is built on classes that group physical entities and interactions, each with a hierarchy of sub-classes. Interactions are linked so that a set of connected interactions forms a specific pathway, in which the participating physical entities are related by defined, and possibly different, types of interaction. The BioPAX format is under active development: BioPAX level 2 focuses on metabolic pathways, and BioPAX level 3 introduces full support for signaling pathways.

SPARQL (SPARQL Protocol and RDF Query Language) is a query language that can retrieve and manipulate data stored in RDF. Pathway information is often combined with statistical data analysis using tools such as R [3]. rBiopaxParser [RRID:SCR_002744] [4] is an R package for retrieving data stored in the BioPAX RDF format. It comes with several options that are useful for probing the data and extracting specific information from it, for example the participants of a pathway or the stoichiometric conditions to be fulfilled for an interaction.

One such option is the pathway2RegulatoryGraph (P2RG) function, which converts a pathway into a graph structure. This is useful for visual representation and for subsequent graph-based network analysis. The P2RG function returns the parts of a pathway that are regulated (activated or inhibited) by proteins or protein complexes. Here we present an adaptation of P2RG, called pathway2Graph (P2G), which can be used to build a graph of the entire pathway. P2G is specifically aimed at retrieving results from Reactome BioPAX level 3. We have verified the P2G results by directly querying the original BioPAX data using SPARQL.
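The relationship between the two functions can be sketched as a small helper that builds both graphs for one pathway and reports their sizes. This is only an illustration under stated assumptions: the helper name compare_pathway_graphs and its arguments are hypothetical, the calls assume that readBiopax, pathway2RegulatoryGraph and pathway2Graph accept the parsed BioPAX object and a pathway ID as their first two arguments, and both functions are assumed to return graph objects from the Bioconductor graph package.

```r
# Hypothetical helper (not part of rBiopaxParser): build the regulatory graph
# (P2RG) and the full graph (P2G) for one pathway and compare their sizes.
compare_pathway_graphs <- function(biopax_file, pathway_id) {
  # Both packages come from Bioconductor; fail early if one is missing.
  for (pkg in c("rBiopaxParser", "graph")) {
    if (!requireNamespace(pkg, quietly = TRUE)) {
      stop("Package '", pkg, "' is required for this sketch.")
    }
  }

  # Parse the BioPAX file into the internal R representation.
  biopax <- rBiopaxParser::readBiopax(biopax_file)

  # Assumption: both functions take (biopax, pathway ID) as first arguments.
  regulatory <- rBiopaxParser::pathway2RegulatoryGraph(biopax, pathway_id)
  full       <- rBiopaxParser::pathway2Graph(biopax, pathway_id)

  # Assumption: a pathway without regulatory interactions may yield no graph,
  # so a missing result is counted as zero nodes and zero edges.
  size_of <- function(g) {
    if (is.null(g)) c(0, 0) else c(graph::numNodes(g), graph::numEdges(g))
  }

  sizes <- rbind(size_of(regulatory), size_of(full))
  data.frame(method = c("P2RG", "P2G"), nodes = sizes[, 1], edges = sizes[, 2])
}
```

The helper would be called with the path to a BioPAX level 3 file and one pathway ID from that file; both values depend on the data at hand and are not given here.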

Methods

The classes of PhysicalEntity and Interaction that are used in Reactome v51 to represent information on pathways are shown in Figure 1. This graph was generated by running the tool RDF2Graph [5] on the Reactome level 3 RDF file. The nodes in Figure 1 represent classes, and the edges show the possible relationships, called predicates, that these classes can have in the database. As depicted in Figure 1, a Pathway node has one or more PathwaySteps, which consist of different types of Interaction sub-classes. All the Interaction nodes shown in Figure 1 describe interactions between PhysicalEntities and are therefore connected to them by particular types of predicates, as indicated in the edge labels. The Interaction classes are interconnected because they can depend on each other. The Control interaction and its sub-classes (Catalysis and Modulation) represent signaling events. They regulate BiochemicalReaction and Degradation interactions, which mostly represent metabolic reactions.


Figure 1. Interplay of classes in Reactome BioPAX.

This figure shows a network of the Interaction and PhysicalEntity classes that are a part of a pathway in Reactome v51 BioPAX level 3. Nodes are classes and the directed edges are links between them in the database. The green nodes are the Pathway and PathwayStep classes, the blue nodes are Interaction classes and orange nodes are PhysicalEntity classes.

To create a regulatory graph, the P2RG function starts with the Control, Catalysis and Modulation interactions that either activate or inhibit other interactions. This method provides a graph with plenty of information on the regulatory components of the pathway. The nodes of this graph are physical entities such as Proteins or SmallMolecules, and the directed edges are either activation or inhibition events. However, interactions that are not regulated by Control interactions are missed, which can result in the loss of valuable information in the graphical representation of the pathway.

The new function P2G can start with any type of interaction to obtain a graph with all possible physical entities involved in the pathway. Similar to the result of the P2RG function, the P2G function gives a graph with nodes that are physical entities, but the edges are not strictly activation or inhibition events. The directed edges could represent several types of events like translocation of a protein or cleavage of DNA. In some cases there is more than one documented connection between the same physical entities. In this case only the first connection is used as an edge in the final pathway graph.
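The rule of keeping only the first documented connection between two physical entities can be illustrated on a toy edge list in base R. This is a sketch, not the package's implementation: the entity names and event labels below are invented, and the internal data structures of P2G may differ.

```r
# Toy edge list; entity names and event labels are invented for illustration.
edges <- data.frame(
  from  = c("ProteinA", "ProteinA", "ProteinB", "ProteinA"),
  to    = c("ProteinB", "ProteinB", "ComplexC", "ComplexC"),
  event = c("cleavage", "binding", "translocation", "activation"),
  stringsAsFactors = FALSE
)

# duplicated() flags every repeat of a (from, to) pair after its first
# occurrence, so negating it keeps only the first documented connection.
first_only <- edges[!duplicated(edges[, c("from", "to")]), ]

print(first_only)
```

In this toy input the second ProteinA to ProteinB row is dropped, so each ordered pair of entities contributes a single edge to the final graph.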

Comparison of two methods: P2G vs P2RG

The Reactome database (v51) categorizes pathways into 27 branches. We worked only with pathways that have more than one interaction, which left 1,666 pathways. Using P2RG, graphs were retrieved for 1,548 of these pathways; using the new P2G function, we were able to retrieve information on all 1,666. With either method, the largest number of nodes was obtained in the “Disease” category (P2RG: 3,396 nodes, P2G: 4,888 nodes; Table 1). In 85% of the cases, pathways retrieved using P2G consisted of more physical entities (nodes) than those retrieved using P2RG. In the P2G version, 19% of the pathways have at least twice the number of nodes, and 60% have at least twice the number of interactions between nodes (edges), compared to the P2RG version. For example, the pathway ‘Apoptosis induced DNA fragmentation’ has seven nodes when built with the P2RG function and 23 nodes when built with P2G, as shown in Figure 2. The total numbers of nodes and edges in ten Reactome categories are given in Table 1. Missing information causes disconnected graphs to appear when pathways are reconstructed. Using the new P2G function reduces the percentage of disconnected pathways by 9%. P2G also has the option of retrieving only the biggest connected component. The pathways have directed edges because most of the interactions have a direction. Edges without a direction are represented as bidirectional edges.
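The two graph conventions described above, bidirectional edges for interactions without a direction and extraction of the biggest connected component, can be sketched in base R. The function names expand_undirected and largest_component, the directed column and the toy data are assumptions made for this illustration; they are not taken from rBiopaxParser.

```r
# Represent each undirected interaction as two directed edges.
expand_undirected <- function(edges) {
  undirected <- edges[!edges$directed, , drop = FALSE]
  reversed <- data.frame(from = undirected$to, to = undirected$from,
                         directed = undirected$directed,
                         stringsAsFactors = FALSE)
  rbind(edges, reversed)
}

# Keep only the edges of the largest component, ignoring edge direction
# when deciding which nodes are connected.
largest_component <- function(edges) {
  nodes <- unique(c(edges$from, edges$to))
  label <- setNames(rep(NA_integer_, length(nodes)), nodes)
  current <- 0L
  for (start in nodes) {
    if (!is.na(label[[start]])) next
    current <- current + 1L
    label[[start]] <- current
    queue <- start
    # Breadth-first search over neighbours in either direction.
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      nbrs <- unique(c(edges$to[edges$from == node],
                       edges$from[edges$to == node]))
      unseen <- nbrs[is.na(label[nbrs])]
      label[unseen] <- current
      queue <- c(queue, unseen)
    }
  }
  sizes <- table(label)
  biggest <- as.integer(names(sizes)[which.max(sizes)])
  keep <- names(label)[label == biggest]
  edges[edges$from %in% keep & edges$to %in% keep, , drop = FALSE]
}

# Toy graph: A -> B -> C is one component, D - E (undirected) is another.
edges <- data.frame(from = c("A", "B", "D"), to = c("B", "C", "E"),
                    directed = c(TRUE, TRUE, FALSE),
                    stringsAsFactors = FALSE)
expanded <- expand_undirected(edges)
main_part <- largest_component(expanded)
```

Connectivity is evaluated without regard to direction here, which is one possible reading of "connected component" for a directed pathway graph.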


Figure 2. Graphs of the pathway ‘Apoptosis induced DNA fragmentation’.

Both graphs were extracted from the same BioPAX file. A) Graph recovered using the new P2G function; B) Graph recovered using P2RG function. In both panels blue nodes are proteins or protein complexes, white nodes are non-protein entities. Black encircled nodes are found in both graphs and red encircled nodes are only detected with the new P2G function.

Table 1. Numbers of nodes and edges.

The numbers of nodes and edges for ten Reactome categories, as obtained by applying P2RG and P2G to the same set of BioPAX RDF information.

Reactome category                                      P2RG nodes   P2RG edges   P2G nodes   P2G edges
Binding and Uptake of Ligands by Scavenger Receptors            0            0          68          56
Cell-Cell communication                                        13           14         142         142
Disease                                                     3,396        5,878       4,888      12,159
Gene Expression                                               652          900       1,110       2,450
Immune System                                               1,431        2,233       2,419       5,045
Membrane Trafficking                                           86          121         181         382
Metabolism                                                  3,082        5,922       3,479      11,289
Signaling Pathways                                          2,069        3,274       3,430       7,131
Steroid hormones                                               72          147          81         333
Transcription                                                 281          420         623       1,324
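As a small worked example, the gain of P2G over P2RG can be expressed as a ratio of node and edge counts per category. The sketch below uses five rows of Table 1 (Disease, Gene Expression, Immune System, Metabolism and Signaling Pathways); the column names and the ratio columns are introduced here for illustration only.

```r
# Five rows copied from Table 1; column names are chosen for this sketch.
table1 <- data.frame(
  category   = c("Disease", "Gene Expression", "Immune System",
                 "Metabolism", "Signaling Pathways"),
  p2rg_nodes = c(3396, 652, 1431, 3082, 2069),
  p2rg_edges = c(5878, 900, 2233, 5922, 3274),
  p2g_nodes  = c(4888, 1110, 2419, 3479, 3430),
  p2g_edges  = c(12159, 2450, 5045, 11289, 7131),
  stringsAsFactors = FALSE
)

# Ratio of P2G to P2RG counts: values above 1 mean P2G recovered more.
table1$node_ratio <- table1$p2g_nodes / table1$p2rg_nodes
table1$edge_ratio <- table1$p2g_edges / table1$p2rg_edges

print(table1[, c("category", "node_ratio", "edge_ratio")], digits = 3)
```

For these five categories the edge ratio is larger than the node ratio in every row, which agrees with the observation that P2G adds proportionally more interactions than physical entities.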

Conclusion

The P2G function (pathway2Graph) is currently available in the development version of the rBiopaxParser package and will be part of the package in the Bioconductor 3.4 release. It is a useful addition to rBiopaxParser because it retrieves all the components of a pathway from the database and provides complete graphical information for both signaling and metabolic pathways.

Data availability

The input data for this package is the BioPAX format of any pathway database. We used the Reactome database which is freely available for download in different formats from the website www.reactome.org. A subset of this database is given as Supplementary file 1.

Software availability

Software available from: The function pathway2Graph is currently available in the development version of the R package rBiopaxParser, which can be installed with the following commands in R.

library(devtools)

install_github("frankkramer-lab/rBiopaxParser")

Latest source code: https://github.com/frankkramer-lab/rBiopaxParser/tree/2.12.0

Archived source code as at the time of publication: http://dx.doi.org/10.5281/zenodo.616186

Software license: GPL-2

how to cite this article
Benis N, Schokker D, Kramer F et al. Building pathway graphs from BioPAX data in R [version 1; peer review: 1 approved, 3 approved with reservations]. F1000Research 2016, 5:2414 (https://doi.org/10.12688/f1000research.9582.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.
Open Peer Review

Key to reviewer statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 02 Dec 2016
Kyle Ellrott, Oregon Health & Science University, Portland, OR, USA 
Approved with Reservations
The authors of this paper describe a new function provided by the rBiopaxParser library, which is an R-based system for parsing BioPAX documents. BioPAX is coded in RDF, which is a linked data format that describes the subject matter ...
HOW TO CITE THIS REPORT
Ellrott K. Reviewer Report For: Building pathway graphs from BioPAX data in R [version 1; peer review: 1 approved, 3 approved with reservations]. F1000Research 2016, 5:2414 (https://doi.org/10.5256/f1000research.10320.r17476)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Reader Comment 28 Dec 2016
    Nirupama Benis, Wageningen University and Research, The Netherlands
    Thank you for your comments. We uploaded the second version of the paper before we received your comments on the paper. In the new version we have explained in more ...
Reviewer Report 24 Nov 2016
Hilary Ann Coller, Department of Molecular, Cell and Developmental Biology, University of California Los Angeles, Los Angeles, CA, USA 
Approved
The authors have developed a new function that allows the user to build a regulatory network in a graph format based on pathway information. In the version that the authors developed, the output graph includes regulatory and non-regulatory interactions and ...
HOW TO CITE THIS REPORT
Coller HA. Reviewer Report For: Building pathway graphs from BioPAX data in R [version 1; peer review: 1 approved, 3 approved with reservations]. F1000Research 2016, 5:2414 (https://doi.org/10.5256/f1000research.10320.r17940)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 30 Nov 2016
    Nirupama Benis, Wageningen University and Research, The Netherlands
    Thank you for your comments.
    Competing Interests: No competing interests were disclosed.
Reviewer Report 07 Nov 2016
Stephen N. Floor, Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA 
Approved with Reservations
The authors have developed a new function in the rBiopaxParser package to generate figures from BioPAX formatted biological pathway data. This new function, called pathway2Graph (P2G) replaces an older function called pathway2RegulatoryGraph (P2RG). P2G includes more interaction terms than P2RG, ...
HOW TO CITE THIS REPORT
Floor SN. Reviewer Report For: Building pathway graphs from BioPAX data in R [version 1; peer review: 1 approved, 3 approved with reservations]. F1000Research 2016, 5:2414 (https://doi.org/10.5256/f1000research.10320.r17305)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 12 Dec 2016
    Nirupama Benis, Wageningen University and Research, The Netherlands
    Thank you for your comments. In the new version we have expanded the Introduction with non-technical specifics that should explain the basic differences between the functions to a broader audience. ...
Reviewer Report 02 Nov 2016
Lynn Fink, University of Queensland, Diamantina Institute, Woolloongabba, QLD, Australia 
Approved with Reservations
This article describes the addition of a new function to the extant rBiopaxParser R library. This new function converts a BioPAX-formatted pathway of gene or protein interactions into a graphical structure that is human-viewable. This function supersedes an earlier function ...
HOW TO CITE THIS REPORT
Fink L. Reviewer Report For: Building pathway graphs from BioPAX data in R [version 1; peer review: 1 approved, 3 approved with reservations]. F1000Research 2016, 5:2414 (https://doi.org/10.5256/f1000research.10320.r17096)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
  • Author Response 12 Dec 2016
    Nirupama Benis, Wageningen University and Research, The Netherlands
    Thank you for your comments. The Introduction has been expanded with an image (now Figure 1) to emphasize the differences between the two functions.
    Competing Interests: No competing interests were disclosed.