Software Tool Article
ReactomeFIViz: a Cytoscape app for pathway and network-based data analysis [v2; ref status: indexed, http://f1000r.es/4cf]
Previously titled: ReactomeFIViz: the Reactome FI Cytoscape app for pathway and network-based data analysis
Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada
DMICE, Oregon Health & Science University, Portland, Oregon 97239, USA
Section of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX 78712, USA
Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
This project is supported by a NIH grant (2U41HG003751-05) to LS and a Genome Canada grant (OGI 5458) to LS.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network combined with human curated pathways derived from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.
How to cite: Wu G, Dawson E, Duong A et al. ReactomeFIViz: a Cytoscape app for pathway and network-based data analysis [v2; ref status: indexed, http://f1000r.es/4cf] F1000Research 2014, 3:146 (doi: 10.12688/f1000research.4431.2)
© 2014 Wu G et al.
This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
No competing interests were disclosed.
First published: 01 Jul 2014, 3:146 (doi: 10.12688/f1000research.4431.1)
First indexed: 22 Sep 2014, 3:146 (doi: 10.12688/f1000research.4431.2)
Latest published: 12 Sep 2014, 3:146 (doi: 10.12688/f1000research.4431.2)
Amendments from Version 1
The major changes we have made to this version according to suggestions from reviewers are below:
1). Added a new paragraph in the “Results” section.
2). Added a new figure as Figure 3 for the new paragraph.
3). Added a new reference as Reference 8.
4). Changed the title to avoid confusion.
See referee responses
High-throughput experiments, which generate large and complex data sets, are routinely performed in modern biological and clinical studies to unravel mechanisms underlying complex diseases, such as cancer. However, extracting reliable and meaningful results from these experiments is usually difficult and requires sophisticated computational tools and algorithms, which are challenging for experimental biologists to comprehend. A user-friendly software tool is extremely important for both bench and computational biologists to perform high-throughput data analysis related to cancer and other complex diseases.
Many studies have shown that alterations in pathways or networks are better correlated with complex disease phenotypes than any particular gene or gene product1,2. Pathway- and network-based data analysis approaches project information about seemingly unrelated genes and proteins onto pathway and network contexts, and create an integrated view for researchers to understand mechanisms related to phenotypes of interest.
In this paper, we describe a software tool called ReactomeFIViz (also called the Reactome FI Cytoscape app or ReactomeFIPlugIn), which can be used to perform pathway- and network-based data analysis for data generated from high-throughput experiments. This tool uses the highly reliable Reactome functional interaction (FI) network3 for doing network-based data analysis. The FI network was constructed by merging interactions extracted from human curated pathways with interactions predicted using a machine learning approach. This tool can also be used to perform pathway-based data analysis by using high quality human-curated pathways in the Reactome database4, the most comprehensive open source pathway database.
We used conventional three-tier software architecture to implement ReactomeFIViz (Figure 1). The back-end contains several databases hosted in the open-source MySQL database engine (http://www.mysql.com). The middle server-side application uses hibernate (http://hibernate.org) to access the databases storing FIs and cancer gene index data (see below). The server-side application also uses the in-house developed Reactome API for Object/Relational mapping to access pathway-related contents stored in a database using the Reactome database schema. On the server-side, a lightweight servlet container, Spring Framework (http://projects.spring.io/spring-framework/), and a Java RESTful framework, Jersey (https://jersey.java.net), are used to power a RESTful API for the Cytoscape front-end. The front-end Cytoscape app uses this RESTful API to communicate with the server-side application. Almost all analysis features in the app are provided by this RESTful API, which should also facilitate their use by other front-end applications, such as a web browser or tablet app.
Figure 1. The three-tier software architecture used to implement ReactomeFIViz.
For cancer data analysis, we imported the cancer gene index (CGI, https://wiki.nci.nih.gov/display/cageneindex) data into a MySQL database and then developed a hibernate API for the server-side application. The CGI data contains annotations for cancer-related genes. These annotations were extracted by using text-mining technologies and then validated by human curators (https://wiki.nci.nih.gov/display/cageneindex/Creation+of+the+Cancer+Gene+Index).
The Reactome FI network is updated annually. We recommend using the latest version of the FI network. Different versions of the FI network may yield different results due to updates to gene interactions, so we have also deployed two older versions of the FI network to use for comparison of legacy data sets and to reproduce published results.
R (http://www.r-project.org) is used in the server-side for executing network module-based survival analysis and other statistical computations. ReactomeFIViz uses Java based methods in the server-side to call functions in R. Users of our app don’t need to install R in their machines in order to perform the statistical analyses implemented in the app.
ReactomeFIViz is designed and implemented for Cytoscape 3, and includes all features in Reactome FI Cytoscape plug-in for Cytoscape 2. Users are recommended to use the latest version of our app for Cytoscape 3.
Network analysis features
ReactomeFIViz implements multiple features for users to perform network-based data analysis, including FI sub-network construction3, network module discovery3, functional annotation3, HotNet mutation analysis5,6, and network module-based gene signature discovery from microarray data sets7. The HotNet algorithm5,6 was implemented by porting python and MatLab code of HotNet _v1.0.0 (downloaded from http://compbio.cs.brown.edu/projects/hotnet/) to Java and R. For details about other algorithms and their implementations, please refer to our previous work3,7.
The majority of interactions in the Reactome FI network are extracted from reactions and complexes. In order to display semantic meanings (e.g. catalysis, activation and inhibition) of these interactions, we created a Reactome FI network specific visual style. This visual style is registered as a service using the OSGi API supported by Cytoscape 3, and applied to newly constructed FI sub-network automatically for network analysis.
Pathway analysis features
Since version 4.0.0.beta, released in January 2014, ReactomeFIViz allows users to explore a list of high quality, human curated Reactome pathways, visualize Reactome pathways directly in Cytoscape, and perform pathway enrichment analysis on a list of genes based on a binomial test8. In April 2014, we added a new experimental feature for performing integrated pathway analysis for multiple genomic data types by adapting a factor graph based approach called “PARADIGM”9 into ReactomeFIViz.
The Reactome database contains several hundred manually laid-out pathway diagrams4. Pathway diagrams in Reactome are drawn based on biochemical reactions. A reaction usually contains multiple inputs and outputs, in addition to catalysts, inhibitors and activators. The network model in Cytoscape is designed to support simple graphs containing edges between two nodes only. In order to display Reactome pathway diagrams, we adapted the pathway diagram view in the Reactome curator tool4 into the Cytoscape environment, and wrapped it in a JInternalFrame so that a pathway view can be displayed along with a network view in the Cytoscape desktop (Figure 2).
Figure 2. A Reactome pathway diagram displayed in a Reactome diagram view.
The diagram view is wrapped in a JInternalFrame and hosted in the Cytoscape desktop.
ReactomeFIViz provides a suite of features to assist users to perform pathway- and network-based data analyses (Figure 3). Based on a list of genes loaded from a file, the user can construct a sub-network, perform network clustering to search for network modules related to patient clinical or other phenotypic information, annotate network modules, perform pathway enrichment analysis, and even model pathway activities based on probabilistic graphical models9. By performing pathway- and network-based analyses using ReactomeFIViz, researchers will be able to uncover pathway and network patterns related to their studies and then link found patterns to clinical phenotypes3,7.
Figure 3. Major features implemented in ReactomeFIViz.
Features implemented in ReactomeFIViz are roughly categorized into two types: pathway analysis features (top) and network analysis features (bottom). Input for these features can be a simple gene list contained in a text file, a mutation annotation file (MAF), a text file for gene expression data, or copy number variants (CNVs).
As an example, we present results generated from network module based analysis for the TCGA ovarian cancer mutation data10 using ReactomeFIViz. The TCGA mutation data file and clinical information file were downloaded from the Broad Institute Firehose web site https://confluence.broadinstitute.org/display/GDAC, released in July 2012. The clinical information has been pre-processed.). For this data set, we chose the 2009 version of the FI network, and picked genes mutated in three or more samples to construct a FI sub-network. We performed a network clustering, followed by survival analysis for each network module by splitting samples into two groups: samples having genes mutated in the module (Group 1) and samples not having genes mutated in the module (Group 0). Our results indicate that group 1 samples (Figure 4, green line in the Kaplan-Meier plot11) have significantly longer overall survival times compared to group 0 samples (Figure 4, red line in the Kaplan-Meier plot) (p-value = 3.4 × 10-5 based on the CoxPH analysis12) based on module 3. Pathway enrichment analysis results imply that module 3 is enriched with genes in calcium signaling pathway (http://www.genome.jp/kegg/pathway/hsa/hsa04020.html) and mitotic G2/M transition (http://www.reactome.org/cgi-bin/control_panel_st_id?ST_ID=REACT_2203.2). These results suggest that mutations impacting calcium signaling and the cell cycle may increase the survival of ovarian cancer patients. However, we may need more samples and independent data sets to validate our conclusion.
Figure 4. Module 3 generated from the TCGA ovarian cancer mutation data file is significantly related to patient overall survival.
The central main panel shows the network view of module 3. The bottom table displays pathway annotations for genes in module 3 with two pathways, Calcium Signaling Pathway and G2/M Transition, highlighted. The right panel shows survival analysis results using both the Cox proportional hazards (CoxPH) model and the Kaplan-Meier model. The Kaplan-Meier plot was added to the figure later.
Using the same version of ReactomeFIViz but different versions of the FI network may yield different results because of updates of protein interactions in the FI network. We performed the same analysis with the latest version of the FI network (the 2013 version), and found that genes in module 3 from the 2009 version of the FI network have been split among several modules discovered using the newer version of the FI network. The module having the largest overlap with module 3 from the 2009 version of the FI network has the most significant p-value from the survival analysis (p-value = 1.1 × 10-3 from CoxPH), which implies that our method is fairly robust against updates of the FI network. For details, see the supplementary results.
Our Cytoscape app provides a suite of features for users to perform network- and pathway-based analysis for data generated from multiple experiments related to cancer and other complex diseases. Users can use our tool to search for disease-related network and pathway patterns. Our tool is built upon the Reactome database, arguably the most comprehensive human curated open source pathway database, and leverages the highly reliable functional interaction network extracted from human curated pathways. Many studies based on the FI network and this app have shown its many applications to cancer and other disease studies13–16.
For future development, we will focus on using probabilistic graphical models, such as factor graphs, for performing pathway modeling and linking results to patient clinical information in order to uncover cellular mechanisms related to cancer drug sensitivity, search for cancer biomarkers, and assist new drug development.
Data files used in the example: http://reactomews.oicr.on.ca:8080/caBigR3WebApp/TCGA_OV_Firehose_MAF_CLIN_2012.zip.
Use the detailed procedures described in our user guide to reproduce the results described in the example: http://wiki.reactome.org/index.php/Reactome_FI_Cytoscape_Plugin.