Software Tool Article

The PathLinker app: Connect the dots in protein interaction networks

[version 1; peer review: 1 approved, 2 approved with reservations]
PUBLISHED 20 Jan 2017

This article is included in the Bioinformatics gateway.

This article is included in the Cytoscape gateway.

Abstract

PathLinker is a graph-theoretic algorithm for reconstructing the interactions in a signaling pathway of interest. It efficiently computes multiple short paths within a background protein interaction network from the receptors to transcription factors (TFs) in a pathway. We originally developed PathLinker to complement manual curation of signaling pathways, which is slow and painstaking. The method can be used in general to connect any set of sources to any set of targets in an interaction network. The app presented here makes the PathLinker functionality available to Cytoscape users. We present an example where we used PathLinker to compute and analyze the network of interactions connecting proteins that are perturbed by the drug lovastatin.

Keywords

signaling pathways, pathway reconstruction, protein interaction networks, PathLinker, Cytoscape, k-shortest paths

Introduction

Signaling pathways are a cornerstone of systems biology. While several databases store high-quality representations of these pathways, they require time-consuming manual curation. PathLinker is an algorithm that automates the reconstruction of any human signaling pathway by connecting the receptors and transcription factors (TFs) in that pathway through a physical and regulatory interaction network1. In previous work, we have demonstrated that PathLinker achieved much higher recall (while maintaining reasonable precision) than several other methods1. Furthermore, it was the only method that could control the size of the reconstruction while ensuring that receptors were connected to TFs in the result. We have also experimentally validated PathLinker’s novel finding that CFTR, a transmembrane protein, facilitates the signaling from receptor tyrosine kinase Ryk to the phosphoprotein Dab2, which controls signaling to β-catenin in the Wnt pathway1. These encouraging results suggest that PathLinker may serve as a powerful approach for discovering the structure of poorly studied processes and prioritizing both proteins and interactions for experimental study.

More generally, PathLinker can be useful for connecting sources to targets in protein networks, a problem that has been the focus of many studies in the past2–8. Applications have included explaining high-throughput measurements of the effects of gene knockouts9,10, discovering genomic mutations that are responsible for changes in downstream gene expression11,12, studying crosstalk between different cellular processes13,14, and linking environmental stresses through receptors to transcriptional changes8.

In this paper, we describe a Cytoscape app that implements the PathLinker algorithm. We describe in detail a use case where we employ PathLinker to analyze the Environmental Protection Agency’s ToxCast data. Specifically, we compute and analyze the network of interactions connecting proteins that are perturbed in this dataset by lovastatin, a drug used to lower cholesterol. We conclude by comparing PathLinker to other path-based Cytoscape apps.

Methods

Implementation

PathLinker requires three inputs (Figure 1): a (directed) network G, a set S of sources, and a set T of targets. Each element of S and T must be a node in G. Each edge in G may have a real-valued weight. The primary algorithmic component of PathLinker is the computation of the k best-scoring loopless paths in the network from any source in S to any target in T (Figure 1). By loopless, we mean that a path contains any node at most once. The definition of the score of a path depends on the interpretation of the edge weights, as described in “Operation.” PathLinker computes the k-highest scoring paths by integrating Yen’s algorithm15 with the A* heuristic, which allows very efficient computation for very large k values, e.g., 20,000, on networks with hundreds of thousands of edges1; see Table 2 below for statistics on the running time. PathLinker outputs the sub-network composed of the k best paths.


Figure 1. Overview of PathLinker.

In this figure, PathLinker computes five paths from receptors (blue diamonds) to TFs (yellow squares) and ranks each node and edge by the index of the first path that contains it.
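
The computation described above can be illustrated with a minimal Python sketch. It is not the app's implementation: instead of Yen's algorithm with the A* heuristic, it uses a simple best-first enumeration of partial paths, which returns the same k lowest-cost loopless paths when edge costs are non-negative but can be far slower on large networks. The function names `k_best_paths` and `rank_edges` and the toy network are hypothetical. The sketch also assumes that sources and targets are disjoint, that a path ends at the first target it reaches, and that it never re-enters a source.

```python
import heapq
from itertools import count


def k_best_paths(edges, sources, targets, k):
    """Return up to k lowest-cost loopless source-to-target paths.

    edges: dict mapping (u, v) -> non-negative cost of the directed edge u -> v.
    Best-first enumeration of partial paths (a simple stand-in for Yen's algorithm).
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
    tie = count()  # tie-breaker so the heap never has to compare two paths
    heap = [(0.0, next(tie), (s,)) for s in sorted(sources)]
    heapq.heapify(heap)
    found = []
    while heap and len(found) < k:
        cost, _, path = heapq.heappop(heap)
        last = path[-1]
        if last in targets:
            found.append((cost, path))
            continue  # assumption: a path stops at the first target it reaches
        for v, w in adj.get(last, []):
            if v in path or v in sources:
                continue  # keep the path loopless; never re-enter a source
            heapq.heappush(heap, (cost + w, next(tie), path + (v,)))
    return found


def rank_edges(paths):
    """Rank each edge by the index of the first path that contains it."""
    rank = {}
    for index, (_, path) in enumerate(paths, start=1):
        for edge in zip(path, path[1:]):
            rank.setdefault(edge, index)
    return rank


# Hypothetical toy network: two receptors (r1, r2) and one TF (tf1).
toy_edges = {
    ("r1", "a"): 1.0, ("a", "tf1"): 1.0, ("r1", "b"): 1.0,
    ("b", "tf1"): 2.0, ("r2", "a"): 2.5, ("a", "b"): 1.0,
}
top_paths = k_best_paths(toy_edges, {"r1", "r2"}, {"tf1"}, k=3)
edge_ranks = rank_edges(top_paths)
```

Because complete paths leave the priority queue in non-decreasing order of cost, the i-th path returned is the i-th best, and `rank_edges` reproduces the ranking idea shown in Figure 1.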

Table 1. The top 15 functional enrichment results from the ClueGO app for the Lovastatin network computed by PathLinker.

The column titled “# of Genes” displays the number of genes in the PathLinker network that are annotated to that GO term/pathway. The column titled “% Associated Genes” shows the percentage of genes annotated to that term/pathway that are in the PathLinker network.

| Ontology Source | Ontology Term | Corrected p-value | # of Genes | % Associated Genes |
|---|---|---|---|---|
| GO | cellular response to peptide hormone stimulus | 3.17 × 10^−21 | 22 | 4.1% |
| GO | response to insulin | 6.18 × 10^−19 | 20 | 4.1% |
| KEGG | ErbB signaling pathway | 6.95 × 10^−17 | 12 | 13.7% |
| GO | Fc receptor signaling pathway | 1.62 × 10^−15 | 17 | 4.0% |
| GO | insulin receptor signaling pathway | 2.62 × 10^−15 | 16 | 4.6% |
| KEGG | AGE-RAGE signaling pathway in diabetic complications | 3.35 × 10^−14 | 11 | 10.8% |
| KEGG | T cell receptor signaling pathway | 4.62 × 10^−14 | 11 | 10.5% |
| KEGG | Focal adhesion | 5.26 × 10^−14 | 13 | 6.4% |
| KEGG | Chronic myeloid leukemia | 7.28 × 10^−14 | 10 | 13.6% |
| KEGG | Acute myeloid leukemia | 5.73 × 10^−13 | 9 | 15.7% |
| GO | DNA-templated transcription, initiation | 1.55 × 10^−12 | 14 | 4.1% |
| KEGG | Prolactin signaling pathway | 5.14 × 10^−12 | 9 | 12.5% |
| GO | positive regulation of T cell activation | 8.00 × 10^−12 | 12 | 5.2% |
| KEGG | Chemokine signaling pathway | 2.98 × 10^−11 | 11 | 5.8% |
| KEGG | FoxO signaling pathway | 3.55 × 10^−11 | 10 | 7.4% |

Table 2. Time taken by the PathLinker app using lovastatin’s and each pathway’s set of sources and targets for increasing values of k.

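| | Lovastatin | TNFα Pathway | TGFβ Pathway | Wnt Pathway |
|---|---|---|---|---|
| # of sources | 3 | 4 | 5 | 14 |
| # of targets | 5 | 44 | 77 | 14 |
| Time (sec), k = 100 | 3.6 | 3.3 | 4.7 | 5.2 |
| Time (sec), k = 1,000 | 9.8 | 7.7 | 10.5 | 13.8 |
| Time (sec), k = 10,000 | 94.3 | 86.0 | 116.8 | 144.4 |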

One of the first steps in Yen’s algorithm is to compute the shortest path from T to S. Initially, we implemented this step by running Dijkstra’s algorithm after reversing G. However, reversing the network using the Cytoscape API proved to be slow. Therefore, we modified our implementation of Dijkstra’s algorithm to traverse edges from target to source. Yen’s algorithm also periodically requires the temporary removal of edges from the network, and using the Cytoscape API to delete and re-add edges is similarly inefficient. Therefore, we maintain a set of "hidden edges," which our implementation of Yen’s algorithm ignores. When PathLinker completes, the app renders the computed network using the built-in hierarchical layout if k ≤ 200. Since this layout renders the network upside down, i.e., with source nodes at the bottom and target nodes at the top, we reflect the node coordinates about the x-axis before displaying the layout.
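
The two implementation choices above (traversing edges backwards instead of reversing the network, and ignoring a set of hidden edges instead of deleting them) can be sketched in Python as follows. This is an illustration under assumed data structures, not the app's Java code: the function name `distances_to_target`, the `in_edges` dictionary, and the `hidden` set are all hypothetical.

```python
import heapq


def distances_to_target(in_edges, target, hidden=frozenset()):
    """Dijkstra's algorithm run backwards from `target` along incoming edges.

    in_edges: dict mapping node v -> list of (u, w), one entry per edge u -> v
              with non-negative weight w.
    hidden:   set of (u, v) edges to skip, so that no edge is ever deleted
              from (or re-added to) the underlying network.
    Returns the shortest distance to `target` from every node that can reach it.
    """
    dist = {target: 0.0}
    heap = [(0.0, target)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry; a shorter distance was already found
        for u, w in in_edges.get(v, []):
            if (u, v) in hidden:
                continue  # treat the edge as temporarily removed
            candidate = d + w
            if candidate < dist.get(u, float("inf")):
                dist[u] = candidate
                heapq.heappush(heap, (candidate, u))
    return dist
```

Hiding an edge is then a constant-time set insertion, and restoring it is a set removal, which avoids modifying the network itself.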

Operation

We have implemented PathLinker in Java 7. We have tested it with Cytoscape v3.2, 3.3, and 3.4. PathLinker requires a network to be already loaded in Cytoscape. To run PathLinker on the currently selected network, the user needs to fill in the inputs and press the “Submit” button. The input panel has three sections (Figure 2(a)):


Figure 2. PathLinker screenshots.

(a) The input panel for the app. (b) PathLinker lovastatin results (described in “Use Case”).

  • Sources/Targets: The names of the sources and the targets, separated by spaces. If there are sources or targets that are not nodes in the network, PathLinker will warn the user, identify the errant nodes, and ask the user for permission to continue with the remaining nodes. If none of the sources or none of the targets are in the network, PathLinker will exit. There are two options here:

    • Allow sources and targets in paths: Normally, PathLinker removes incoming edges to sources and outgoing edges from targets before computing paths. If the user selects this option, PathLinker will not remove these edges. Therefore, source and target nodes can appear as intermediate nodes in paths computed by PathLinker.

    • Targets are identical to sources: If the user selects this option, PathLinker will copy the sources to the targets field. This option allows the user to compute a subnetwork that connects a single set of nodes. In this case, PathLinker will allow sources and targets to appear in paths, i.e., it will behave as if the previous option is also selected. Note that since PathLinker computes loopless paths, if the user inputs only a single node and selects this option, PathLinker will not compute any paths at all.

  • Algorithm: There are two parameters here.

    • k: the number of paths the user seeks. The default is k = 200. If the user inputs an invalid value (e.g., a negative number or a non-integer), PathLinker will use the default value.

    • Edge penalty: This value is relevant only when the network has edge weights. In the case of additive edge weights, PathLinker will penalize each path by a factor equal to the product of the number of the edges in the path and the value of this parameter. In other words, each edge in the path will increase the cost of the path by the value of this parameter. When edge weights are multiplicative, PathLinker performs the same penalization but only after transforming the weights and the edge penalty to their logarithms. The default value is one for multiplicative weights and zero for the other two cases.

  • Edge weights: There are three options for the edge weights to be used in the algorithm:

    • No weights: The score of a path is the number of edges in it. PathLinker computes the k paths of lowest score.

    • Edge weights are additive: The score of a path is the sum of the weights of the edges in it. PathLinker computes the k paths of lowest score in this case as well.

    • Edge weights are probabilities: This situation arises often with protein interaction networks, since such a weight indicates the experimental reliability of an edge. PathLinker treats the edge weights as multiplicative and computes the k highest cost paths, where the cost of a path is the product of the edge weights. Internally, PathLinker transforms each weight to the absolute value of its logarithm to map the problem to the additive case.

  • Output: The user can select a checkbox to generate a sub-network containing the nodes and edges in the top k paths. If k ≤ 200, PathLinker will display this sub-network using the built-in hierarchical layout (Figure 3). If k > 200, PathLinker will use the default layout algorithm.
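
The three edge-weight options and the edge penalty described in the list above can be summarized in a short Python sketch. It rests on assumptions that the text does not spell out: weights in the probability case lie in (0, 1], natural logarithms are used, and the log-transformed penalty is added to the cost of each edge. The function name `path_cost` and the mode labels are hypothetical.

```python
import math


def path_cost(weights, mode, penalty=None):
    """Additive cost of a path whose edges have the given weights (lower is better).

    mode: "unweighted", "additive", or "probability" (hypothetical labels for
          the three edge-weight options).
    penalty: edge penalty; defaults to 0 for additive weights and 1 for
             probabilities, matching the defaults stated above.
    """
    if mode == "unweighted":
        return float(len(weights))  # the score is the number of edges
    if mode == "additive":
        p = 0.0 if penalty is None else penalty
        return sum(w + p for w in weights)  # each edge adds the penalty
    if mode == "probability":
        p = 1.0 if penalty is None else penalty
        # Assumption: |log w| per edge, plus the log of the penalty per edge.
        return sum(abs(math.log(w)) + math.log(p) for w in weights)
    raise ValueError("unknown edge-weight mode: " + mode)


short_path = [0.9, 0.8]        # product 0.72
long_path = [0.95, 0.9, 0.9]   # product 0.7695
no_penalty = (path_cost(short_path, "probability"),
              path_cost(long_path, "probability"))
with_penalty = (path_cost(short_path, "probability", penalty=2.0),
                path_cost(long_path, "probability", penalty=2.0))
```

With the default penalty of 1, minimizing this cost is equivalent to maximizing the product of the edge weights, so the longer path with the larger product is preferred. With a penalty of 2, each additional edge costs log 2 more, and the shorter path becomes the better one.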


Figure 3. A hierarchical layout of the lovastatin network computed by PathLinker.

We have mapped UniProt ids to gene names in this network.

When it completes, PathLinker opens a table containing the k paths. Each line in the table displays the rank of each path, its score, and the nodes in the path itself. The user may analyze the network computed by PathLinker using other Cytoscape apps. The next section describes a use case that further elaborates on these possibilities.

Use Case: analysis of ToxCast data for lovastatin

The Environmental Protection Agency’s (EPA) Toxicity Forecaster (ToxCast) initiative and its extension, Tox21, have screened over 9,000 chemicals (such as pesticides and pharmaceuticals) using high-throughput assays designed to test the response of many receptors, TFs, and enzymes in the presence of each chemical16,17. Here we present a use case showing how to integrate PathLinker with the ToxCast data to examine possible signaling pathways by which the chemical lovastatin could affect a cell.

Input datasets and pre-processing. We downloaded the “ToxCast & Tox21 Summary Files” data from the ToxCast website18. In these data, lovastatin perturbed three receptors (EGFR, KDR and TEK) and five TFs (MTF1, NFE2L2, POU2F1, SMAD1 and SREBF1). We used these proteins as the sources and targets, respectively, for PathLinker (Figure 2(a)). Rather than use the default Cytoscape human network, we used the interactome used in the original PathLinker paper1, which contained 12,046 nodes and 152,094 directed edges (http://bioinformatics.cs.vt.edu/~murali/supplements/2016-sys-bio-applications-pathlinker). We preferred this network as we had used a popular Bayesian approach12 to estimate edge weights so as to favor signaling interactions.

Running PathLinker. We used k = 50, no edge penalty (i.e., a penalty of 1), and the option for edge weights that indicated that they are like probabilities (Figure 2(a)). The results appear in Figure 2(b) and Figure 3. Each row in Figure 2(b) describes a path: its index (from 1 to k = 50), the score of the path, and the nodes in the path, ordered from receptor to TF. Note that the score of the path is the product of the weights of the edges in it, due to the edge weight option we selected. Since PathLinker prefers high-scoring paths in this case, the paths appear in decreasing order of score. Figure 3 displays a hierarchical layout of the sub-network composed of the paths computed by PathLinker.

Further analysis. We mapped the UniProt accession numbers of the nodes to gene names using UniProt’s ID mapping tool (http://www.uniprot.org/uploadlists), imported the mapping results into the PathLinker network, and then changed the node labels using the Style tab. Finally, we applied a hierarchical layout to the (lovastatin) sub-network and spread apart overlapping nodes to make the paths easier to visualize (Figure 3). We noted that the target MTF1 did not appear in any of the top 50 paths.

Functional Enrichment. Since the result from PathLinker is a network in the current session of Cytoscape, it is amenable to analysis by other Cytoscape apps. As an example, we demonstrate how we applied the ClueGO app for functional enrichment19 to see if the lovastatin sub-network was enriched for any Gene Ontology (GO) terms or KEGG pathways. Table 1 displays the top 15 enriched terms/pathways. Most of the paths in the PathLinker result come from the EGFR source node, so it is not surprising that the ErbB signaling pathway is highly significant. We found considerable support in the literature for this pathway and other significant GO terms/pathways. Lovastatin has been shown to inhibit epidermal growth factor (EGF) and insulin-like growth factor 1 (IGF-1)20,21. Moreover, the PathLinker sub-network for lovastatin includes an interaction from EGFR to AKT1, which agrees with a study showing that lovastatin inhibits EGFR dimerization and results in the activation of AKT22. Lovastatin has also been shown to inhibit the T cell receptor pathway23, the Ras signaling pathway23, and the Fc receptor–mediated phagocytosis by macrophages24. Thus, the network computed by PathLinker for lovastatin promises to capture several possible mechanisms by which the chemical inhibits cellular pathways.

Running time. As we mentioned earlier in "Implementation," PathLinker is very efficient. In Table 2, we show the running time for the PathLinker app for lovastatin and for a representative set of signaling pathways. Even for k = 10,000, the app completed in less than 2.5 minutes for all inputs. We executed PathLinker on the same network on which we performed the lovastatin analysis.

Comparison to related Cytoscape apps

In this section, we compare PathLinker to other Cytoscape apps that compute paths in networks. A difficulty we faced in understanding the functionality of some of these apps was that they did not precisely define their output in the documentation. Therefore, we had to take recourse to studying the source code for some of these apps in order to understand precisely the properties of the computed paths. We focus the comparison mainly on these properties and not on other features of the apps.

PathExplorer. (http://apps.cytoscape.org/apps/pathexplorer) This app uses breadth-first search (BFS) to compute the shortest path from a single node (that the user can select) to every other node in the network. The app can also compute the shortest path from every node in the network to a single node. Since the app uses BFS, the shortest path property is guaranteed only for unweighted networks. If there are multiple shortest paths to a node, it appears that the app will select one.
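
As a generic illustration of this behavior (not PathExplorer's source code), the following Python sketch runs BFS from a single node and reconstructs one fewest-edge path to every reachable node. The function name `bfs_shortest_paths` and the adjacency-list format are hypothetical.

```python
from collections import deque


def bfs_shortest_paths(adj, start):
    """Return one fewest-edge path from `start` to every reachable node.

    adj: dict mapping node -> list of successor nodes (edge weights are ignored).
    """
    parent = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in parent:  # the first visit fixes the single path kept for v
                parent[v] = u
                queue.append(v)
    paths = {}
    for node in parent:
        path, current = [], node
        while current is not None:
            path.append(current)
            current = parent[current]
        paths[node] = path[::-1]
    return paths
```

Since BFS counts edges and ignores weights, the paths are shortest only in the unweighted sense. When two equally short paths lead to a node, only the one discovered first is kept.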

StrongestPath. (http://apps.cytoscape.org/apps/strongestpath) This app computes the “strongest” paths from a group of source nodes to a group of target nodes. The authors do not provide a definition of “strongest” paths. We describe our understanding of their algorithm now. Suppose the input network is G. Their software takes a real-valued threshold τ > 0 as input; the user can manipulate a slider to select this value. The app appears to operate as follows:

  1. Connect a super source s to each source in G. Connect each target to a super target t in G.

  2. Use Dijkstra’s algorithm to compute the shortest path in G from s to every node in G.

  3. Create a new network G′ with the same node set as G. For every edge (u, v) in G, add the reverse of that edge (v, u) to G′.

  4. Use Dijkstra’s algorithm to compute the shortest path in G′ from t to every node in G.

  5. For every node v in G, record d(v), the sum of the length of the shortest s-v path in G and the length of the shortest t-v path in G′. Compute the corresponding s-t path πv that goes through v.

  6. Sort all the nodes in G in increasing order of d(v).

  7. Let a be the smallest value of d(v).

  8. For every node v such that d(v) ≤ a + τ, output the path πv.

In other words, for every node v, the app computes the shortest path that starts at some source node, goes through v, and ends at some target node. The number of such paths returned depends on the value of the threshold τ selected by the user. This app can operate on weighted and directed networks. We believe that the algorithm will compute the shortest path from any source to any target correctly. However, when τ > 0, it is not possible to guarantee that the algorithm will compute all paths from a source to a target of length ≤ a + τ, since the method computes at most n distinct paths, where n is the number of nodes in the network.
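
The eight steps above can be sketched in Python as follows. This follows our reading of the algorithm, not the StrongestPath source code. The function names are hypothetical, the super source and super target are simulated by starting Dijkstra's algorithm from all sources (or all targets) at distance zero, and duplicate paths are reported only once.

```python
import heapq


def dijkstra(adj, starts):
    """Shortest distance and predecessor of every node reachable from `starts`."""
    dist = {s: 0.0 for s in starts}
    pred = {s: None for s in starts}
    heap = [(0.0, s) for s in sorted(starts)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], pred[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, pred


def walk(pred, v):
    """Follow predecessors from v back to a start node."""
    path = []
    while v is not None:
        path.append(v)
        v = pred[v]
    return path


def threshold_paths(edges, sources, targets, tau):
    """Output the path through v for every node v with d(v) <= a + tau."""
    forward, backward = {}, {}
    for (u, v), w in edges.items():
        forward.setdefault(u, []).append((v, w))
        backward.setdefault(v, []).append((u, w))  # the reversed network G'
    from_s, pred_s = dijkstra(forward, sources)    # steps 1 and 2
    to_t, pred_t = dijkstra(backward, targets)     # steps 3 and 4
    d = {v: from_s[v] + to_t[v] for v in from_s if v in to_t}  # step 5
    if not d:
        return []
    a = min(d.values())                            # step 7
    output, seen = [], set()
    for v in sorted(d, key=lambda x: (d[x], x)):   # step 6
        if d[v] > a + tau:                         # step 8
            break
        path = tuple(walk(pred_s, v)[::-1] + walk(pred_t, v)[1:])
        if path not in seen:
            seen.add(path)
            output.append((d[v], path))
    return output
```

Each node contributes at most one path, so the output can never contain more than n distinct paths, which is the limitation noted above.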

PesCa [25]. (http://apps.cytoscape.org/apps/pesca30) For a single node, this app computes the shortest path from that node to every other node in the network. If the user selects multiple nodes, PesCa computes the shortest path(s) between each pair of selected nodes. A useful feature is that if there are multiple shortest paths between a pair of nodes, the app computes all of them. This app focuses on shortest paths.

PathLinker. Our algorithm is strikingly different in that it allows the user to compute as many (k) shortest paths from sources to targets as desired. For example, if k = 1, PathLinker will compute the shortest path from some source to some target using Dijkstra’s algorithm on a graph with a new super source and a super target. For larger values of k, Yen’s algorithm (used by PathLinker) uses a dynamic program to mathematically guarantee the following property: if πk−1 is the (k −1)st path and πk is the kth path, then there is no source-to-target path in the graph whose length is strictly between the lengths of πk−1 and πk. The other Cytoscape apps discussed here either cannot guarantee this property (e.g., StrongestPath) or do not compute less-than-optimal paths (e.g., PathExplorer and PesCa).
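
On a small network, the guarantee described above can be checked by brute force. The Python sketch below enumerates every loopless source-to-target path and tests whether a list of k reported paths has exactly the k smallest lengths, a condition that implies the property stated above. It is a verification aid only, exponential in the worst case, and not part of PathLinker. The function names and the assumption that a path ends at the first target it reaches are ours.

```python
def all_loopless_paths(adj, sources, targets):
    """Enumerate every loopless source-to-target path with its length, sorted.

    adj: dict mapping node -> list of (successor, weight).
    """
    results = []

    def extend(path, length):
        last = path[-1]
        if last in targets:
            results.append((length, tuple(path)))
            return  # assumption: a path ends at the first target it reaches
        for v, w in adj.get(last, []):
            if v not in path:  # loopless: each node appears at most once
                extend(path + [v], length + w)

    for s in sorted(sources):
        extend([s], 0.0)
    return sorted(results)


def has_k_smallest_lengths(top_k, adj, sources, targets):
    """True if the lengths in top_k are exactly the k smallest path lengths."""
    every_length = [length for length, _ in all_loopless_paths(adj, sources, targets)]
    return [length for length, _ in top_k] == every_length[:len(top_k)]
```

A list of paths that skips a shorter path fails this check, which is the situation that the guarantee rules out.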

Summary

We have described a new Cytoscape app that implements a mathematically rigorous, computationally-efficient, and experimentally-validated network connection algorithm called PathLinker. While we had originally developed PathLinker for reconstructing signaling pathways, the method is general enough to connect any set of sources to any set of targets in a weighted and directed network. As a specific example, we used PathLinker to compute the network of interactions connecting proteins perturbed by the drug lovastatin in the ToxCast dataset and showed how the literature supported PathLinker’s findings. The app may also be used to compute a sub-network connecting a single set of nodes. This app promises to be a useful addition to the suite of Cytoscape apps for analyzing networks.

Data and software availability

Software available from: http://apps.cytoscape.org/apps/pathlinker

Latest source code: https://github.com/Murali-group/PathLinker-Cytoscape

Archived source code as at time of publication: 10.5281/zenodo.16516226 (here the DOI 10.5281/zenodo.165162 is followed by the fused citation number 26)

License: GNU General Public License version 3

The original Python implementation is available at https://github.com/Murali-group/PathLinker for users who seek to integrate PathLinker directly into their own computational pipelines or want to apply PathLinker for large values of k.

Datasets: We obtained the lovastatin data from the following three files in the INVITRODB_V2_SUMMARY.zip file that we downloaded18:

  • hitc_Matrix_151020.csv

  • Chemical_Summary_151020.csv

  • Assay_Summary_151020.csv

How to cite this article: Gil DP, Law JN and Murali TM. The PathLinker app: Connect the dots in protein interaction networks [version 1; peer review: 1 approved, 2 approved with reservations]. F1000Research 2017, 6:58 (https://doi.org/10.12688/f1000research.9909.1)

