MSF: Modulated Sub-graph Finder

Mariam R. Farman; Ivo L. Hofacker; Fabian Amman

doi:10.12688/f1000research.16005.1

Software Tool Article

[version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]

Mariam R. Farman¹, Ivo L. Hofacker¹, Fabian Amman¹,²

PUBLISHED 29 Aug 2018

¹ Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria
² Department of Chromosome Biology, Max F. Perutz Laboratories, University of Vienna, Vienna, 1030, Austria

Mariam R. Farman
Roles: Data Curation, Investigation, Methodology, Software, Validation, Visualization, Writing – Original Draft Preparation

Ivo L. Hofacker
Roles: Conceptualization, Funding Acquisition, Project Administration, Resources, Supervision, Writing – Review & Editing

Fabian Amman
Roles: Conceptualization, Project Administration, Supervision, Writing – Review & Editing

Abstract

High throughput techniques such as RNA-seq or microarray analysis have proven to be invaluable for the characterization of global transcriptional gene activity changes due to external stimuli or diseases. Differential gene expression analysis (DGEA) is the first step in the course of data interpretation, typically producing lists of dozens to thousands of differentially expressed genes. To further guide the interpretation of these lists, different pathway analysis approaches have been developed. These tools typically rely on the classification of genes into sets of genes, such as pathways, based on the interactions between the genes and their function in a common biological process. Regardless of technical differences, these methods do not properly account for cross talk between different pathways and rely on a binary separation into differentially expressed and unaffected genes based on an arbitrarily set p-value cut-off. To overcome this limitation, we developed a novel approach to identify concertedly modulated sub-graphs in the global cell signaling network, based on the DGEA results of all genes tested. Thereby, expression patterns of genes are integrated according to the topology of their interactions, which potentially allows the flow of information from the perturbation source to the effectors to be read. The described software, named Modulated Sub-graph Finder (MSF), is freely available at https://github.com/Modulated-Subgraph-Finder/MSF.

Keywords

Differential gene expression analysis, pathway analysis, combining p-value, cell signalling network

Corresponding author: Ivo L. Hofacker

Competing interests: No competing interests were disclosed.

Grant information: This work was funded by the FWF (“Fonds zur Förderung der wissenschaftlichen Forschung”) within the project Internationalen Kooperationsprojektes - Intl cooperation Project (Joint Project - Lead Agency Verfahren) [I 1988-B22]. The grant was assigned to ILH. FA was funded by the Austrian Science Fund (FWF) [SFB F43].
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2018 Farman MR et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Farman MR, Hofacker IL and Amman F. MSF: Modulated Sub-graph Finder [version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]. F1000Research 2018, 7:1346 (https://doi.org/10.12688/f1000research.16005.1) First published: 29 Aug 2018, 7:1346 (https://doi.org/10.12688/f1000research.16005.1) Latest published: 14 Apr 2019, 7:1346 (https://doi.org/10.12688/f1000research.16005.3)

Introduction

High throughput sequencing techniques have been widely used to yield differentially expressed genes (DEG) (Malone & Oliver, 2011). To this end, changes in transcript abundance are measured, e.g. by next generation sequencing techniques, and interpreted as an indicator of differential expression of genes. DEGs can be used to gain insights into the mechanism underlying differences between conditions of samples, such as healthy versus diseased. Differential gene expression analysis (DGEA) informs about the magnitude of expression changes between the conditions, often expressed as fold change, the sign of the fold change, and the confidence level of observing an authentic change, often expressed as p-value. This DEG information is further interpreted to extract meaningful biological insights, for example, genes that could be involved in the response to a particular stimulus or may be the cause of a disease. To this end, pathway-based analysis has become an important tool to further interpret the results of a DGEA and to gain an understanding of the perturbations in a biological system. Biological pathways are sets of genes and their interactions forming a functional unit. DEGs help to identify pathways or networks that may be altered during a change of condition, providing important information about diseases and their treatment (Khatri et al., 2012). Pathway-based methods use predefined pathways or networks such as KEGG (Kanehisa & Goto, 2000) and Reactome (Fabregat et al., 2018), together with the expression measurements of the genes obtained from DGEA and statistical methods and algorithms, to identify specifically modulated pathways and processes (García-Campos et al., 2015).

Well established resources for pathway annotation are KEGG (Kyoto Encyclopedia of Genes and Genomes) (Kanehisa & Goto, 2000) and Reactome (Fabregat et al., 2018). KEGG Pathways is a branch of the KEGG database that hosts a collection of manually drawn pathway maps representing the molecular interaction, reaction and relation networks of cellular functions. Similarly, Reactome is an open-source, manually curated, peer-reviewed database of signaling and metabolic molecules and their interactions, organized into different biological pathways for nineteen species (Fabregat et al., 2018). Both provide predefined pathways, which are sets of genes and their interactions categorized into functional units. Starting from a gene interaction network, genes are labeled according to their role in a specific biological process. In this sense a particular gene can be assigned to different pathways. For example, the human gene STAT1 is associated with 24 different pathways in the pathway annotation curated by KEGG and with 12 different pathways in the Reactome database. Although carefully produced, the assignment of genes to those predefined pathway units can be considered subjective to some degree and suffers from observational bias (Schnoes et al., 2013).

Existing pathway-based analysis approaches use different research designs, which can be categorized into ORA (Over-representation analysis), FCS (Functional class scoring) and pathway topology based methods. All of these aim to determine whether a subset of genes, e.g., the significantly differentially expressed genes, contains genes associated with a certain pathway more often than expected given the total set of examined genes, e.g., the whole genome background. ORA is the first and the most basic method of pathway analysis (García-Campos et al., 2015). It uses a DEG list with user defined cut-offs for the log-fold change and p-value (most commonly using absolute log-fold change ≥ 2 and p-value ≤ 0.05). Subsequently, sets of genes associated with annotated pathways are tested for being over-represented in the set of DEGs. To this end, the hypergeometric distribution, chi-square tests, binomial probability or Fisher’s exact test are used. Thereby the information on the topology of genes in the pathways is neglected (Bayerlová et al., 2015). Furthermore, ORA assumes that the biological pathways are independent of each other and ignores the fact that they cross-talk and overlap (García-Campos et al., 2015; Khatri et al., 2012).
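As an illustration of the over-representation test described above, the following minimal sketch computes a one-sided hypergeometric p-value for a single pathway. The function name and its four count parameters are hypothetical and chosen for this example only; real ORA tools additionally correct for multiple testing across pathways.

```python
from math import comb


def ora_p_value(n_background, n_pathway, n_deg, n_overlap):
    """One-sided hypergeometric test for over-representation.

    n_background: number of genes examined in total
    n_pathway:    number of background genes annotated to the pathway
    n_deg:        number of genes in the DEG list
    n_overlap:    number of DEGs that are annotated to the pathway
    Returns P(X >= n_overlap) when n_deg genes are drawn without
    replacement from the background.
    """
    lowest = max(0, n_overlap)
    highest = min(n_pathway, n_deg)
    total = comb(n_background, n_deg)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_deg - k)
        for k in range(lowest, highest + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 in the pathway, 3 DEGs, 2 of them in the pathway.
print(ora_p_value(10, 4, 3, 2))
```

Because only the counts enter the test, the position of the genes within the pathway plays no role, which is the limitation noted above.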

Unlike ORA, FCS has no artificial cut-off to define a DEG list. FCS works in three steps. First, it calculates gene-level statistics from the molecular measurements, such as the differential expression of individual genes, ANOVA, t-test or Z-score. In the second step, the statistics of the individual genes in a pathway are aggregated into a single pathway-level statistic, commonly using the Kolmogorov-Smirnov statistic, mean or median. Finally, the statistical significance of the pathway-level statistic is assessed. Although FCS addresses some of the limitations of ORA, it still ignores the topology of genes in a pathway as well as cross-talk and overlap between pathways (García-Campos et al., 2015; Khatri et al., 2012). Pathway topology based methods are similar to FCS except that they consider the topological position of each gene when computing the gene-level statistics, but they still do not aim to link different functional pathways (Khatri et al., 2012).
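The three FCS steps can be sketched as follows. This is only one simple instance under stated assumptions: the gene-level statistics are taken as given, the pathway-level statistic is the mean, and significance is assessed by comparison with random gene sets of the same size. The function name, its parameters and the toy scores are hypothetical.

```python
import random
from statistics import mean


def fcs_permutation_p(gene_scores, pathway_genes, n_perm=1000, seed=0):
    """Mean-based functional class scoring with a gene-set permutation test.

    gene_scores:   dict gene -> gene-level statistic (step 1, assumed given)
    pathway_genes: genes annotated to the pathway of interest
    Returns (pathway-level statistic, empirical one-sided p-value).
    """
    rng = random.Random(seed)
    members = [g for g in pathway_genes if g in gene_scores]
    observed = mean(gene_scores[g] for g in members)  # step 2
    all_scores = list(gene_scores.values())
    # Step 3: how often does a random gene set of equal size score as high?
    hits = sum(
        mean(rng.sample(all_scores, len(members))) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Toy scores: gene g0 has score 0, g1 has score 1, and so on.
scores = {f"g{i}": i for i in range(20)}
print(fcs_permutation_p(scores, ["g17", "g18", "g19"], n_perm=200))
```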

On these grounds we propose a novel approach to make use of the rich gene and protein interaction annotation resources available to gain additional functional insights from basic DGEA. To this end, we start with the presupposition that the expression of neighboring genes within a functional pathway is not independent. Rather, such genes often regulate each other’s expression or are part of the same regulon (Michalak, 2008). We understand that the categorization of links between genes into labeled pathways is often an arbitrary one, given the extensive cross talk between different pathways. Although these categories have proven to be useful in many situations, they force a certain perspective onto the interpretation of novel data. Based on these two principles, we aim to find sub-graphs of connected genes within the cell signaling network which, as a whole, exhibit significant differential expression changes. This approach differs in two main aspects from common pathway analysis tools. First, it does not aim to identify functional pathways enriched in differentially expressed genes, but detects sub-graphs or branches in a network graph (potentially spanning more than one functionally grouped pathway) which are coherently modulated. Second, it aims to improve the DGEA on the gene level by collecting the information of neighboring genes, which as a whole might exhibit a prominent enough signal to be called, again as a whole, significantly modulated.

As input, information on functional links between genes, provided e.g. by KEGG or Reactome, and information on the differential expression status of single genes resulting from a DGEA are required. As a result the analysis returns sub-graphs and their joint confidence scores, reflecting how the perturbation propagates through the network. Furthermore, the entry points of perturbation in the networks and the overlap with conventional pathway categories are returned. The output is written as a directed adjacency file, convenient for visualization, e.g., with StringApp (Morris et al., 2018), available as a Cytoscape plug-in (Shannon et al., 2003).

All of this can be helpful to understand the cause and effect of a stimulus and might inform about potential points of intervention. The proposed algorithm was implemented as a Java program, which was named Modulated Sub-graph Finder (abbreviated MSF). MSF can help transform the information obtained from DGEA into comprehensible knowledge of signal transduction between genes and is thereby a valuable complement to existing pathway based methods. MSF is freely accessible from GitHub https://github.com/Modulated-Subgraph-Finder/MSF.

Methods

Implementation

MSF is developed as a novel heuristic approach to find concertedly modulated sub-graphs in networks of biological interactions. MSF does not use predefined gene sets grouped into functional units, but rather relies purely on the network of interacting genes. The input network consists of nodes corresponding to genes and edges representing interactions. Furthermore it utilizes comprehensive results from a differential gene expression analysis to discover the sub-graphs, or modules, which are as a whole modulated.

MSF uses the individual genes’ p-values generated by the DGEA. A p-value expresses the probability of observing a signal at least as extreme as the measured one under the null hypothesis of unmodified gene expression, given the statistical model. To find significantly modulated sub-graphs, the individual p-values of vicinal genes in the global network are combined into a single p-value, using a statistical method for combining dependent p-values described by Hartung (Hartung, 1998). Hartung’s method builds on the inverse of the standard normal distribution function: individual gene p-values are first transformed to their corresponding normal scores. From these normal scores the correlation between the genes is estimated, and the normal scores and the correlation together enter the inverse normal method to calculate the combined p-value for all genes examined, namely the examined sub-graph. The combined p-value of a sub-graph expresses the significance of all genes in the sub-graph being modulated together. Thereby, the information from the different genes is used as replicated, although not independent, measurements of the behavior of the whole sub-graph. This potentially increases the power to detect significant sub-graphs consisting of genes which are not significant on their own.
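The combination step can be sketched as below. This is a minimal illustration of Hartung’s inverse normal method as it is commonly formulated, not MSF’s actual Java code: the equal default weights, the regularization constant kappa = 0.2, the clamping of extreme p-values and the function name are assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def hartung_combined_p(p_values, weights=None, kappa=0.2):
    """Combine possibly dependent p-values into one p-value (Hartung's method)."""
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if n == 1:
        return p_values[0]
    w = list(weights) if weights is not None else [1.0] * n
    # Transform p-values to normal scores; clamp to keep inv_cdf defined.
    t = [_STD_NORMAL.inv_cdf(min(max(p, 1e-300), 1 - 1e-12)) for p in p_values]
    t_bar = sum(t) / n
    # The sample variance of the scores estimates 1 - rho (common correlation).
    q = sum((x - t_bar) ** 2 for x in t) / (n - 1)
    rho = max(-1.0 / (n - 1), 1.0 - q)
    sum_w = sum(w)
    sum_w2 = sum(x * x for x in w)
    variance = sum_w2 + (sum_w ** 2 - sum_w2) * (
        rho + kappa * sqrt(2.0 / (n + 1)) * (1.0 - rho)
    )
    statistic = sum(wi * ti for wi, ti in zip(w, t)) / sqrt(variance)
    return _STD_NORMAL.cdf(statistic)


print(hartung_combined_p([0.05, 0.05, 0.05]))
```

In this formulation, identical p-values are treated as perfectly correlated, so combining them returns approximately the same p-value rather than a smaller one, as it would if the measurements were assumed independent.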

Overview of our method

To avoid the complexity of scoring all possible connected sub-graphs, MSF applies a four step heuristic, described in the following. The procedure by which MSF identifies modulated sub-graphs in a network is presented as a flowchart diagram (Figure 1).

Figure 1. Graphical representation of the Modulated Sub-graph Finder (MSF) heuristic approach to detect modulated sub-graphs in a global gene regulatory network without exhaustively testing all connected sub-graphs.

Initial modulated sub-graphs. MSF constructs the first sub-graph starting with the gene associated with the lowest (most significant) p-value deduced from the DGEA. From this seed it tries to extend the sub-graph by adding directly neighboring genes, starting with the most significant one. A single combined p-value is calculated for the extended sub-graph. If the combined p-value is smaller than the p-value of the original sub-graph, the extended sub-graph is accepted. This step is iteratively repeated until no further extension is accepted. In this case the process starts over with all remaining genes not yet in a significantly modulated sub-graph. This step identifies all simple sub-graphs that are modulated in the whole network.
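The seed-and-extend step can be sketched as follows. Several details are assumptions made for this illustration and are not stated in the text: neighbors are tried in order of increasing p-value and the first one that improves the score is accepted, sub-graphs with fewer than min_size genes are not reported, and a simple inverse normal combination for independent p-values (stouffer) stands in for Hartung’s method to keep the example short. All names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def stouffer(p_values):
    """Stand-in combiner assuming independence; MSF uses Hartung's method."""
    z = [_STD_NORMAL.inv_cdf(p) for p in p_values]
    return _STD_NORMAL.cdf(sum(z) / sqrt(len(z)))


def grow_subgraphs(p_value, neighbors, combine=stouffer, min_size=2):
    """Greedy seed-and-extend search for modulated sub-graphs.

    p_value:   dict gene -> p-value from the DGEA (strictly between 0 and 1)
    neighbors: dict gene -> iterable of directly connected genes
    Returns a list of (member genes, combined p-value) tuples.
    """
    unassigned = set(p_value)
    subgraphs = []
    for seed in sorted(p_value, key=p_value.get):  # most significant first
        if seed not in unassigned:
            continue
        members, score = [seed], p_value[seed]
        extended = True
        while extended:
            extended = False
            frontier = {
                n
                for g in members
                for n in neighbors.get(g, ())
                if n in unassigned and n not in members
            }
            for cand in sorted(frontier, key=lambda g: (p_value[g], g)):
                new_score = combine([p_value[g] for g in members + [cand]])
                if new_score < score:  # accept only if the sub-graph improves
                    members.append(cand)
                    score = new_score
                    extended = True
                    break
        if len(members) >= min_size:
            subgraphs.append((members, score))
            unassigned -= set(members)
    return subgraphs


# Toy chain A - B - C - D - E with an isolated gene F.
toy_p = {"A": 0.001, "B": 0.01, "C": 0.5, "D": 0.04, "E": 0.03, "F": 0.9}
toy_net = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
for genes, p in grow_subgraphs(toy_p, toy_net):
    print(genes, p)
```

In the toy network, gene C carries no signal and separates the two detected sub-graphs; bridging such gaps is the purpose of the extension and merging steps.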

Extending modulated sub-graphs. In the next step, we check if any of the initial modulated sub-graphs could further be extended by adding more than one gene at a time. This is done by testing all possible extension paths up to N (default 2) genes at all nodes in the sub-graph. Again, this step is iteratively repeated until no further genes are added to the significant differentially expressed sub-graphs. This step bridges small gaps of genes without a clear differential signal in the DGEA.

Merging modulated sub-graphs. After detection and extension of the modulated sub-graphs, MSF tests whether pairs of sub-graphs score better when combined than on their own. The merging of the sub-graphs is done by depth first search traversal from the first sub-graph to the second sub-graph. If the two sub-graphs can be joined by a connector of at most N genes (default 1 gene) and the combined p-value of the merged sub-graph, including the bridging genes in between, is less than the individual p-values of the two sub-graphs, the two sub-graphs are merged into one larger modulated sub-graph. This step is repeated iteratively until no further sub-graphs can be merged.
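The connector search of the merging step can be sketched as a depth-limited depth first search. The sketch covers only the graph traversal; the subsequent comparison of combined p-values is omitted. The function name, the adjacency representation (a dict mapping each gene to its successors) and the choice to return the first connector found rather than the shortest one are assumptions of this example.

```python
def find_bridge(neighbors, sub_a, sub_b, max_bridge=1):
    """Search for a connector from sub-graph A to sub-graph B.

    neighbors:  dict gene -> iterable of genes reachable via one edge
    max_bridge: maximum number of bridging genes outside both sub-graphs
    Returns the list of bridging genes (empty if A and B are directly
    adjacent) or None if no connector within the limit exists.
    """
    sub_a, sub_b = set(sub_a), set(sub_b)

    def dfs(node, path):
        for nxt in sorted(neighbors.get(node, ())):
            if nxt in sub_b:
                return path  # reached the second sub-graph
            if nxt in sub_a or nxt in path or len(path) >= max_bridge:
                continue  # stay outside A, avoid cycles, respect the limit
            found = dfs(nxt, path + [nxt])
            if found is not None:
                return found
        return None

    for start in sorted(sub_a):
        found = dfs(start, [])
        if found is not None:
            return found
    return None


# Toy network: sub-graphs {A, B} and {C, D} are linked through gene X.
toy_net = {"A": ["B"], "B": ["A", "X"], "X": ["B", "C"], "C": ["X", "D"], "D": ["C"]}
print(find_bridge(toy_net, ["A", "B"], ["C", "D"]))
```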

Finding sources & sinks. In a last post processing step MSF identifies the trigger points of the modulated sub-graphs. These trigger genes are the sources of the sub-graphs with only outgoing edges. These genes can be interpreted as the possible entry points of perturbation from where the stimulus causes downstream effects. In the same spirit the most downstream genes of the modulated sub-graph are identified and defined as sinks. Sinks can be interpreted as the effectors where the integrated information within the signal transduction network is set to action. Due to circular loops not all sub-graphs are guaranteed to have sources or sinks.
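A minimal sketch of this post processing step is given below, assuming the network is available as a list of directed (source, target) gene pairs and that only edges between members of the sub-graph are considered. The function name and the toy edges are hypothetical.

```python
def sources_and_sinks(edges, members):
    """Identify sources and sinks of a modulated sub-graph.

    edges:   iterable of directed (source gene, target gene) pairs
    members: genes belonging to the sub-graph
    Sources have only outgoing edges within the sub-graph, sinks only
    incoming ones.
    """
    members = set(members)
    has_in, has_out = set(), set()
    for src, dst in edges:
        if src in members and dst in members and src != dst:
            has_out.add(src)
            has_in.add(dst)
    return sorted(has_out - has_in), sorted(has_in - has_out)


# Toy sub-graph: A and B both feed into C, which feeds into D.
toy_edges = [("A", "C"), ("B", "C"), ("C", "D"), ("Z", "A")]
print(sources_and_sinks(toy_edges, ["A", "B", "C", "D"]))
```

For a purely circular sub-graph, such as edges A to B and B to A, both returned lists are empty, which corresponds to the case mentioned above where no source or sink exists.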

Operation. MSF requires Java version 8 (JDK 1.8). The few package dependencies have already been added to the release. MSF runs fast on a standard laptop computer and thus has no special system requirements. To run MSF, the user must provide two text files, one containing the DGEA results and a second one containing the interactions in an adjacency format. Example files and a detailed tutorial on how to use MSF are provided on GitHub https://github.com/Modulated-Subgraph-Finder/MSF.

Results

Case Study

To demonstrate its usefulness, MSF is applied to an RNA-seq data set of primary human monocyte-derived macrophages (MDMs) infected with Ebola virus (GSE84188) (Olejnik et al., 2017). Ebola virus (EBOV) belongs to the Filoviridae family of filamentous, enveloped, single stranded RNA viruses. EBOV causes hemorrhagic fever in humans and leaves the host innate and adaptive immune responses unable to control the virus infection (Prins et al., 2009). Until now, no antiviral drugs have been approved for the treatment of Ebola virus infection (Konde et al., 2017; Rhein et al., 2015). The initial targets of EBOV are macrophages and dendritic immune cells (Falasca et al., 2015; Rhein et al., 2015). EBOV inhibits the critical innate immune response of the host, which includes the activation of alpha/beta interferon (IFN-α/β) (Cárdenas et al., 2006; Konde et al., 2017; Prins et al., 2009). It has been proposed that IFN-α/β should be tested for its antiviral activity against Ebola in clinical trials (Konde et al., 2017). Ebola infection data was selected to test the approach because the disease has been studied for several decades and a vast literature on its pathogenesis is available, which facilitates the verification of the MSF results. In particular, the detection of IFN-α/β as a point of action of the virus can be considered a basic indicator of the correctness and usefulness of the approach.

EBOV infection count data was downloaded from Gene Expression Omnibus (GEO) (GSE84188). Differential gene expression analysis was performed on the count data with edgeR package (version 3.4.2) (Robinson et al., 2010). The DEG analysis results generated by edgeR were used as input for MSF. Directed cell signaling interactions were filtered from Reactome Functional interactions (FIs) Version 2016 (Wu et al., 2016), which was used as a second input for MSF.

The EBOV infection experiments describe the course of infection at three time-points, 6, 24 and 48 hours post infection (hpi). For the earliest time point at 6 hpi, three large modulated sub-graphs with 41, 107, and 18 genes were identified, as well as two with fewer than 4 genes. The modulated sub-graphs consist predominantly of cytokines, chemokines (CXCL10, CCL8) and interleukin genes (IL6, IL27, IL23). IFNB1 and IFNA1 were both identified as possible sources in the sub-graph with 18 genes. Most of the sources identified by MSF were type I interferon induced genes (Supplementary material 6H). At 24 hpi eight modulated sub-graphs were identified, with four main sub-graphs consisting of 27, 101, 134 and 194 genes, the others being smaller than 10 genes. IFNB1 and IFNA1 were again among the identified sources. For the last time-point, 48 hpi, six modulated sub-graphs were identified. Three of the sub-graphs had fewer than nine genes and the main sub-graphs had 122, 191 and 202 genes. IFNB1 was identified as one of the sources in the most significantly modulated sub-graph (Supplementary).

As stated earlier, IFN-α/β was reported to be one of the targets of Ebola infection. We were able to successfully identify IFNA1 as a source in all three time-points and IFNB1 in two of the time-points. Although IFNA1 and IFNB1 were already among the most significant genes in the DGEA during the later time points, MSF was also able to detect them as sources at the very early time-point, when the genes were not significant based on the individual DGEA alone. Identifying the possible sources reduces the search space for potential target genes and can serve biologists as a starting point for clinical testing of drugs and vaccines against an infection.

Table 1 compares the results of MSF, namely the number of detected sub-modules and their total gene numbers, to a simple analysis that maps significantly modulated genes from the DGEA onto the network and joins neighbors into modules. The numbers indicate that MSF, applying its statistical test, detects fewer but larger sub-modules. Furthermore, the table demonstrates how strongly the results of the plain DGEA mapping depend on the choice of p-value cut-off, a choice which MSF avoids altogether.

Table 1. Comparison of connected sub-graphs of modulated genes in the global network between the analysis results of Modulated Sub-graph Finder (MSF) and mapping the raw list of differentially expressed genes from the standard edgeR analysis, applying different p-value cut-offs, onto the network.

hpi - hours post infection.

Analysis	Total number of genes in network	Number of connected sub-graphs in network
6hpi
edgeR + MSF	166	3
edgeR (p-value ≤ 0.1)	123	39
edgeR (p-value ≤ 0.05)	123	47
edgeR (p-value ≤ 0.01)	112	53
24hpi
edgeR + MSF	456	4
edgeR (p-value ≤ 0.1)	325	76
edgeR (p-value ≤ 0.05)	305	102
edgeR (p-value ≤ 0.01)	262	110
48hpi
edgeR + MSF	515	3
edgeR (p-value ≤ 0.1)	392	43
edgeR (p-value ≤ 0.05)	366	74
edgeR (p-value ≤ 0.01)	306	106
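
The simple baseline used for comparison in Table 1 can be sketched as a connected component search over the genes that pass a p-value cut-off. The sketch assumes that edge direction is ignored when joining neighbors and that isolated significant genes count as modules of size one; neither detail is specified in the text, and all names and toy values are hypothetical.

```python
from collections import deque


def significant_components(p_value, edges, cutoff=0.05):
    """Join significant genes that are direct neighbors into modules.

    p_value: dict gene -> p-value from the DGEA
    edges:   iterable of (gene, gene) pairs from the network
    Returns a list of modules, each a sorted list of genes.
    """
    keep = {g for g, p in p_value.items() if p <= cutoff}
    adjacency = {g: set() for g in keep}
    for a, b in edges:
        if a in keep and b in keep and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(keep):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth first search within one module
            gene = queue.popleft()
            component.append(gene)
            for nxt in adjacency[gene]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components


toy_p = {"A": 0.001, "B": 0.04, "C": 0.2, "D": 0.03, "E": 0.06}
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
for cutoff in (0.1, 0.05, 0.01):
    print(cutoff, significant_components(toy_p, toy_edges, cutoff))
```

Even in this toy chain the number and size of the modules change with the cut-off, which mirrors the cut-off dependency visible in the edgeR rows of Table 1.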

Modulated sub-graphs at 6 hpi

Three main modulated sub-graphs identified by MSF at 6 hpi are shown in Figure 2. The gene based graphs on the right hand side represent the immediate output of the MSF analysis, visualized by StringApp (Morris et al., 2018) in Cytoscape (Shannon et al., 2003). Each node represents a gene that is part of a modulated sub-graph, whereby the associated colors code the functional annotation deduced from KEGG Pathways. The cross-talk between the pathways and the involvement of many genes in multiple pathways is evident. The more schematic drawing on the right side represents the flow of information between the sensors and effectors, which is easily deduced in this particular example.

Figure 2. Visualisation of the three modulated sub-graphs identified by Modulated Sub-graph Finder (MSF) at 6 h after Ebola Virus (EBOV) infection in gene detail and their epitomized representation to depict the flow of perturbation in the directed network.

The node coloring refers to KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways according to the colors in the legend. The graph edges are from Reactome. Genes reported in the literature as important for EBOV infection are enlarged in the graph.

In more detail, sub-graph 1 (top) shows how the activation of toll-like receptor, cytokine, chemokine and JAK-STAT genes leads via TNF into apoptosis. The next significant sub-graph (sub-graph 2: middle) reveals how information from the extra-cellular matrix (ECM) receptors, which are reported to interact with Ebola glycoprotein (GP) (Veljkovic et al., 2015), from chemokines and cytokines, and from cytosolic DNA sensing is integrated, again resulting in modulation of the apoptosis pathway. Finally, sub-graph 3 (bottom) demonstrates how IFNA1 and IFNB1 modulate once more, via only a few intermediate steps, the apoptotic response of the cell.

This showcase example demonstrates with how little effort complex data can be interpreted, and how the results help to apprehend the dynamics of the underlying processes and suggest testable hypotheses and potential points of intervention.

Robustness

A potential concern is how noise in the gene expression measurements affects our analysis. To assess the robustness and stability of our method, we therefore added Poisson distributed noise to the read counts of the data set for the three time-points used above. Then DGEA was carried out on the disturbed data with the same parameters as for the native data using edgeR, followed by analysis with MSF. This procedure was carried out 100 times. Every time, the genes from the modulated sub-graphs identified from the noisy data were compared to the genes of the sub-graphs identified from the native data. This was also done for the DEGs obtained in each run, using three different FDR cutoffs of 0.01, 0.05 and 0.1. The robustness of MSF and the DEG analysis for the time-points 6, 24, and 48 hpi is shown in Figure 3. The way the data noise was modeled can be considered rather stringent, which is already reflected by the limited recall rate of the edgeR based DGEA, between 68 % (6 hpi) and 93 % (24 hpi). For the MSF analysis the observed median recall rates lie between 71 % (6 hpi) and 84 % (48 hpi). The better performance of the pure DGEA can be explained by the fact that disturbed p-values do not change the results of the DGEA as long as the p-value does not rise above the chosen cutoff value. In contrast, MSF is sensitive to p-value changes across the whole range of possible values.
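The perturbation and the comparison of each noisy run with the native result can be sketched as follows. The text does not specify how exactly the noise was generated; the sketch assumes that each read count is replaced by a draw from a Poisson distribution whose mean is the observed count, uses a normal approximation for large counts, and defines recall as the fraction of native genes recovered in the noisy run. The DGEA and MSF runs themselves are not part of the sketch, and all names are hypothetical.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson variate with mean lam.

    Uses Knuth's multiplication method for small means and a rounded
    normal approximation for larger ones.
    """
    if lam <= 0:
        return 0
    if lam < 30:
        threshold, k, prod = math.exp(-lam), 0, rng.random()
        while prod > threshold:
            k += 1
            prod *= rng.random()
        return k
    return max(0, round(rng.gauss(lam, math.sqrt(lam))))


def perturb_counts(counts, rng):
    """Replace every read count by a Poisson draw centered on it."""
    return {gene: poisson_sample(c, rng) for gene, c in counts.items()}


def recall(native_genes, noisy_genes):
    """Fraction of genes from the native analysis recovered with noise."""
    native = set(native_genes)
    if not native:
        raise ValueError("native gene set must not be empty")
    return len(native & set(noisy_genes)) / len(native)


rng = random.Random(42)
print(perturb_counts({"geneA": 12, "geneB": 480, "geneC": 0}, rng))
print(recall({"A", "B", "C", "D"}, {"A", "B", "E"}))  # 0.5
```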

Figure 3. Recall rates of the genes identified by differential gene expression (DEG) analysis (three different cut-offs) and of the genes in the sub-graphs identified by Modulated Sub-graph Finder (MSF) for the three time points of the Ebola Virus (EBOV) infection data, over 100 simulations in which Poisson distributed noise was added to the experimentally deduced read counts per gene.

Comparison to Reactome pathway analysis tool

Gene enrichment analysis was performed using the Reactome Analyze Data tool (Fabregat et al., 2018). Reactome’s over-representation analysis tests whether certain Reactome pathways are enriched in the list of genes submitted. The genes from the MSF identified sub-graphs for each time-point were submitted to this gene enrichment analysis. For comparison, the DEG results from edgeR were filtered using three different adjusted p-value cut-offs of 0.01, 0.05 and 0.1. These three subsets of the DEG list were also used for gene enrichment analysis. The comparison of MSF and the three DEG list subsets is shown in Figure 4.

Figure 4. The Upset plot shows the number of shared pathways between different time-point and cut offs.

All the different toll-like receptor signaling pathways found to be enriched among the genes identified by Modulated Sub-graph Finder (MSF) are marked.

The comparison shows that most of the pathways known from the literature to be dysregulated by Ebola infection are enriched in both enrichment analyses. When the Toll-like receptor signaling pathway interacts with the EBOV glycoprotein (GP), it triggers the activation of cytokines (Olejnik et al., 2017). The Toll-like receptor pathway is therefore expected to be dysregulated in the early stage of infection, yet it was not identified as significantly dysregulated when the p-value cut-off DEG lists were analyzed for enrichment. Ten Toll-like receptor cascades were identified as dysregulated in the gene enrichment analysis of the MSF identified genes, whereas not a single one of these pathways was found to be dysregulated in the DEG cut-off lists. Since MSF considers the complete DEG results, even weak signals at the earliest time-point, such as Toll-like receptor signaling, were detected.

Discussion

Classic pathway analysis tools aim to detect, in lists of significantly deregulated genes, enriched associations with pathway genes categorized by their biological function and their interactions. Thereby, depending on the tool, the internal pathway topology is considered or neglected altogether. The tool presented here, MSF, employs a different approach by aiming to detect sub-graphs in whole gene regulatory networks which are significantly deregulated in a concerted manner. To this end, neighboring genes in the user provided network are tested for common regulation. Exploiting the fact that each gene’s abundance, although not independent from its neighbors, is measured on its own, sensitivity can be increased by the applied p-value meta-analysis, namely Hartung’s method. This potentially makes it possible to call genes that are not significantly modulated according to the DGEA convincingly part of a deregulated gene group. Furthermore, it allows connected sub-graphs to be identified that represent the propagation of a gene regulation perturbation through the input network. A better understanding of this propagation, especially of the critical spots such as sensors, effectors, or intermediate bottlenecks and hubs, facilitates the identification of potential intervention points, e.g., for drug development. Since MSF only uses interaction information in gene regulation networks, but not the grouping of the genes into functional pathways, it is especially suited to discover so called cross-talk between such pathways.

Conclusions

MSF is a fast, robust and easy to use tool to find concertedly modulated sub-graphs in a given network. Its implementation in Java enables its use across many operating systems. It requires as input the results from a differential gene expression analysis in the appropriate file format. So far the raw output from edgeR (Robinson et al., 2010) and DESeq2 (Love et al., 2014) is supported. Furthermore, a gene network has to be provided in the format of a directed adjacency list.

Data availability

The Ebola infection RNA-seq data set analyzed during the current study is available in the GEO repository (GSE84188) (Olejnik et al., 2017). The cell signaling network file used is from Reactome Functional interactions (FIs) Version 2016 (Wu et al., 2016).

Software availability

Source code is available from GitHub: https://github.com/Modulated-Subgraph-Finder/MSF

Archived source code at time of publication: http://doi.org/10.5281/zenodo.1400242 (Farman, 2018).

Software license: MIT license.

Supplementary material

Supplementary material is available from GitHub: https://github.com/Modulated-Subgraph-Finder/MSF

References

Bayerlová M, Jung K, Kramer F, et al.: Comparative study on gene set and pathway topology-based enrichment methods. BMC Bioinformatics. 2015; 16(1): 334.
Cárdenas WB, Loo YM, Gale M Jr, et al.: Ebola virus VP35 protein binds double-stranded RNA and inhibits alpha/beta interferon production induced by RIG-I signaling. J Virol. 2006; 80(11): 5168–5178.
Fabregat A, Jupe S, Matthews L, et al.: The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2018; 46(D1): D649–D655.
Falasca L, Agrati C, Petrosillo N, et al.: Molecular mechanisms of Ebola virus pathogenesis: focus on cell death. Cell Death Differ. 2015; 22(8): 1250–1259.
Farman M: Modulated-Subgraph-Finder/MSF V.1 (Version V.1). Zenodo. 2018. http://www.doi.org/10.5281/zenodo.1400242
García-Campos MA, Espinal-Enríquez J, Hernández-Lemus E: Pathway Analysis: State of the Art. Front Physiol. 2015; 6: 383.
Hartung J: A note on combining dependent tests of significance. Technical Report, SFB 475: Komplexitätsreduktion in Multivariaten Datenstrukturen, Universität Dortmund, 1998.
Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000; 28(1): 27–30.
Khatri P, Sirota M, Butte AJ: Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol. 2012; 8(2): e1002375.
Konde MK, Baker DP, Traore FA, et al.: Interferon β-1a for the treatment of Ebola virus disease: A historically controlled, single-arm proof-of-concept trial. PLoS One. 2017; 12(2): e0169255.
Love MI, Huber W, Anders S: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15(12): 550.
Malone JH, Oliver B: Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol. 2011; 9(1): 34.
Michalak P: Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes. Genomics. 2008; 91(3): 243–248.
Morris J, Jensen LJ, Doncheva NT: stringApp 1.3.0. [Online; accessed 19-February-2018]. 2018.
Olejnik J, Forero A, Deflubé LR, et al.: Ebolaviruses Associated with Differential Pathogenicity Induce Distinct Host Responses in Human Macrophages. J Virol. 2017; 91(11): e00179-17.
Prins KC, Cárdenas WB, Basler CF: Ebola virus protein VP35 impairs the function of interferon regulatory factor-activating kinases IKKepsilon and TBK-1. J Virol. 2009; 83(7): 3069–3077.
Rhein BA, Powers LS, Rogers K, et al.: Interferon-γ Inhibits Ebola Virus Infection. PLoS Pathog. 2015; 11(11): e1005263.
Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010; 26(1): 139–140.
Schnoes AM, Ream DC, Thorman AW, et al.: Biases in the experimental annotations of protein function and their effect on our understanding of protein function space. PLoS Comput Biol. 2013; 9(5): e1003063.
Shannon P, Markiel A, Ozier O, et al.: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 13(11): 2498–2504.
Veljkovic V, Glisic S, Muller CP, et al.: In silico analysis suggests interaction between Ebola virus and the extracellular matrix. Front Microbiol. 2015; 6: 135.
Wu G, Feng X, Stein L: Reactome FIs. [Online; Version 2016]. 2016.

Competing interests

No competing interests were disclosed.

Article Versions (3)

version 3 (revised). Published: 14 Apr 2019, 7:1346. https://doi.org/10.12688/f1000research.16005.3

version 2 (revised). Published: 22 Mar 2019, 7:1346. https://doi.org/10.12688/f1000research.16005.2

version 1. Published: 29 Aug 2018, 7:1346. https://doi.org/10.12688/f1000research.16005.1

© 2018 Farman MR et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite this article

Farman MR, Hofacker IL and Amman F. MSF: Modulated Sub-graph Finder [version 1; peer review: 1 approved, 1 approved with reservations, 1 not approved]. F1000Research 2018, 7:1346 (https://doi.org/10.12688/f1000research.16005.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses

Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.

Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.

Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.

Version 1

Reviewer Report 29 Nov 2018

Haibo Liu, Department of Animal Science, Iowa State University, Ames, IA, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.17481.r40208

In this manuscript, the authors report their newly developed tool, MSF, for interpreting gene lists from differential gene expression analysis. Their tool differs from existing pathway analysis tools: (1) it can identify concertedly modulated sub-graphs from user-provided gene networks, and thus it can account for cross-talk between pathways; (2) it can potentially infer the flow of biological information in response to a perturbation from source to sink. As in gene set enrichment analysis (GSEA), no arbitrary p-value threshold is set to dichotomize whole gene lists before applying MSF analysis; instead, all genes from DGEA are ordered by p-value from the smallest to the largest. An algorithm similar to the widely used network propagation algorithm is used for sub-graph initialization and extension. The authors applied their tool to an RNA-seq dataset from an Ebola virus infection experiment and showed that it outperforms other tools. They concluded that this tool, being fast, robust and easy to use, is a good supplement to existing pathway-based analysis tools. However, the overall writing is very problematic and there are quite a few issues that need to be fixed. See the details listed below.

First of all, the code for the tool is not available via GitHub. I carefully checked the GitHub repository multiple times (https://github.com/Modulated-Subgraph-Finder/MSF); however, I can't find the ModulatedSubPathFinder.jar file, which is the Java implementation of the proposed tool, MSF.
There are too many grammar and writing issues. To mention just a few: in the first paragraph of the Introduction, "mechanism" in Line 6 should be plural, while "stimuli" in Line 14 should be "stimulus", and "maybe" in Line 15 should be "may be". Careful proof-reading is strongly recommended.
The authors have a few misconceptions. For example, they treated "effectors" and "sinks" equally. In my opinion, effectors include sources, intermediate genes and sinks, i.e., all genes responding to perturbations. The authors think of the significance of statistical tests in the form of p-values as a confidence level of observing an authentic expression change. This might not be correct. Besides small p-values, the magnitude of fold changes is also an important metric of authentic expression change. By the way, the fold change of gene expression is always non-negative. The expression "sign of fold change" is not meaningful. Only the log-transformed fold change is signed.
The flow of information and ideas is not fluent in many places. For example, at the end of the first paragraph of the Introduction, the authors mentioned the KEGG and Reactome pathways. Then, at the beginning of the second paragraph, they gave a detailed description of the two pathway databases, which might be unnecessary and disrupts the flow that sets the stage for why their tool is necessary and useful. Some information about how their tool was implemented, given in the last paragraph of the Introduction, should be moved to the Implementation section of Methods. In Paragraph 3 of the “Case Study” section, the DGEA results might be better described immediately after the second sentence of Paragraph 2. Similarly, some information in the Conclusion should be moved to the “Implementation” section.
The title for the section of “Initial modulated sub-graphs” should be “Initializing modulated sub-graphs”. Under this section, “starting with the most significant one” should be “starting with the next most significant one”. “… not yet in a significantly modulated …” should be “…not yet in the significantly modulated …”.
Under the section of “Extending modulated sub-graphs”, it is not clear how the sub-graphs are extended by adding “MORE THAN ONE” gene at a time. If done this way, there are infinitely many possibilities. The criterion to accept or reject added genes is not clear.
Under the section of “Merging modulated sub-graphs”, the authors mentioned that “After detection and extension of the modulated sub-graphs, they are tested if combined subgraphs SCORE better than on their own.” At this point, no merging has been done yet, how are the combined subgraphs tested? What is the SCORE used here? How can a depth-first search traverse from the FIRST sub-graph to the SECOND sub-graph before they are merged (Aren’t the subgraphs not necessarily connected?)?
Under the section of “Finding sources & sinks”, “circular loops” should be just “loop”.
In Paragraph 2 under “Case Study”, some details about the edgeR-based DGEA are missing. How were the directed cell signaling interactions filtered from the Reactome FIs? Based on what?
The directions of edges in the right panel of Figure 2 should be shown, because MSF can generate directed subgraphs.
In the last paragraph of the section of “Modulated sub-graphs at 6 hpi”, “This show cast example…” should be “This show case example…”.
The robustness test seems to show that MSF is not robust enough against added noise. What is the authors’ conclusion and explanation?
The authors compared the results from their tool to those from the Reactome pathway analysis tool and demonstrated better performance of their tool. Can they also compare the results from their tool to those from GSEA tools, which don’t set an arbitrary cutoff beforehand? The latter comparison might be more convincing.
In the Discussion, what are “intermediate bottlenecks”? In the Conclusion, the authors claimed their tool is “fast, robust and easy-to-use”. However, they did not provide evidence to show their tool is fast. The robustness of their tool is not apparent.
Issues with figures: all legend titles are too long. In Figure 1, the texts in the flow chart symbols are not well written. There are inconsistent case issues; “initial” should be “initializing”; the symbol for a condition test (“checked all interaction?”) should be a diamond, not a hexagon. Question marks might be added to condition tests for readability. In the legend to Figure 1, what does “without exhaustively testing all connected subgraphs” mean? Does it mean that the output from MSF analysis might not be comprehensive? The legends to Figures 3 and 4 are poorly written. Is Toll-like receptor signaling one of the 149 shared pathways in Figure 4? It is not clearly described in the legend.
Limitations of their tool: In their implementation, the fold changes and direction of changes are not taken into account. If a resulting subgraph contains both down-regulated genes and up-regulated genes, how should the users interpret it? The authors didn’t test the sensitivity and specificity of their tool in this manuscript.

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

No
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: bioinformatics and computational biology, transcriptomics and systems biology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 22 Mar 2019

Mariam R Farman, Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria

Thank you very much for your suggestions.

Until now, ModulatedSubPathFinder.jar was only available under the release tag on GitHub (https://github.com/MariamFarman/Modulated-SubGraph-Finder/releases). We have since added the full source code to GitHub.

The writing of the manuscript has been carefully checked and improved, especially the examples mentioned by the reviewers.

We agree with the reviewer that we did not use the word “effector” cautiously enough, and we have changed it accordingly. The reviewer is also right to consider the magnitude of the fold change an important metric of the system behaviour. We consciously ignore it, since it is not straightforward to include it in our model, and we consider that the p-value at least partly reflects the magnitude of the fold change, since it expresses the probability that the observed fold change differs from 1.

The suggestions have been taken into account and the text has been modified accordingly. The details about KEGG and Reactome were considered necessary, since information from both databases is used later to showcase the results of MSF.

Text modified.

Extension by more than one gene is used when a sub-graph can no longer be extended by a single gene because the direct neighbouring genes have high p-values. To avoid producing fragmented sub-graphs, we try to append branches of up to three genes simultaneously instead of single genes. Thereby, the accumulated signal of the whole branch can compensate for single unfavourable genes. Since we limit the procedure to branches of length three, the number of possibilities to be tested is limited. Again, a branch is only accepted if the overall score of the sub-module is better after extension. Details about the criterion for accepting or rejecting an extension were added to the text.

The wording of the paragraph describing the merging of sub-graphs has been changed for better understanding. The criterion for merging is that the combined p-value of the sub-graph obtained by merging two sub-graphs (including connector genes) is smaller than the individual p-values of the two sub-graphs. The depth-first search is used to find connectors between the two sub-graphs in order to merge them.
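The acceptance criteria described in the two responses above (branch extension and merging) can be illustrated with a minimal sketch. It is not the MSF implementation: Fisher's method is used here only as a stand-in for combining p-values, since the combination rule used by MSF is not restated in this response, and the function names `fisher_combined_p`, `accept_extension` and `accept_merge` as well as all p-values are hypothetical.

```python
import math
from typing import Sequence


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine p-values with Fisher's method (an assumed stand-in, not MSF's rule)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    # X = -2 * sum(ln p) is chi-square distributed with 2k degrees of freedom.
    half_stat = -sum(math.log(max(p, 1e-300)) for p in p_values)  # X / 2
    k = len(p_values)
    # Closed-form survival function for even degrees of freedom:
    # P(chi2_{2k} > X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def accept_extension(subgraph_ps: Sequence[float],
                     branch_ps: Sequence[float],
                     max_branch_len: int = 3) -> bool:
    """Accept a branch of 1..max_branch_len genes only if the combined score improves."""
    if not 1 <= len(branch_ps) <= max_branch_len:
        return False
    extended = list(subgraph_ps) + list(branch_ps)
    return fisher_combined_p(extended) < fisher_combined_p(subgraph_ps)


def accept_merge(a_ps: Sequence[float],
                 b_ps: Sequence[float],
                 connector_ps: Sequence[float]) -> bool:
    """Merge two sub-graphs only if the merged score beats both individual scores."""
    merged = list(a_ps) + list(b_ps) + list(connector_ps)
    best_individual = min(fisher_combined_p(a_ps), fisher_combined_p(b_ps))
    return fisher_combined_p(merged) < best_individual


# Hypothetical p-values for a small sub-graph and its neighbours.
subgraph = [0.01, 0.02]
print(accept_extension(subgraph, [0.6]))          # single unfavourable neighbour
print(accept_extension(subgraph, [0.6, 0.001]))   # branch reaching a strong gene
print(accept_merge(subgraph, [0.03, 0.005], [0.4]))
```

With these made-up values, the single neighbour with p = 0.6 is rejected, whereas the two-gene branch passing through it is accepted, which is the compensation effect described in the response. Any other rule for combining p-values could be substituted for `fisher_combined_p` without changing the two acceptance functions.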

Text modified.

Details about edgeR were added to the text. The interaction file downloaded from Reactome is filtered for direct interactions only. A tutorial on how the file was filtered is also provided on the MSF GitHub page.

The directions have been added to the figure and are also provided as an MSF output that can be imported into Cytoscape for easy visualization by the user.

Text modified.

While developing the algorithm, we believed that MSF would not only give insights into the network modulation but could also be used to increase the robustness of DGEA of single genes by using additional information from their neighbours. Our analysis disproved this assumption, which is what we wanted to communicate with the robustness analysis. Given the comments by several reviewers, we adapted the corresponding paragraph, omitting the comparison to the DGEA and showing only the overall robustness of our tool, which is still reasonable after applying strong noise, with a median recall rate of 71 to 84%.

A comparison of the MSF analysis with a GSEA analysis has been added. Although GSEA was able to find pathways known from the literature to be dysregulated during Ebola infection, it could not show the cross-talk between the different pathways as MSF does.

By an intermediate bottleneck we mean a gene that connects a number of sources with a number of sinks and is thereby a potential point of intervention if one would like to uncouple the input stimuli from the downstream effects. As stated above, the reviewer is right about the robustness, and the text has been modified accordingly.
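The notion of sources, sinks and intermediate bottlenecks in a directed sub-graph can be sketched as follows. This is an illustration under stated assumptions, not the MSF code: a source is taken to be a node without incoming edges, a sink a node without outgoing edges, and a bottleneck an intermediate node that is reachable from at least `min_sources` sources and reaches at least `min_sinks` sinks. The thresholds, the function name `sources_sinks_bottlenecks` and the gene labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def _reachable(start: str, adjacency: Dict[str, Set[str]]) -> Set[str]:
    """Return all nodes reachable from start via an iterative depth-first search."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def sources_sinks_bottlenecks(edges: Iterable[Edge],
                              min_sources: int = 2,
                              min_sinks: int = 2
                              ) -> Tuple[List[str], List[str], List[str]]:
    """Classify the nodes of a directed sub-graph given as (from, to) edges."""
    successors: Dict[str, Set[str]] = defaultdict(set)
    predecessors: Dict[str, Set[str]] = defaultdict(set)
    nodes: Set[str] = set()
    for u, v in edges:
        successors[u].add(v)
        predecessors[v].add(u)
        nodes.update((u, v))

    sources = {n for n in nodes if not predecessors.get(n)}
    sinks = {n for n in nodes if not successors.get(n)}

    bottlenecks = set()
    for n in nodes - sources - sinks:
        upstream_sources = _reachable(n, predecessors) & sources
        downstream_sinks = _reachable(n, successors) & sinks
        if len(upstream_sources) >= min_sources and len(downstream_sinks) >= min_sinks:
            bottlenecks.add(n)
    return sorted(sources), sorted(sinks), sorted(bottlenecks)


# Hypothetical sub-graph: two receptors converge on K, which feeds two targets.
example_edges = [("R1", "K"), ("R2", "K"), ("K", "T1"), ("K", "T2"),
                 ("R2", "X"), ("X", "T2")]
print(sources_sinks_bottlenecks(example_edges))
```

In this toy graph, `K` links both sources to both sinks and is reported as a bottleneck, while `X` lies on a single source-to-sink route and is not.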

The figure legends have been rewritten for better understanding. The chart symbols in Figure 1 have been edited and the symbol shapes modified as suggested. With the phrase “Without exhaustively testing all connected sub-graphs” we intended to say that not all possible connectors for merging the graphs are tested; instead, sub-graphs are connected with the first connector that passes the threshold. In Figure 5, Toll-like receptor signaling is in fact among the 164 pathways shared between the different time-points of MSF.

We agree and are aware that the magnitude and direction of the fold changes are important. We have already commented on the magnitude above. Regarding the direction, we would like to mention that its interpretation is not straightforward without further information on the type of interaction between the genes, which is not always available. To illustrate, the up-regulation of an inhibitor and the down-regulation of an activator can have the same effect on the network.
Competing Interests: No competing interests were disclosed.
Report a concern
Respond or Comment

COMMENTS ON THIS REPORT

Author Response 22 Mar 2019

Mariam R Farman, Institute for Theoretical Chemistry,Theoretical Biochemistry Group,, University of Vienna, Vienna, 1090, Austria

22 Mar 2019

Author Response
Thank you very much for your suggestions.
1. So far the ModulatedSubPathFinder.jar was only available under the release tag on git hub (https://github.com/MariamFarman/Modulated-SubGraph-Finder/releases). Meanwhile we added the full source
... Continue reading
Thank you very much for your suggestions.

So far the ModulatedSubPathFinder.jar was only available under the release tag on git hub (https://github.com/MariamFarman/Modulated-SubGraph-Finder/releases). Meanwhile we added the full source code to git hub.

The writing of the manuscript has been carefully checked and improved, especially the examples mentioned by the reviewers.

We agree with the reviewer that we used the word “effector” not very cautiously. Therefore, we changed it accordingly. The reviewer is again right to consider the magnitude of the fold change as an important metric of the system behaviour. We consciously ignore it since it is not straight forward to include it in our model, and we consider that the p-value at least partly reflect the magnitude of fold change since it expresses the probability that the observed fold change differs from 1.

The suggestions are taken into account and the text was modified accordingly. The details about KEGG and Reactome were considered necessary since later the information from both the databases would be used to showcase the results of MSF.

Text modified.

The use of extending sub-graph here is if the sub-graph could not be extending any more by one gene because the direct neighbouring genes have high p-values. To avoid producing fragmented sub-graphs, we try to, instead of single genes, append branches of up to three genes simultaneously. Thereby the accumulated signal of the whole branch can compensate for single unfavourable genes. Since we limit the procedure to branches of length three, the number of possibilities to be tested is limited. Again, a branch is only accepted if the overall score of the sub-module is better after extension. Details about criterion of rejection and acceptance of extension was added to the text.

The wording has been changed for better understanding of the paragraph describing merging of sub-graphs. The score to pass merging is that the combined p-value of the sub-graph after merging two sub-graphs (including connector genes) is smaller than the individual p-values of the two sub-graphs. The depth first search traverse is used to find connectors between the two sub-graphs to merge them.

Text modified.

Details about edgeR were added to the text. The interaction file downloaded from Reactome is filtered for only direct interactions . The tutorial on how the file was filtered is also provided on MSF git hub page.

The directions have been added to the figure and as a output for MSF to import in Cytoscape for easy visualization for the user.

Text modified.

While developing the algorithm we believed MSF would not only give insights into the network modulation but could also be used to increase robustness of DGEA of single genes by using additional information from their neighbours. Our analysis disproved this assumption, which we wanted to communicate with this robustness analysis. Given the comments by several reviewers we adapted the corresponding paragraph, omitting the comparison to the DGEA but showing only the overall robustness of our tool which is, after applying strong noise, still reasonable with a median recall rate of 71 to 84%.

MSF analysis comparison with GSEA analysis added. Although GSEA was able to find pathways known from literature to be dis-regulated during Ebola infection, it could not show the cross-talk between the different pathways like MSF does.

By intermediate bottlenecks we actually meant a gene that is actually connecting a number of sources with a number of sinks and thereby a potential point of intervention if one would like to uncouple the input stimuli from the downstream effects. As stated above the reviewer is right about the robustness, the text has been modified accordingly.

The figure legends have been rewritten for better understanding. Figure 1 chart symbols edited, symbols shapes modified as suggested. With the phrase “Without exhaustively testing all connected sub-graphs” we intended to say that not all possible connectors to merge the graphs are tested but sub-graphs are connected with the first connector that passes the threshold. In figure 5, Toll-like receptor signaling is actually in the 164 shared pathways between different time-points of MSF.

We agree and are aware that magnitude and direction of the fold changes are important. On the magnitude we commented already further above. For the direction, we would like to mention that its interpretation is not straight forward without further information of the type of interaction between the genes, which is not always available. To illustrate, the up-regulation of an inhibitor and the down regulation of an activator can have the same effect on the network.
Thank you very much for your suggestions.

So far the ModulatedSubPathFinder.jar was only available under the release tag on git hub (https://github.com/MariamFarman/Modulated-SubGraph-Finder/releases). Meanwhile we added the full source code to git hub.

The writing of the manuscript has been carefully checked and improved, especially the examples mentioned by the reviewers.

We agree with the reviewer that we used the word “effector” not very cautiously. Therefore, we changed it accordingly. The reviewer is again right to consider the magnitude of the fold change as an important metric of the system behaviour. We consciously ignore it since it is not straight forward to include it in our model, and we consider that the p-value at least partly reflect the magnitude of fold change since it expresses the probability that the observed fold change differs from 1.

The suggestions are taken into account and the text was modified accordingly. The details about KEGG and Reactome were considered necessary since later the information from both the databases would be used to showcase the results of MSF.

Text modified.

The use of extending sub-graph here is if the sub-graph could not be extending any more by one gene because the direct neighbouring genes have high p-values. To avoid producing fragmented sub-graphs, we try to, instead of single genes, append branches of up to three genes simultaneously. Thereby the accumulated signal of the whole branch can compensate for single unfavourable genes. Since we limit the procedure to branches of length three, the number of possibilities to be tested is limited. Again, a branch is only accepted if the overall score of the sub-module is better after extension. Details about criterion of rejection and acceptance of extension was added to the text.

The wording has been changed for better understanding of the paragraph describing merging of sub-graphs. The score to pass merging is that the combined p-value of the sub-graph after merging two sub-graphs (including connector genes) is smaller than the individual p-values of the two sub-graphs. The depth first search traverse is used to find connectors between the two sub-graphs to merge them.

Text modified.

Details about edgeR were added to the text. The interaction file downloaded from Reactome is filtered for only direct interactions . The tutorial on how the file was filtered is also provided on MSF git hub page.

The directions have been added to the figure and as a output for MSF to import in Cytoscape for easy visualization for the user.

Text modified.

While developing the algorithm, we believed MSF would not only give insights into the network modulation but could also be used to increase the robustness of DGEA of single genes by using additional information from their neighbours. Our analysis disproved this assumption, which is what we wanted to communicate with this robustness analysis. Given the comments by several reviewers, we adapted the corresponding paragraph, omitting the comparison to the DGEA and showing only the overall robustness of our tool, which, after applying strong noise, is still reasonable with a median recall rate of 71 to 84%.

A comparison of the MSF analysis with GSEA has been added. Although GSEA was able to find pathways known from the literature to be dysregulated during Ebola infection, it could not show the cross-talk between the different pathways as MSF does.

By intermediate bottlenecks we meant a gene that connects a number of sources with a number of sinks and is thereby a potential point of intervention if one would like to uncouple the input stimuli from the downstream effects. As stated above, the reviewer is right about the robustness; the text has been modified accordingly.

The figure legends have been rewritten for better understanding. The chart symbols in Figure 1 were edited and the symbol shapes modified as suggested. With the phrase “Without exhaustively testing all connected sub-graphs” we intended to say that not all possible connectors for merging the graphs are tested; instead, sub-graphs are connected with the first connector that passes the threshold. In Figure 5, Toll-like receptor signaling is in fact among the 164 pathways shared between the different time-points of MSF.

We agree and are aware that the magnitude and direction of the fold changes are important. We already commented on the magnitude above. For the direction, we would like to mention that its interpretation is not straightforward without further information on the type of interaction between the genes, which is not always available. To illustrate, the up-regulation of an inhibitor and the down-regulation of an activator can have the same effect on the network.
Competing Interests: No competing interests were disclosed.

Reviewer Report 12 Nov 2018

Stefanie Widder, Medical University of Vienna, Department of Medicine 1, Research Lab of Infection Biology, Waehringer Guertel 18-20, 1090 Vienna, Austria

Approved

https://doi.org/10.5256/f1000research.17481.r40009

The authors present a novel method for finding groups of genes that concomitantly change their expression profile upon signals and conditional changes. As opposed to standing tools in the field that rely on predefined functional classification, this approach is based on topology of the global interaction network and enables an unbiased identification of larger functional blocks highlighting pathway cross-talk. It furthermore includes a topological method for identifying source and sink pathways that provides a prediction of process causality. The latter is particularly useful for hypothesis generation and add-on experimental validation far beyond the field of biomedicine.

The paper is structured into the presentation of the algorithm, a biomedical use-case with validating background information and a slim benchmarking against differential expression analysis with different cut-offs.

While the method clearly fills an existing gap in high-throughput gene expression analysis and is very elegantly reasoned, I would like to raise a number of comments with regards to writing and benchmarking.

Introduction:

The main line of argumentation gets lost sometimes, in particular in paragraph 1. The narrative would furthermore benefit from actual examples instead of repeating a general statement of ‘changed conditions’. Sometimes, the same argument is repeated in differently phrased sentences.
Paragraph 2: Suppress ‘be subjective to some degree’.
Paragraph 4: Better highlight and delimit the novelty of the presented approach.
Overall, shorter sentences benefit the reader.

Methods:

The context of Hartung’s method is described nicely, yet the actual way how the individual p-values are combined to result in a single measure is omitted. Please include as this information provides instructive benefit to the reader.

Results:

Case Study:

Paragraph 1: Readability would profit from focusing the background information, e.g. 'Ebola infection data was (->were) selected (...)' - rephrase into a half-sentence.
Wording: Until now->currently.

Robustness:

The recall comparison (MSF versus differential expression analysis) weakens the proposed method, because no increase in recall can be achieved for MSF. It would be informative to also see precision statistics in comparison. Generally, one would expect more robust statistics on larger subgraphs (MSF vs. DEG groups).
Also, I am not entirely convinced by adding noise to real and thus already noisy data. A small, artificial mock example with a detailed known outcome (+/- noise) might be more suitable and supportive.

Figure 4:

This figure is very difficult to follow. I suggest to i) enhance the label sizes and texts, with particular emphasis on delimiting MSF and DEG, ii) improve the figure caption text including fundamental information of what is shown.

Github material:

Importantly, provide an example directory that contains complete example cases (with *all* files), including the presented Ebola case with its options and outcome, to enable a swift recapitulation of the tool for the new user.

Is the rationale for developing the new software tool clearly explained?
Yes

Is the description of the software tool technically sound?
Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Yes

Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Author Response 22 Mar 2019

Mariam R Farman, Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria

Thank you for your helpful comments about our manuscript.

Paragraph 1 has been rephrased; “changed conditions” has been replaced with the example of treated versus healthy.

Paragraph 2 has been modified accordingly.

The novelty of MSF is described in paragraph 5. First, MSF does not use predefined sets of genes to find the modulated sub-graphs but builds the sub-graphs using the information from DGEA and the interactions from the whole cell signaling network. Second, MSF considers the signal of the neighbouring genes to determine the significance of the modulated sub-graph.

We revisited the overall text and tried to put more emphasis on readability.

More details about Hartung’s method are provided in the appropriate section.

The case study has been rephrased and the wording modified.

The reviewer is correct that the recall comparison weakens the proposed method. We have been very strict in testing the robustness of the method by adding extra noise to already noisy data. We expected the method to be more robust than DGEA, but unfortunately that was not the case. For cut-off based DGEA, it does not matter for robustness if the p-values of a few genes go up or down as long as they stay below the chosen cut-off. In contrast, for MSF the robustness analysis showed that such changes make a difference for the sub-graphs identified. The reviewer is also correct to expect more robustness for larger sub-graphs, which is seen at the later time-point, 48 hpi, which has larger sub-graphs than the other two time-points. Looking at the robustness of MSF alone, it varies between 71% (6 hpi) and 84% (48 hpi); we agree that these numbers are not outstanding, but we consider them reasonable. Our assumption was that a post-analysis with MSF can not only give insights into the pathway modulation but also improve the DGEA for single genes by using their neighbours’ information. Since this assumption did not hold, we removed that aspect to improve readability.

We agree that the data is already noisy even before adding noise. Again, we have been very strict in testing the robustness of the method. As mentioned above, the larger the sub-graphs, the more robust they are; we are not sure that a small mock example would add further insight.

The set-up of Figure 5 (previously Figure 4) has been changed to one cut-off only, i.e. 0.05. The figure label sizes and texts have been enlarged and the figure caption modified.

Unfortunately, we cannot provide the Ebola count files since they belong to Olejnik et al. A tutorial is provided to reproduce the results from the Ebola infection data, with details on how to obtain the raw data from the third-party source (GSE84188). MSF output files are also provided in the supplementary material.
Competing Interests: No competing interests were disclosed.

Reviewer Report 06 Nov 2018

Guanming Wu, Department of Medical Informatics and Clinical Epidemiology (DMICE), Oregon Health and Science University (OHSU), Portland, OR, USA

Not Approved

https://doi.org/10.5256/f1000research.17481.r39844

In this manuscript, Farman et al. describe an approach to search for network modules based on p-values collected from differential gene expression analysis in order to address pathway crosstalk, which cannot be addressed by conventional pathway enrichment analysis approaches. During the past decade, many network module-based approaches have been developed to understand the functions of genes collected from differential gene expression analysis and other omics approaches (for a review, see Mitra et al, 2013¹). Though the network approach described here has some innovative ideas (e.g. searching for sources and sinks in subgraphs), the authors introduce their approach in the context of pathway analysis without mentioning these previously published similar approaches, let alone comparing their approach to others. It is also worth mentioning that the approach described in this manuscript is very similar to jActiveModules (Ideker et al, 2002²), the first of this kind, which is widely used for network-based data analysis.

The manuscript uses the Ebola virus (EBOV) time course gene expression data set to showcase MSF, trying to demonstrate the validity and usefulness of the approach. Indeed, the authors found that IFNA1 and IFNB1 are source genes in subgraphs across multiple time points, as reported in the literature. However, the authors have not discussed other genes in the subgraphs found, though the whole lists of them are provided on their GitHub site. IFNA1 and IFNB1 are only two among other source genes. The authors should develop a statistical approach to evaluate p-values and FDRs for subgraphs and individual source genes, thereby providing users a way to choose the most significant genes for unknown phenotypes or biological processes. The current way of showcasing the usefulness of the approach is not stringent and may not be useful if too many genes are collected in the subgraphs.

The authors compared MSF results with raw gene lists based on p-value cutoffs (Table 1). However, Table 1 is not a fair comparison. Only the largest subgraphs are listed for edgeR + MSF, while all subgraphs are listed for raw gene lists (e.g. for 6 hpi, 3 for edgeR + MSF, 39 for edgeR (p-value <= 0.1)). For a fair comparison, all subgraphs should be listed for edgeR + MSF too.

The section “Comparison to Reactome pathway analysis tool” and Figure 4 compare results produced by Reactome pathway enrichment analysis for different gene lists. The whole section is confusing. First, the section title is misleading: the comparison is between results generated with the Reactome pathway analysis tool, not to that tool. Second, I cannot see much value in using different adjusted p-value cutoffs for the gene lists; one (e.g. 0.05) should probably be enough, which would reduce the clutter in Figure 4. Third, the authors want to point out that the MSF gene lists are enriched for the Toll-like receptor signaling pathway whereas the raw gene lists are not. However, this comparison is not clearly indicated in Figure 4; the set comparisons include too many gene lists. Fourth, the authors should state which p-value or FDR cutoff was used to choose pathways from the Reactome analysis tool. It is not correct to choose all pathways listed in that tool for comparison. Finally, where do the “Ten” Toll-like receptor cascade pathways come from?

Searching for sources/targets in individual MSF subgraphs based on the directions in the Reactome FI network and then drawing the schematic diagrams as illustrated in Figure 2 is interesting. It would be better to show directions in the Cytoscape network view (the right-side networks in Figure 2). The schematic diagrams in Figure 2 are interesting, but may dramatically simplify things occurring inside cells. The Reactome FI network provides functional relationships among genes or proteins, which are not necessarily gene regulatory relationships. The authors should point this out in the manuscript.

Finally, the writing of this manuscript is in question. The authors should read their manuscript much more carefully. There are far too many typos, punctuation errors, and grammatical errors. For example, “the Filoviridea family; filamentous” should be “the Filoviridea family: filamentous”; “pathogenesis of Ebola. Thereby facilitating” should be “pathogenesis of Ebola, thereby, facilitating”; “among the most significant gene in the DGEA” should be “among the most significant genes in the DGEA”, and many others.

Is the rationale for developing the new software tool clearly explained?
Partly

Is the description of the software tool technically sound?
Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Partly

References

1. Mitra K, Carvunis AR, Ramesh SK, Ideker T: Integrative approaches for finding modular structure in biological networks. Nat Rev Genet. 2013; 14 (10): 719-32.
2. Ideker T, Ozier O, Schwikowski B, Siegel AF: Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 2002; 18 Suppl 1: S233-40.

Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Author Response 22 Mar 2019

Mariam R Farman, Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria

First of all, we would like to thank you for your valuable review and your critical comments.

Thank you very much for making us aware of the paper by Mitra et al. and the work by Ideker et al. Frankly, these papers had escaped our attention. The review and the tool jActiveModules are now cited in the paper. Using the same expression data and the same interaction file, we compared the results from MSF and jActiveModules. Although the methods for computing the overall score of the modules are similar, there are differences in the sub-graphs identified. These differences arise because MSF starts building the sub-graphs from one gene, incorporating and combining the p-value of the next gene, with the check that the combined p-value of the new sub-graph must be better than that of the original. jActiveModules, on the other hand, first transforms all genes’ p-values to z-scores and tries to find connected sets of genes with unexpectedly high levels of differential expression, in this case high z-scores. The overall score of the sub-network is then calculated by combining the z-scores of the genes, and simulated annealing is used to find the highest-scoring modules. Given the observed differences and our focus on the flow of perturbation, namely sinks and sources, we think that our contribution is not redundant to the previous work. To further strengthen our perspective, we also worked on the software itself, which now also scores the sources according to their reliability and their potential impact on the modified sub-module. Since we believe that this critically improved the usability of the software, we would particularly like to thank the reviewers for pointing us to this.
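
For orientation, the z-score aggregation mentioned above is usually written as follows. This is a sketch based on the published description of jActiveModules rather than on the MSF manuscript; the symbols $z_i$, $z_A$, $s_A$, $k$, $\mu_k$ and $\sigma_k$ are notation introduced here for illustration. For a sub-network $A$ with $k$ genes and per-gene p-values $p_i$:

```latex
z_i = \Phi^{-1}(1 - p_i), \qquad
z_A = \frac{1}{\sqrt{k}} \sum_{i \in A} z_i, \qquad
s_A = \frac{z_A - \mu_k}{\sigma_k}
```

Here $\Phi^{-1}$ is the inverse standard normal distribution function, and $\mu_k$ and $\sigma_k$ are the mean and standard deviation of $z_A$ over randomly sampled gene sets of size $k$, so that $s_A$ is comparable across sub-networks of different sizes.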

Again, we thank the reviewer for the valuable input on evaluating the source genes. We acted on this suggestion by amending the software. We have now incorporated an impact score for each source gene, which expresses the percentage of genes in the sub-module that are downstream of the particular source. This should be helpful for prioritizing the different sources of one sub-module. MSF also performs a t-test for each source gene, testing whether the p-values of the downstream genes differ from those of the upstream genes. This helps to see whether the identified source indeed marks the border between two different regulation regimes.
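
The impact score described above can be sketched as follows. This is a minimal illustration under assumptions that are not confirmed by the source: interactions are given as a directed adjacency dictionary, a source is a gene without incoming edges inside the sub-module, the source itself is not counted as downstream, and the denominator is the total number of genes in the sub-module. The function names `downstream` and `impact_scores` are hypothetical.

```python
def downstream(edges, source, members):
    """Genes of the sub-module reachable from `source` along directed edges."""
    seen, stack = set(), [source]
    while stack:
        gene = stack.pop()
        for target in edges.get(gene, ()):
            if target in members and target not in seen and target != source:
                seen.add(target)
                stack.append(target)
    return seen


def impact_scores(edges, members):
    """Percentage of sub-module genes downstream of each source gene."""
    # A source is assumed to be a gene with no incoming edge inside the sub-module.
    has_incoming = {t for g in members for t in edges.get(g, ()) if t in members}
    sources = sorted(g for g in members if g not in has_incoming)
    return {s: 100.0 * len(downstream(edges, s, members)) / len(members)
            for s in sources}
```

With edges S1→A, A→B, S2→B and B→C, the source S1 reaches three of the five genes and S2 reaches two, so S1 would be ranked higher.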

Table 1 has been modified. It now shows the number of MSF-identified sub-graphs together with the number of genes in them. When we apply different cut-offs to the p-values of genes in the MSF-identified sub-graphs, it shows how they break from larger, interpretable sub-graphs into smaller, less interpretable sub-graphs, some consisting of single genes.

The section “Comparison to Reactome pathway analysis tool” has been modified. The set-up of Figure 5 (previously Figure 4) has been changed to one cut-off only, i.e. 0.05. Ten Toll-like receptor cascades were found to be enriched among the genes in the MSF-identified sub-graphs but did not appear for the cut-off gene list. Since MSF uses no cut-off, the sub-graphs identified contained genes from Toll-like receptor cascades even when their signal was weak. The caption of Figure 5 has been modified for better understanding. A cut-off of 0.05 was used to choose pathways from the Reactome pathway analysis for both MSF and the cut-off gene list. The nine Toll-like receptor cascades are named in the manuscript, the tenth being Toll-like receptor itself.

Directions have been added to the MSF output, which can easily be imported into Cytoscape. In addition, the source impact score and the sign of the log-fold change for each individual gene can also be imported into Cytoscape. The fact that the Reactome FI network provides functional relationships is now mentioned. The schematic diagram, meant to give our take on the interpretation of the raw MSF output, was removed since we agree with the criticism of oversimplification.

The writing of the manuscript has been carefully checked and improved.
Competing Interests: No competing interests were disclosed.

COMMENTS ON THIS REPORT

Author Response 22 Mar 2019

Mariam R Farman, Institute for Theoretical Chemistry,Theoretical Biochemistry Group,, University of Vienna, Vienna, 1090, Austria

22 Mar 2019

Author Response
First of all, we would like to say thank you for your valuable review and your critical comments.
1. Thank you very much for making us aware of the
... Continue reading
First of all, we would like to say thank you for your valuable review and your critical comments.

Thank you very much for making us aware of the paper by Mitra et al and work by Ideker et al. Frankly, these papers skipped our attention. Meanwhile, the review and the tool JactiveModules has been cited in the paper. Using the same expression data and same interaction file we compared the results from MSF and JactiveModules. Although the methods to find the overall score for the modules is similar, there are differences in the sub-graphs identified. The differences seen in the modules identified by the two methods are because MSF starts building the sub-graphs from one gene, incorporating and combining the p-value of the next gene, with the check that the combined p-value of new sub-graph should be better than the original. On the other hand jActiveModules first transforms all the gene’s p-values to z-scores and tries to find connected sets of genes with unexpectedly high levels of differential expression, in this case high z-scores. And then the overall score of the sub-network is calculated by combining the z-scores of the genes. Then using simulated annealing jActiveModules tries to find the highest scoring modules. Given the observed differences and our focus on the flow of perturbation, namely sinks and sources, we think that our contribution is not redundant to the previous work. To strengthen even further our perspective we also worked on the software itself, which now also scores the sources according to their reliability and the potential impact onto the modified sub-module. Since we believe that this improved the usability of the software critically, we would like to thank the reviewers particularly for point us to this.

Again, we thank the reviewer for the valuable input on evaluating the source genes. We acted on this suggestion by amending the software. We have now incorporated an impact score for each source gene, which expresses the percentage of genes in the sub-module that are downstream of that source. This should help to prioritize the different sources of one sub-module. MSF also performs a t-test for each source gene, testing whether the p-values of the downstream genes differ from those of the upstream genes. This helps to see whether the identified source indeed marks the border between two different regulation regimes.
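As an illustration of the impact score described above, the following sketch computes the percentage of sub-module genes reachable from a source along directed edges. The function name, the edge-list representation and the choice to exclude the source itself from the denominator are assumptions made for this example, not details taken from the MSF implementation.

```python
from collections import deque


def impact_score(source, edges, module_genes):
    """Percentage of the other genes in the sub-module that are downstream
    of `source`, following directed (upstream, downstream) edges."""
    module = set(module_genes)
    successors = {}
    for upstream, downstream in edges:
        # Only edges that lie entirely inside the sub-module are followed.
        if upstream in module and downstream in module:
            successors.setdefault(upstream, []).append(downstream)

    # Breadth-first search from the source.
    seen = {source}
    queue = deque([source])
    while queue:
        gene = queue.popleft()
        for nxt in successors.get(gene, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    others = module - {source}
    reachable = seen - {source}
    return 100.0 * len(reachable) / len(others) if others else 0.0


# Hypothetical sub-module with two sources, S1 and S2.
edges = [("S1", "A"), ("A", "B"), ("S2", "B"), ("B", "C")]
module_genes = ["S1", "S2", "A", "B", "C"]
print(impact_score("S1", edges, module_genes))  # S1 reaches A, B and C
print(impact_score("S2", edges, module_genes))  # S2 reaches B and C
```

Under these assumptions, S1 would be ranked above S2 because a larger share of the sub-module lies downstream of it.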

Table 1 has been modified. It shows the number of MSF-identified sub-graphs together with the number of genes they contain. Applying different cut-offs to the p-values of the genes in the MSF-identified sub-graphs shows how they break from larger, interpretable sub-graphs into smaller, less interpretable ones, some consisting of single genes.

The section “Comparison to Reactome pathway analysis tool” has been modified. The setup of Figure 5 (previously Figure 4) has been changed to a single cut-off, i.e. 0.05. Ten Toll-like receptor cascades that did not appear for the cut-off gene list were enriched among the genes in the MSF-identified sub-graphs. Since MSF uses no cut-off, the identified sub-graphs contained genes from Toll-like receptor cascades even when their signal was weak. The Figure 5 caption was modified for better understanding. A cut-off of 0.05 was used to choose pathways from the Reactome pathway analysis for both MSF and the cut-off gene list. Nine of the Toll-like receptor cascades are mentioned in the manuscript, the tenth being Toll-like receptor itself.

Edge directions have been added to the MSF output, which can be easily imported into Cytoscape. In addition, the source impact score and the log-fold sign of each individual gene can also be imported into Cytoscape. The functional relationships provided by Reactome FI are now mentioned. The schematic diagram, meant to give our take on the interpretation of the raw MSF output, was removed since we agree with the oversimplification criticism.

The writing of the manuscript has been carefully checked and improved.
Competing Interests: No competing interests were disclosed.

Version 3

VERSION 3 PUBLISHED 29 Aug 2018

Open Peer Review

Reviewer Status

Reviewer Reports

	Invited Reviewers
	1	2	3
Version 3 (revision) 14 Apr 19	read	read	read
Version 2 (revision) 22 Mar 19	read		read
Version 1 29 Aug 18	read	read	read

Guanming Wu, Oregon Health and Science University (OHSU), Portland, USA
Stefanie Widder, Research Lab of Infection Biology, Waehringer Guertel 18-20, 1090 Vienna, Austria
Haibo Liu, Iowa State University, Ames, USA


Reviewer Report

29 Apr 2019 | for Version 3

Guanming Wu, Department of Medical Informatics and Clinical Epidemiology (DMICE), Oregon Health and Science University (OHSU), Portland, OR, USA

Approved

Almost all my comments have been addressed satisfactorily.

Competing Interests

No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

15 Apr 2019 | for Version 3

Stefanie Widder, Medical University of Vienna, Department of Medicine 1, Research Lab of Infection Biology, Waehringer Guertel 18-20, 1090 Vienna, Austria

Approved

The authors have addressed all comments sufficiently, I therefore recommend the manuscript for indexing.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

networks, microbiome, cell types

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

15 Apr 2019 | for Version 3

Haibo Liu, Department of Animal Science, Iowa State University, Ames, IA, USA

Approved

Almost all my questions/comments were addressed to a satisfactory degree. Now the manuscript is in good shape.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

bioinformatics and computational biology, transcriptomics and systems biology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

02 Apr 2019 | for Version 2

Guanming Wu, Department of Medical Informatics and Clinical Epidemiology (DMICE), Oregon Health and Science University (OHSU), Portland, OR, USA

Approved With Reservations

This revision has been improved a lot regarding the MSF software and the analysis and discussion of results. This reviewer appreciates the efforts the authors have made to improve the manuscript and the software tool. However, one of the previous major concerns about poor writing has not been addressed carefully. Many typos, grammar errors (especially related to correct use of singular and plural forms of nouns and followed verbs), and other types of errors still exist. There are many examples, here are just some:

In Abstract, “have proved tobe..” should be “have proved to be…”
In Abstract, “changesdue to” should be “changes due to”
In the first paragraph of Introduction, “during an infection providing…” should be “during an infection, providing…”
“Peerreviewed” should be “Peer-reviewed”
“…and combining it to functional pathway annotations” should be “…and combine it …”
“regulating each others expression” should be “regulating each other’s expression”
“to each a impact score and a measure of its reliability is assigned” should be “to each an impact score and a measure of its reliability are assigned”
“git hub” should be “GitHub”
In the equation for t(p), lambda in the first term in the denominator should have a subscript i.
“default 2 gene” should be “default 2 genes”
“the combined p-value of 3 the merged sub-graph”: should 3 be deleted?
“containing the source weightage and the log-fold chances of all considered genes.” Should be “containing the source weights and the log-fold changes of all considered genes.”
“…use MSF has been provided on git hub” should be “…use MSF have been provided on GitHub”
“the detection of IFN-α/β as point of action for the virus, could be”: comma should not be used
Table 1: “Number of connected sub-graph” should be “Number of connected sub-graphs”
“MSF was compared to jActiveModules8 since they use similar…” should be “MSF was compared to jActiveModules8 since it uses similar…”
“jActive-Modules” should be “jActiveModules”.
In the “Reactome pathway analysis” section: Please make sure “lists” are used in many places (not list).
“The here presented tool, MSF, employs a different approach…”: Delete “The” please

Other comments:

In the 'Abstract', authors pointed out “These methods … rely on binary separation into differentially expressed gene and unaffected genes based on an arbitrarily set p-value cut-off.”. However, as authors correctly described, the second type of classic pathway analysis approaches (e.g. GSEA) doesn’t do this. This should be changed.
A t-test is usually applied to data that are normally or approximately normally distributed. Using a t-test on p-values to calculate the significance of source genes is questionable.
In the “Case Study” section, “The modulated sub-graphs consist predominantly of Cytokines, chemokines (CXCL10, CCL8, CXCL9, CXCL11, CXCR4, CCR7, CCL4L1, CCL3L1, CCL4, CCL8, CCL20, CCL3, CCL19) and Interleukin genes (IL6, IL27, IL23).”: Interleukins are a type of cytokine. Therefore, this sentence should be modified.
Human genes should be in upper case. However, genes listed in the Supplement-Material use lower case, for example, https://github.com/Modulated-Subgraph-Finder/MSF/blob/master/Supplement-Material/6H/SourcesAndSinks.text. This should be changed.

Competing Interests

No competing interests were disclosed.

Responses (1)

Author Response

15 Apr 2019

Mariam R Farman, Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria

We would like to thank the reviewer once again for the suggestions.

All the text modifications and rephrasing have been done.
The sentence in the abstract that contradicted the approach of GSEA has been changed.
The t-test is now performed on the log-transformed p-values of the individual genes, since log-transformation brings the data closer to a normal distribution.
The sentence in the case study regarding interleukin genes has been changed.
MSF now outputs the human genes in upper case.
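The t-test on log-transformed p-values mentioned in this response could look roughly like the sketch below. It assumes a negative base-10 logarithm and Welch's unequal-variance form of the test; neither choice is stated in the response, and the function name and example values are hypothetical. Only the test statistic and its degrees of freedom are computed, using the standard library.

```python
from math import log10, sqrt
from statistics import mean, variance


def welch_t_on_log_p(downstream_p, upstream_p):
    """Welch's t statistic and degrees of freedom for -log10(p) of the genes
    downstream versus upstream of a source gene.

    Each group needs at least two p-values in (0, 1], and the two groups
    must not both have zero variance.
    """
    down = [-log10(p) for p in downstream_p]
    up = [-log10(p) for p in upstream_p]

    # Squared standard errors of the two group means.
    se2_down = variance(down) / len(down)
    se2_up = variance(up) / len(up)

    t_stat = (mean(down) - mean(up)) / sqrt(se2_down + se2_up)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se2_down + se2_up) ** 2 / (
        se2_down ** 2 / (len(down) - 1) + se2_up ** 2 / (len(up) - 1)
    )
    return t_stat, dof


# Made-up p-values: strongly modulated genes downstream, weak signal upstream.
t_stat, dof = welch_t_on_log_p([0.001, 0.01, 0.0001], [0.1, 0.01, 1.0])
print(t_stat, dof)
```

A large positive statistic indicates that the downstream genes carry a stronger signal than the upstream genes, which is what one would expect if the source marks a border between two regulation regimes.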

Competing Interests

No competing interests were disclosed.

Reviewer Report

01 Apr 2019 | for Version 2

Haibo Liu, Department of Animal Science, Iowa State University, Ames, IA, USA

Approved With Reservations

The 2nd version is improved much over the first version in readability and completeness. However, there are still quite a few language issues, unclear statements, and improper expression of concepts. See the commented manuscript in the PDF format for details.

In addition, the robustness test is not conclusive. Given that RNA-seq data themselves are noisy, I am not sure why it is necessary to add further noise to the RNA-seq data.

In the 2nd version, the authors compared their MSF tool to two existing tools for network or gene set enrichment analysis, using RNA-seq data from an experimental EBOV infection. However, it is not clear how MSF outperforms the other two tools.

The authors claimed their tool is fast, but no runtime is provided.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

bioinformatics and computational biology, transcriptomics and systems biology

Responses (1)

Author Response

15 Apr 2019

Mariam R Farman, Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria

Thank you again for your helpful comments about our manuscript.

The language issues have been thoroughly checked and unclear statements modified.
We agree that the RNA-seq data are already noisy. The robustness of MSF ranges from 71% (6 hpi) to 84% (48 hpi); these numbers are not outstanding, but we consider them reasonable. This shows that the robustness remains reasonable even when noise is added to already noisy data; if the data are less noisy, the chances of recovering the truly dysregulated genes increase.
We agree that the comparison with jActiveModules was not very conclusive, since we did not have a gold-standard example data set to analyse with both tools and then compare the results. Although MSF and jActiveModules follow the same approach and both find the core genes, MSF additionally identifies the sources of the modulations in the networks, which jActiveModules does not provide. The comparison to GSEA showed that, although both MSF and GSEA work without p-value cut-off lists and identify the truly dysregulated KEGG pathways, MSF outperforms GSEA by providing the actual genes from the lists that cause the dysregulation and by showing the cross-talk between the different KEGG pathways in the form of networks, which GSEA does not provide.
The runtime for MSF has been provided now.

Competing Interests

No competing interests were disclosed.

Reviewer Report

29 Nov 2018 | for Version 1

Haibo Liu, Department of Animal Science, Iowa State University, Ames, IA, USA

Approved With Reservations

First of all, the code for the tool is not available via GitHub. I carefully checked the GitHub repository multiple times: https://github.com/Modulated-Subgraph-Finder/MSF; however, I can't find the ModulatedSubPathFinder.jar file, which is the Java implementation of the proposed tool, MSF.
There are too many grammar and writing issues. To mention just a few: in the first paragraph of the Introduction, "mechanism" in Line 6 should be plural, "stimuli" in Line 14 should be "stimulus", and "maybe" in Line 15 should be "may be". Careful proofreading is strongly recommended.
The authors have a few misconceptions. For example, they treated "effectors" and "sinks" equally. In my opinion, effectors include sources, intermediate genes and sinks, i.e., all genes responding to perturbations. The authors think of the significance of statistical tests in the form of p-values as a confidence level of observing an authentic expression change. This might not be correct. Besides small p-values, the magnitude of fold changes is also an important metric of authentic expression change. By the way, the fold change of gene expression is always non-negative. The expression "sign of fold change" is not meaningful. Only the log-transformed fold change is signed.
The flow of information/ideas is not fluent in many places. For example, at the end of the first paragraph of the Introduction, the authors mentioned the KEGG and Reactome pathways. Then, at the beginning of the second paragraph, they gave a detailed description of the two pathway databases, which might be unnecessary and disrupts the flow that sets the stage for why their tool is necessary and useful. Some information about how their tool was implemented, given in the last paragraph of the Introduction, should be moved to the Implementation section of Methods. In Paragraph 3 of the section “Case Study”, the DGEA results might better be described immediately after the second sentence of Paragraph 2. Similarly, some information in the Conclusion should be moved to the section “Implementation”.
The title of the section "Initial modulated sub-graphs” should be “Initializing modulated sub-graphs”. Under this section, “starting with the most significant one” should be “starting with the next most significant one”. “… not yet in a significantly modulated …” should be “…not yet in the significantly modulated …”.
Under the section of “Extending modulated sub-graphs”, it is not clear how the sub-graphs are extended by adding “MORE THAN ONE” gene at a time. Done this way, there are infinitely many possibilities. The criterion to accept or reject added genes is not clear.
Under the section of “Merging modulated sub-graphs”, the authors mentioned that “After detection and extension of the modulated sub-graphs, they are tested if combined subgraphs SCORE better than on their own.” At this point, no merging has been done yet, how are the combined subgraphs tested? What is the SCORE used here? How can a depth-first search traverse from the FIRST sub-graph to the SECOND sub-graph before they are merged (Aren’t the subgraphs not necessarily connected?)?
Under the section of “Finding sources & sinks”, “circular loops” should be just “loop”.
Paragraph 2 under “Case Study”: some details about the edgeR-based DGEA are missing. How were the directed cell signaling interactions filtered from the Reactome FIs? Based on what?
The directions of the edges in the right panel of Figure 2 should be shown, because MSF can generate directed sub-graphs.
In the last paragraph of the section of “Modulated sub-graphs at 6 hpi”, “This show cast example…” should be “This show case example…”.
The Robustness test seemed to show that the MSF is not robust enough to extra noise. What is the authors’ conclusion and explanation?
The authors compared the results from their tool to those from the Reactome pathway analysis tool and demonstrated better performance of their tool. Can they also compare the results from their tool to those from the GSEA tools, which don’t set arbitrary cutoff beforehand? The latter comparison might be more convincing.
In the Discussion, what are “intermediate bottlenecks”? In the Conclusion, the authors claimed their tool is “fast, robust and easy-to-use”. However, they did not provide evidence to show that their tool is fast. The robustness of their tool is not apparent.
Issues with figures: all legend titles are too long. In Figure 1, the texts in the flow chart symbols are not well written. There are inconsistent case issues; “initial” should be “initializing”; the symbol for a condition test (“checked all interaction?”) should be a diamond, not a hexagon. Question marks might be added to condition tests for readability. In the legend to Figure 1, what does “without exhaustively testing all connected subgraphs” mean? Does it mean that the output from the MSF analysis might not be comprehensive? The legends to Figures 3 and 4 are poorly written. Is Toll-like receptor signaling one of the 149 shared pathways in Figure 4? It is not clearly described in the legend.
Limitations of their tool: In their implementation, the fold changes and direction of changes are not taken into account. If a resulting subgraph contains both down-regulated genes and up-regulated genes, how should the users interpret it? The authors didn’t test the sensitivity and specificity of their tool in this manuscript.

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

No
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

bioinformatics and computational biology, transcriptomics and systems biology

Responses (1)

Author Response

22 Mar 2019

Mariam R Farman, Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria

Thank you very much for your suggestions.

So far, ModulatedSubPathFinder.jar was only available under the release tag on GitHub (https://github.com/MariamFarman/Modulated-SubGraph-Finder/releases). Meanwhile, we have added the full source code to GitHub.
The writing of the manuscript has been carefully checked and improved, especially the examples mentioned by the reviewers.
We agree with the reviewer that we did not use the word “effector” very carefully and have changed it accordingly. The reviewer is also right to consider the magnitude of the fold change an important metric of the system behaviour. We consciously ignore it, since it is not straightforward to include in our model, and we consider that the p-value at least partly reflects the magnitude of the fold change, since it expresses the probability that the observed fold change differs from 1.
The suggestions are taken into account and the text was modified accordingly. The details about KEGG and Reactome were considered necessary since later the information from both the databases would be used to showcase the results of MSF.
Text modified.
Extension by more than one gene is used when the sub-graph cannot be extended any further by a single gene because the directly neighbouring genes have high p-values. To avoid producing fragmented sub-graphs, we try to append, instead of single genes, branches of up to three genes simultaneously. Thereby the accumulated signal of the whole branch can compensate for single unfavourable genes. Since we limit the procedure to branches of length three, the number of possibilities to be tested is limited. Again, a branch is only accepted if the overall score of the sub-module is better after extension. Details about the criteria for accepting or rejecting an extension were added to the text.
The wording of the paragraph describing the merging of sub-graphs has been changed for better understanding. Merging is accepted if the combined p-value of the sub-graph obtained by merging two sub-graphs (including connector genes) is smaller than the individual p-values of the two sub-graphs. The depth-first search is used to find connectors between the two sub-graphs in order to merge them.
Text modified.
Details about edgeR were added to the text. The interaction file downloaded from Reactome is filtered for direct interactions only. A tutorial on how the file was filtered is also provided on the MSF GitHub page.
The directions have been added to the figure and to the MSF output, which can be imported into Cytoscape for easy visualization.
Text modified.
While developing the algorithm we believed MSF would not only give insights into the network modulation but could also be used to increase robustness of DGEA of single genes by using additional information from their neighbours. Our analysis disproved this assumption, which we wanted to communicate with this robustness analysis. Given the comments by several reviewers we adapted the corresponding paragraph, omitting the comparison to the DGEA but showing only the overall robustness of our tool which is, after applying strong noise, still reasonable with a median recall rate of 71 to 84%.
A comparison of the MSF analysis with a GSEA analysis was added. Although GSEA was able to find pathways known from the literature to be dysregulated during Ebola infection, it could not show the cross-talk between the different pathways as MSF does.
By intermediate bottlenecks we meant genes that connect a number of sources with a number of sinks and are thereby potential points of intervention if one would like to uncouple the input stimuli from the downstream effects. As stated above, the reviewer is right about the robustness; the text has been modified accordingly.
The figure legends have been rewritten for better understanding. The chart symbols in Figure 1 were edited and their shapes modified as suggested. With the phrase “Without exhaustively testing all connected sub-graphs” we intended to say that not all possible connectors to merge the graphs are tested, but that sub-graphs are connected with the first connector that passes the threshold. In Figure 5, Toll-like receptor signaling is in fact among the 164 pathways shared between the different time points of MSF.
We agree and are aware that the magnitude and direction of the fold changes are important. We have already commented on the magnitude above. For the direction, we would like to mention that its interpretation is not straightforward without further information on the type of interaction between the genes, which is not always available. To illustrate, the up-regulation of an inhibitor and the down-regulation of an activator can have the same effect on the network.
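The branch extension described in the response above (appending up to three genes at once so that a strong branch can compensate for a single unfavourable gene) can be sketched as follows. Stouffer's unweighted Z method is again only a stand-in for the p-value combination, choosing the best-scoring branch rather than the first acceptable one is an assumption, and all names and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def combined_p(p_values):
    """Stouffer's unweighted Z combination (stand-in; p-values in (0, 1))."""
    z = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    return 1.0 - _STD_NORMAL.cdf(sum(z) / sqrt(len(z)))


def branches_from(members, neighbours, max_length=3):
    """Yield simple paths of 1..max_length genes leading out of the sub-graph."""
    def walk(path):
        yield path
        if len(path) < max_length:
            for nxt in neighbours.get(path[-1], ()):
                if nxt not in members and nxt not in path:
                    yield from walk(path + [nxt])

    for gene in members:
        for first in neighbours.get(gene, ()):
            if first not in members:
                yield from walk([first])


def best_branch_extension(members, neighbours, p_value, max_length=3):
    """Return (combined p-value, branch) for the best improving branch,
    or (current p-value, None) if no branch improves the score."""
    best = (combined_p([p_value[g] for g in members]), None)
    for branch in branches_from(members, neighbours, max_length):
        trial_p = combined_p([p_value[g] for g in list(members) + branch])
        if trial_p < best[0]:
            best = (trial_p, branch)
    return best


# Hypothetical chain A - B - X - Y - Z; X is an unfavourable gene.
neighbours = {"A": ["B"], "B": ["A", "X"], "X": ["B", "Y"], "Y": ["X", "Z"], "Z": ["Y"]}
p_value = {"A": 0.001, "B": 0.01, "X": 0.6, "Y": 0.001, "Z": 0.0005}
members = ["A", "B"]
p_one, branch_one = best_branch_extension(members, neighbours, p_value, max_length=1)
p_three, branch_three = best_branch_extension(members, neighbours, p_value)
print(branch_one, branch_three)
```

With single-gene extension the sub-graph stops at B, because X alone worsens the score. Allowing branches of up to three genes bridges X and reaches the significant genes Y and Z.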

Competing Interests

No competing interests were disclosed.

Reviewer Report

12 Nov 2018 | for Version 1

Stefanie Widder, Medical University of Vienna, Department of Medicine 1, Research Lab of Infection Biology, Waehringer Guertel 18-20, 1090 Vienna, Austria

Approved

The main line of argumentation gets lost sometimes, in particular in paragraph 1. The narrative would furthermore benefit from actual examples instead of repeating a general statement of ‘changed conditions’. Sometimes, the same argument is repeated in differently phrased sentences.
Paragraph 2: Suppress ‘be subjective to some degree’.
Paragraph 4: Better highlight and delimit the novelty of the presented approach.
Overall, shorter sentences benefit the reader.

Paragraph 1: Readability would profit from focusing the background information, e.g. 'Ebola infection data was (->were) selected (...)' - rephrase into a half-sentence.
Wording: Until now->currently.

Robustness:

The recall-comparison (MSF to differential expression analysis) is weakening the proposed method, because no increase in recall for MSF can be achieved. It would be informative to see also precision stats in comparison. Generally, one would expect more robust statistics on larger subgraphs (MSF vs. DEG groups).
Also, I am not entirely convinced by adding noise to real and thus – already - noisy data. A small, artificial mock example with detailed known outcome (+/- noise) might be more suitable and supportive.

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Yes

Competing Interests

No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Responses (1)

Author Response

22 Mar 2019

Mariam R Farman, Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria

Thank you for your helpful comments about our manuscript.

Paragraph 1 has been rephrased; “changed conditions” has been replaced with a treated-versus-healthy example.
Paragraph 2 modified accordingly.
The novelty of MSF is described in paragraph 5. First, MSF does not use predefined sets of genes to find the modulated sub-graphs but builds the sub-graphs using the information from the DGEA and the interactions from the whole cell signaling network. Second, MSF considers the signal of the neighbouring genes to determine the significance of a modulated sub-graph.
We revisited the overall text and tried to put more emphasis on readability.
More details about Hartung’s methods are provided in the appropriate section.
The case study is rephrased and wording modified.
The reviewer is correct that the recall comparison weakens the proposed method. We were very strict in testing the robustness of the method by adding extra noise to already noisy data. We expected the method to be more robust than DGEA, but unfortunately that was not the case: for the robustness of cut-off-based DGEA, it does not matter if the p-values of a few genes go up or down as long as they stay below the chosen cut-off. In contrast, for MSF the robustness analysis showed that such changes make a difference for the identified sub-graphs. The reviewer is also correct to expect more robustness in larger sub-graphs, which is seen at the later time point, 48 hpi, which has larger sub-graphs than the other two time points. The robustness of MSF alone ranges from 71% (6 hpi) to 84% (48 hpi); we agree that these numbers are not outstanding, but we consider them reasonable. Since our assumption that a post-analysis with MSF can not only provide insights into the pathway modulation but also improve the DGEA for single genes by using their neighbours' information did not hold, we removed that aspect to improve readability.
We agree that the data are already noisy even before adding noise. Again, we were very strict in testing the robustness of the method. As mentioned above, the larger the sub-graphs, the more robust they are; we are not sure that a small mock example would be more informative.
The set-up of Figure 5 (previously Figure 4) has been changed to one cut-off only, i.e. 0.05. Figure label sizes and texts were enhanced. The figure caption was modified.
Unfortunately, we cannot provide the Ebola count files, since they belong to Olejnik et al. A tutorial is provided to reproduce the results from the Ebola infection data, with details on how to obtain the raw data from the third-party source (GSE84188). MSF output files are also provided in the supplementary material.

Competing Interests

No competing interests were disclosed.

Reviewer Report

06 Nov 2018 | for Version 1

Guanming Wu, Department of Medical Informatics and Clinical Epidemiology (DMICE), Oregon Health and Science University (OHSU), Portland, OR, USA

Not Approved

Is the rationale for developing the new software tool clearly explained?

Partly
Is the description of the software tool technically sound?

Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Partly

Competing Interests

No competing interests were disclosed.

Responses (1)

Author Response

22 Mar 2019

Mariam R Farman, Institute for Theoretical Chemistry, Theoretical Biochemistry Group, University of Vienna, Vienna, 1090, Austria

First of all, we would like to say thank you for your valuable review and your critical comments.

Thank you very much for making us aware of the paper by Mitra et al. and the work by Ideker et al. Frankly, these papers had escaped our attention. Both the review and the tool jActiveModules are now cited in the paper. Using the same expression data and the same interaction file, we compared the results from MSF and jActiveModules. Although the methods for computing the overall score of a module are similar, the sub-graphs identified differ. These differences arise because MSF starts building a sub-graph from a single gene and incorporates the p-value of the next gene by combining it with those already included, subject to the check that the combined p-value of the new sub-graph is better than that of the original. jActiveModules, on the other hand, first transforms all gene p-values to z-scores and tries to find connected sets of genes with unexpectedly high levels of differential expression, in this case high z-scores. The overall score of a sub-network is then calculated by combining the z-scores of its genes, and simulated annealing is used to find the highest-scoring modules. Given the observed differences and our focus on the flow of perturbation, namely sinks and sources, we think that our contribution is not redundant to the previous work. To further strengthen our perspective, we also worked on the software itself, which now scores the sources according to their reliability and their potential impact on the modified sub-module. Since we believe that this critically improved the usability of the software, we would particularly like to thank the reviewers for pointing us to this.
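For illustration only, the greedy growth step described above can be sketched as follows. This is a minimal sketch and not the MSF implementation: the function names `combine_pvalues` and `grow_subgraph` are hypothetical, the network is treated as undirected for simplicity, and Stouffer's inverse-normal combination is used as a stand-in for Hartung's method, which additionally corrects for dependence between the tests.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def combine_pvalues(pvalues):
    """Stouffer (inverse-normal) combination of p-values.

    Assumption: used here as a simplified stand-in for Hartung's method.
    Every p-value must lie strictly between 0 and 1.
    """
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in pvalues]
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - _STD_NORMAL.cdf(combined_z)


def grow_subgraph(seed, pvalue, neighbours):
    """Grow a sub-graph from `seed`, adding a neighbouring gene only if
    the combined p-value of the enlarged sub-graph improves."""
    members = [seed]
    best = pvalue[seed]
    improved = True
    while improved:
        improved = False
        # Genes adjacent to the current sub-graph but not yet part of it.
        frontier = sorted(
            {n for g in members for n in neighbours.get(g, ())
             if n not in members and n in pvalue}
        )
        for candidate in frontier:
            trial = combine_pvalues(
                [pvalue[g] for g in members] + [pvalue[candidate]]
            )
            if trial < best:  # accept only if the combined p-value gets better
                members.append(candidate)
                best = trial
                improved = True
                break  # recompute the frontier for the enlarged sub-graph
    return members, best


# Hypothetical toy data: gene -> DGEA p-value, gene -> neighbouring genes.
pvalue = {"A": 0.04, "B": 0.06, "C": 0.90, "D": 0.03}
neighbours = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
members, best = grow_subgraph("A", pvalue, neighbours)
print(members)  # ['A', 'B', 'D']
```

In this toy example, genes B and D each have a p-value that would not stand out strongly on its own, yet both are accepted because they improve the combined p-value, whereas gene C is rejected. A z-score based aggregate as used by jActiveModules has a similar form; the difference described above lies in how the candidate modules are searched.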
Again, we thank the reviewer for the valuable suggestion to evaluate the source genes. We acted on this suggestion by amending the software. We have now incorporated an impact score for each source gene, which expresses the percentage of genes in the sub-module that are downstream of that particular source. This should help to prioritize the different sources of one sub-module. MSF also performs a t-test for each source gene, testing whether the p-values of the downstream genes differ from those of the upstream genes. This helps to see whether the identified source indeed marks the border between two different regulation regimes.
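The impact score described above can be sketched as a reachability computation on the directed sub-module. This is a hedged illustration rather than the MSF code: the names `downstream_genes` and `impact_score` are hypothetical, and excluding the source itself from the denominator is an assumption made here.

```python
from collections import deque


def downstream_genes(source, edges):
    """Return all genes reachable from `source` along directed edges."""
    successors = {}
    for upstream, downstream in edges:
        successors.setdefault(upstream, []).append(downstream)
    seen = {source}
    queue = deque([source])
    while queue:  # breadth-first traversal of the directed sub-module
        gene = queue.popleft()
        for nxt in successors.get(gene, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(source)
    return seen


def impact_score(source, edges):
    """Percentage of the other genes in the sub-module that lie
    downstream of `source` (assumption: the source is not counted)."""
    genes = {gene for edge in edges for gene in edge}
    others = genes - {source}
    if not others:
        return 0.0
    return 100.0 * len(downstream_genes(source, edges) & others) / len(others)


# Hypothetical sub-module with two sources, S1 and S2.
edges = [("S1", "A"), ("A", "B"), ("S2", "B"), ("B", "C")]
print(impact_score("S1", edges))  # 75.0
print(impact_score("S2", edges))  # 50.0
```

Here S1 reaches three of the four other genes and S2 reaches two of them, so S1 would be ranked first when prioritizing the sources of this sub-module.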
Table 1 has been modified. It now shows the number of MSF-identified sub-graphs together with the number of genes in each. It also shows how, when different cut-offs are applied to the p-values of the genes in an MSF-identified sub-graph, larger interpretable sub-graphs break into smaller, less interpretable ones, some consisting of single genes.
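The fragmentation effect described for Table 1 can be illustrated with a small sketch that filters the genes of one sub-graph by a p-value cut-off and counts the connected components that remain. The function name `components_after_cutoff` and the toy data are hypothetical; this is not how the table was produced, only an illustration of the idea.

```python
def components_after_cutoff(gene_pvalue, edges, cutoff):
    """Keep genes with p-value <= cutoff and return the connected
    components of what remains, largest first."""
    kept = {gene for gene, p in gene_pvalue.items() if p <= cutoff}
    adjacency = {gene: set() for gene in kept}
    for a, b in edges:
        if a in kept and b in kept:  # an edge survives only if both ends do
            adjacency[a].add(b)
            adjacency[b].add(a)
    components = []
    unvisited = set(kept)
    while unvisited:
        start = unvisited.pop()
        component = {start}
        stack = [start]
        while stack:  # depth-first traversal of one component
            gene = stack.pop()
            for neighbour in adjacency[gene]:
                if neighbour not in component:
                    component.add(neighbour)
                    stack.append(neighbour)
        unvisited -= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical sub-graph: gene B has a weak signal but connects the others.
gene_pvalue = {"A": 0.001, "B": 0.2, "C": 0.01, "D": 0.03, "E": 0.04}
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E")]
print([len(c) for c in components_after_cutoff(gene_pvalue, edges, 1.0)])   # [5]
print([len(c) for c in components_after_cutoff(gene_pvalue, edges, 0.05)])  # [2, 1, 1]
```

Removing the single weakly modulated gene B splits one five-gene sub-graph into a pair and two isolated genes, which is the kind of loss of interpretability the table is meant to show.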
The section “Comparison to Reactome pathway analysis tool” has been modified. The set-up of Figure 5 (previously Figure 4) has been changed to use a single cut-off, i.e. 0.05. Ten Toll-like receptor cascades that did not appear for the cut-off gene list were found to be enriched among the genes in the MSF-identified sub-graphs. Since MSF uses no cut-off, the sub-graphs identified contained genes from Toll-like receptor cascades even when their signal was weak. The caption of Figure 5 has been modified for clarity. A cut-off of 0.05 was used to select pathways from the Reactome pathway analysis for both the MSF gene list and the cut-off gene list. Nine Toll-like receptor cascades are mentioned in the manuscript, the tenth being Toll-like receptor itself.
Directions have been added to the MSF output, which can be easily imported into Cytoscape. In addition, the source impact score and the log-fold sign of each individual gene can also be imported into Cytoscape. The functional relationships provided by Reactome FI are now mentioned. The schematic diagram, which was meant to give our interpretation of the raw MSF output, has been removed, since we agree with the criticism that it was an oversimplification.
The writing of the manuscript has been carefully checked and improved.

Competing Interests

No competing interests were disclosed.

Alongside their report, reviewers assign a status to the article:

Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested

Approved with reservations - A number of small changes, sometimes more significant revisions are required to address specific details and improve the paper's academic merit.

Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions

[1] Bayerlová M, Jung K, Kramer F, et al.: Comparative study on gene set and pathway topology-based enrichment methods. BMC Bioinformatics. 2015; 16(1): 334.

[2] Cárdenas WB, Loo YM, Gale M Jr, et al.: Ebola virus VP35 protein binds double-stranded RNA and inhibits alpha/beta interferon production induced by RIG-I signaling. J Virol. 2006; 80(11): 5168–5178.

[3] Fabregat A, Jupe S, Matthews L, et al.: The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2018; 46(D1): D649–D655.

[4] Falasca L, Agrati C, Petrosillo N, et al.: Molecular mechanisms of Ebola virus pathogenesis: focus on cell death. Cell Death Differ. 2015; 22(8): 1250–1259.

[5] Farman M: Modulated-Subgraph-Finder/MSF V.1 (Version V.1). Zenodo. 2018. http://www.doi.org/10.5281/zenodo.1400242

[6] García-Campos MA, Espinal-Enríquez J, Hernández-Lemus E: Pathway Analysis: State of the Art. Front Physiol. 2015; 6: 383.

[7] Hartung J: A note on combining dependent tests of significance. Technical Report, SFB 475: Komplexitätsreduktion in Multivariaten Datenstrukturen, Universität Dortmund, 1998.

[8] Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000; 28(1): 27–30.

[9] Khatri P, Sirota M, Butte AJ: Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput Biol. 2012; 8(2): e1002375.

[10] Konde MK, Baker DP, Traore FA, et al.: Interferon β-1a for the treatment of Ebola virus disease: A historically controlled, single-arm proof-of-concept trial. PLoS One. 2017; 12(2): e0169255.

[11] Love MI, Huber W, Anders S: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15(12): 550.

[12] Malone JH, Oliver B: Microarrays, deep sequencing and the true measure of the transcriptome. BMC Biol. 2011; 9(1): 34.

[13] Michalak P: Coexpression, coregulation, and cofunctionality of neighboring genes in eukaryotic genomes. Genomics. 2008; 91(3): 243–248.

[14] Morris J, Jensen LJ, Doncheva NT: stringApp 1.3.0. [Online; accessed 19-February-2018]. 2018.

[15] Olejnik J, Forero A, Deflubé LR, et al.: Ebolaviruses Associated with Differential Pathogenicity Induce Distinct Host Responses in Human Macrophages. J Virol. 2017; 91(11): e00179-17.

[16] Prins KC, Cárdenas WB, Basler CF: Ebola virus protein VP35 impairs the function of interferon regulatory factor-activating kinases IKKepsilon and TBK-1. J Virol. 2009; 83(7): 3069–3077.

[17] Rhein BA, Powers LS, Rogers K, et al.: Interferon-γ Inhibits Ebola Virus Infection. PLoS Pathog. 2015; 11(11): e1005263.

[18] Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010; 26(1): 139–140.

[19] Schnoes AM, Ream DC, Thorman AW, et al.: Biases in the experimental annotations of protein function and their effect on our understanding of protein function space. PLoS Comput Biol. 2013; 9(5): e1003063.

[20] Shannon P, Markiel A, Ozier O, et al.: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003; 13(11): 2498–2504.

[21] Veljkovic V, Glisic S, Muller CP, et al.: In silico analysis suggests interaction between Ebola virus and the extracellular matrix. Front Microbiol. 2015; 6: 135.

[22] Wu G, Feng X, Stein L: Reactome FIs. [Online; Version 2016]. 2016.

MSF: Modulated Sub-graph Finder

Abstract

Keywords

Introduction

Methods

Implementation

Overview of our method

Figure 1. Graphical representation of the Modulated Sub-graph Finder (MSF) heuristical approach to detect modulated sub-graphs in a global gene regulatory network without exhaustively testing all connected sub-graphs.

Results

Case Study

Table 1. Comparison of connected sub-graphs of modulated genes in the global network between the analysis results of Modulated Sub-graph Finder (MSF) and mapping the raw list of differentially expressed genes from the standard edgeR analysis, applying different p-value cut-offs, onto the network.

Modulated sub-graphs at 6 hpi

Figure 2. Visualisation of the three modulated sub-graphs identified by Modulated Sub-graph Finder (MSF) at 6 h after Ebola Virus (EBOV) infection in gene detail and their epitomized representation to depict the flow of perturbation in the directed network.

Robustness

Comparison to Reactome pathway analysis tool

Figure 4. The Upset plot shows the number of shared pathways between different time-point and cut offs.

Discussion

Conclusions

Data availability

Software availability

Grant information

Supplementary material

References
