Software Tool Article

CoNet app: inference of biological association networks using Cytoscape

[version 1; peer review: 2 approved with reservations]
PUBLISHED 27 Jun 2016

This article is included in the Cytoscape gateway.

Abstract

Here we present the Cytoscape app version of our association network inference tool CoNet. Though CoNet was developed with microbial community data from sequencing experiments in mind, it is designed to be generic and can detect associations in any data set where biological entities (such as genes, metabolites or species) have been observed repeatedly. The CoNet app supports Cytoscape 2.x and 3.x and offers a variety of network inference approaches, which can also be combined. Here we briefly describe its main features and illustrate its use on microbial count data obtained by 16S rDNA sequencing of arctic soil samples. The CoNet app is available at: http://apps.cytoscape.org/apps/conet.

Keywords

network generation, network construction, network inference, association networks, microbial networks, CoNet, Cytoscape

Introduction

Modern sequencing technology in combination with dedicated analysis pipelines makes it possible to determine the relative abundances of microbial community members and thereby to obtain microbial count data. Such community profiling experiments have been carried out for thousands of samples from a variety of ecosystems, ranging from the world’s oceans (Bork et al., 2015) to the human gut (Falony et al., 2016; The Human Microbiome Project Consortium, 2012).

The analysis of species abundance patterns has a long tradition in ecology (Connor & Simberloff, 1979; Diamond, 1975; Gotelli & McCabe, 2002). More specifically, co-occurrence analysis detects significant co-occurrences or mutual exclusions across samples, which are interpreted as representing ecological relationships such as mutualism or competition or being due to similar responses to environmental factors. Co-occurrence analysis is an instance of network inference, an exploratory data analysis technique that attempts to unravel relationships between objects from repeated observations. The large number of microbial count tables resulting from the multitude of recent sequencing projects (e.g. Bork et al., 2015; Falony et al., 2016; Gilbert et al., 2014; The Human Microbiome Project Consortium, 2012) opens the way to unraveling the complex relationships between microorganisms from their abundances across samples. CoNet was developed to carry out microbial network inference, but its generic design makes it applicable to any data set where objects have been observed repeatedly.

Methods/Implementation

The CoNet app wraps the CoNet command line tool. The command line and Cytoscape 2.x app version are implemented in Java 1.6, whereas the Cytoscape 3.x app version requires Java 1.7.

Implementation challenges and decisions

In general, the CoNet app is designed with minimum contact to Cytoscape, to ensure consistent behavior across different Cytoscape versions and to ease porting to future Cytoscape versions. The CoNet app is linked to Cytoscape only via its main menu and graph visualization classes. The Cytoscape-version-specific implementation of the graph visualization class is loaded via reflection at run time and is entirely separated from graph generation.

A major challenge for the implementation of the CoNet app is the inclusion of the large number of options available in CoNet, which allow users to customize each network inference step, from data preprocessing through threshold setting and network construction to the assessment of significance. This problem was solved by implementing a single user input handling class, which collects and checks user input from the various menus and submits it to CoNet once the GO button is pushed. This design also makes it possible to export and read in user settings files, which make experiments carried out with the CoNet app more reproducible.

Another challenge is command line support. Network inference from large data sets is not feasible within Cytoscape, and CoNet is best run on the command line in these cases. To facilitate this step for inexperienced users, the current settings of the CoNet app can be exported as a command line call by clicking the "Generate command line call" button. This call can then be executed on the command line by including the CoNet jar file in the class path. Networks generated on the command line can be loaded either via Cytoscape's network import functions (if saved in gml format (Himsolt)) or, more conveniently, via the CoNet app (if saved in the custom gdl format). The CoNet app's manual includes a step-by-step tutorial for command line usage.

The CoNet app also integrates the popular network inference R Bioconductor package minet (Meyer et al., 2008). We decided to integrate it loosely via Rserve, a Java-R bridge capable of transferring R objects to Java and vice versa (http://rforge.net/Rserve/). Thus, advanced users can install and launch the Rserve server in R and configure the Rserve client settings (i.e. host and port) in CoNet app's configuration menu. The CoNet app's manual explains Rserve installation and usage.

Finally, we also implemented solutions for error and help display. The CoNet app displays help pages in html format, which allows the user to follow links within these pages. The CoNet app's pdf manual is compiled from the help pages using prince (http://www.princexml.com/). Each menu is linked to its specific help page, easing navigation.

When an error has been captured, an error report is generated that includes the error message as well as the CoNet app's current settings.

Network inference workflow

CoNet takes a presence/absence, count or abundance matrix as input, where rows represent the objects of interest and columns their observations across locations or time points. Optionally, a second input matrix can be provided. This is of interest when two different measurements have been made for the same samples, for instance counts of microorganisms and concentrations of metabolites. CoNet's output consists of a network where significantly associated objects are connected by edges.

Depending on the data type, a number of filters need to be applied. For instance, for 16S rDNA count data, taxa with too few non-zero observations need to be removed and the data need to be normalized or rarefied to account for sequencing depth differences. In the next step, the user can select from a number of different correlations (Pearson, Spearman, Kendall), similarities (mutual information, Steinhaus, distance correlation etc.) or dissimilarities (Kullback Leibler, Euclidean, Bray Curtis, Jensen-Shannon etc.) to score the association strength between the objects. For presence/absence (also termed incidence) data, the hypergeometric distribution or Jaccard distance can be chosen for the same purpose. CoNet's special strength is its capability to combine multiple such measures and/or to combine these measures with other network inference algorithms, e.g. those implemented in minet. The idea behind such an ensemble approach to network inference is to exploit the fact that different methods make different mistakes. If erroneous edges predicted by one method are not supported by the others, they can be filtered out, thereby reducing the number of false positives. The thresholds for the measures can be set either manually (using sliding windows for bounded measures) or automatically, by specifying the desired number of edges in the output network. The network can then be displayed either as a multigraph (with as many edges between two objects as selected measures) or as a graph (where scores of individual measures are combined). Optionally, the significance of the associations can be computed, e.g. with a permutation test. Multiple testing correction can be performed with either the Bonferroni or the Benjamini-Hochberg procedure. Figure 1 summarizes this workflow.
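
The ensemble idea described above can be illustrated with a minimal Python sketch. It is not CoNet's code: the function name `ensemble_edges`, the `min_support` parameter and the toy edge sets are assumptions made for illustration, and CoNet's own merge strategies are configured in the app rather than through such a function. The sketch simply keeps an edge only if enough measures agree on it.

```python
from collections import Counter


def ensemble_edges(edges_per_measure, min_support):
    """Keep edges reported by at least `min_support` measures (hypothetical helper)."""
    support = Counter()
    for edges in edges_per_measure.values():
        # Treat edges as undirected: (a, b) and (b, a) count as the same edge.
        support.update({tuple(sorted(edge)) for edge in edges})
    return {edge for edge, count in support.items() if count >= min_support}


# Toy edge sets, invented for illustration only.
edges_per_measure = {
    "pearson": {("taxonA", "taxonB"), ("taxonA", "taxonC")},
    "spearman": {("taxonB", "taxonA"), ("taxonC", "taxonD")},
    "bray_curtis": {("taxonA", "taxonB"), ("taxonC", "taxonD")},
}
consensus = ensemble_edges(edges_per_measure, min_support=2)
print(sorted(consensus))
```

In this toy case the edge between taxonA and taxonC is proposed by one measure only and is therefore dropped, which mirrors the false-positive filtering argument made in the text.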

Figure 1. Network inference workflow in CoNet.

Special features

CoNet offers a series of features that distinguish it from other network inference tools, such as its support for object groups. This feature allows a user to assign objects to different groups (e.g. metabolites and enzymes). Relationships can then be inferred only between different object types (resulting in a bipartite network) or only within the same object type. CoNet's treatment of two input matrices is built upon this feature.
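
As a rough illustration of how object groups restrict the pairs that are scored, the following Python sketch enumerates candidate pairs either between or within groups. The function `candidate_pairs`, its `mode` argument and the example group assignment are hypothetical and do not reflect CoNet's internal API.

```python
from itertools import combinations


def candidate_pairs(groups, mode):
    """Yield object pairs allowed under a group constraint (hypothetical helper).

    groups: dict mapping object name -> group label
    mode:   "between" keeps only pairs from different groups (bipartite network),
            "within" keeps only pairs from the same group.
    """
    if mode not in ("between", "within"):
        raise ValueError("mode must be 'between' or 'within'")
    for a, b in combinations(sorted(groups), 2):
        same_group = groups[a] == groups[b]
        if (mode == "within") == same_group:
            yield (a, b)


# Illustrative group assignment, not taken from the article's data.
groups = {"glucose": "metabolite", "lactate": "metabolite", "hexokinase": "enzyme"}
bipartite_pairs = list(candidate_pairs(groups, "between"))
within_pairs = list(candidate_pairs(groups, "within"))
```

With two input matrices, each matrix would correspond to one group, so the "between" mode yields exactly the cross-matrix pairs.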

Furthermore, CoNet can handle row metadata, which makes it possible, for instance, to infer links between objects at different hierarchical levels (e.g. between order Lactobacillales and genus Ureaplasma) while preventing links between different levels of the same hierarchy (e.g. Lactobacillales and Lactobacillaceae). CoNet can also parse sample metadata such as temperature or oxygen concentration, which are then correlated with the objects in the input matrix while being excluded from normalization. In addition, CoNet recognizes abundance tables generated from biom files (McDonald et al., 2012) and, in its Cytoscape 3.x version, reads biom files in HDF5 format directly, using the BiomIO Java library (Ladau). Phylogenetic lineages in these tables are automatically parsed and displayed as node attributes of the resulting network. CoNet also computes a few node properties, such as a node's total edge number as well as the number of positive and negative edges, the total row sum and the number of samples in which the object was observed (i.e. was different from zero or a missing value).
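
The exclusion of links within the same lineage can be sketched in Python as a prefix test on lineages. This is only an illustration under the assumption that each taxon comes with its full lineage from the root downwards; the function name `same_lineage` and the lineages listed below are illustrative and not parsed from any file.

```python
def same_lineage(lineage_a, lineage_b):
    """Return True if one taxon is an ancestor of the other (hypothetical helper).

    Each lineage is a tuple of taxon names from the root down to the taxon itself,
    so an ancestor's lineage is a prefix of its descendant's lineage.
    """
    shorter, longer = sorted((lineage_a, lineage_b), key=len)
    return longer[:len(shorter)] == shorter


# Illustrative lineages, written by hand for this sketch.
lineages = {
    "Lactobacillales": ("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales"),
    "Lactobacillaceae": ("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
                         "Lactobacillaceae"),
    "Ureaplasma": ("Bacteria", "Tenericutes", "Mollicutes", "Mycoplasmatales",
                   "Mycoplasmataceae", "Ureaplasma"),
}

# An edge between an order and one of its own families is excluded ...
excluded = same_lineage(lineages["Lactobacillales"], lineages["Lactobacillaceae"])
# ... whereas an edge between an order and an unrelated genus is allowed.
allowed = not same_lineage(lineages["Lactobacillales"], lineages["Ureaplasma"])
```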

To ease the selection of suitable preprocessing steps, CoNet can display input matrix properties and recommendations based on them. Importantly, CoNet can also handle missing values, by omitting sample pairs with missing values from the association strength calculation. Finally, CoNet supports a few input and output network formats absent in Cytoscape, including adjacency matrices (import), dot (the format of GraphViz (http://www.graphviz.org/)) and VisML (VisANT's format (Hu et al., 2013)) (both for export).
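
The handling of missing values can be pictured as pairwise deletion: a sample contributes to the score of an object pair only if both objects have a value in it. The Python sketch below shows this for the Pearson correlation. Representing missing values as `None` and the function name `pearson_skip_missing` are assumptions for this example, not CoNet conventions.

```python
import math


def pearson_skip_missing(x, y):
    """Pearson correlation over samples where both values are present (sketch)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return float("nan")  # not enough complete sample pairs
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant vectors
    return cov / math.sqrt(var_x * var_y)


# Samples 3 and 4 are dropped because one of the two values is missing.
x = [1.0, 2.0, None, 4.0, 5.0]
y = [2.0, 4.0, 6.0, None, 10.0]
r = pearson_skip_missing(x, y)
```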

Results

Use case: microbial relationships in the arctic soil

We demonstrate the abilities of the CoNet app on a real-world example taken from the Qiita database (The Qiita Development Team, 2015). The Qiita database, which merges the previously separated QIIME and EMP databases, is a rich resource for processed 16S rDNA sequence data: each study is accompanied by a microbial count file in biom format computed from the raw sequence data with the QIIME pipeline (Caporaso et al., 2010).

In our example, we will demonstrate how to build an association network from microbial count data obtained from arctic soil samples (Chu et al., 2010). This data set was chosen for its sample number (sufficient to compute associations while keeping run times short) as well as for the biological insights that are gained from the network analysis. The example showcases the CoNet app's ability to compute associations between higher taxonomic levels and to take environmental metadata into account, which is important for the interpretation of predicted microbial relationships.

In the Qiita database, the arctic soil study can be found under the title "Soil bacterial diversity in the Arctic is not fundamentally different from that found in other biomes" (study identifier: 104, see Supplementary material). This data set consists of 52 soil samples from the arctic tundra, which were sequenced with Roche FLX using primers targeting the V1V2 region of the 16S rDNA. The processed data can be downloaded from the Qiita study page (in Data Types, click on 16S, then click on the URL appearing below, expand the Files network, click on the file object containing BIOM in its name and then download the file with suffix .biom). The study also provides a mapping file with sample metadata (on the Qiita study page, click Sample Information and then the Sample Info button). We extract the pH of each sample by loading the sample information file into Excel, selecting the sample_name and ph columns and saving them to a separate, tab-delimited file.
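
For readers who prefer a script to a spreadsheet, the column extraction described above could be done along the following lines in Python. This is a sketch that assumes the downloaded sample information file is tab-delimited with a header row; the function name `extract_columns` and the file names in the usage note are placeholders.

```python
import csv


def extract_columns(in_path, out_path, columns=("sample_name", "ph")):
    """Copy the selected columns of a tab-delimited file to a new tab-delimited file."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerow(columns)
        for row in reader:
            # A KeyError here means the expected column is absent from the input file.
            writer.writerow([row[column] for column in columns])
```

A call such as `extract_columns("sample_information.txt", "ph_metadata.txt")` (both file names are hypothetical) would then produce the two-column file that is passed to the CoNet app as sample metadata.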

Combining multiple measures

The CoNet app is composed of the main window and several menus, including a "Data" menu with input and output options, a "Preprocessing and filter" menu, a "Methods" menu to select network construction methods, a "Merge" menu where the user can specify how results from different network construction methods should be merged, a "Randomization" menu for the assessment of edge significance and finally a "Config" menu for configuration.

In the following, we will build a network from the arctic tundra biom file. First, in the "Data" menu, the arctic tundra biom file is selected and the option "Biom file in HDF5" is enabled (direct biom file parsing is only supported in the Cytoscape 3.x version of the CoNet app). In the sub-menu "Metadata and Features", the option "explore links between higher-level taxa" is enabled together with the option "Parent-child exclusion" to compute correlations between higher-level taxa while preventing edges between taxa within the same lineage (e.g. Lactobacillales and Lactobacillaceae). Sample metadata (pH in this case) are passed to the CoNet app via the "Select file" button in the "Features" corner of the "Metadata and Features" sub-menu. Both "Transpose" and "Match samples" need to be enabled to convert sample metadata into rows and to match sample metadata identifiers to biom file identifiers.

In the "Preprocessing and filter" menu, the parameter "row_minocc" is set to 20 to discard taxa with fewer than 20 non-zero values across samples. The sum of the discarded rows can be kept by enabling "Keep sum of filtered rows". In addition, "col_norm" is activated to divide each matrix entry by the sum of its corresponding column, thus avoiding the inference of spurious links due to sequencing depth differences.
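
The effect of these two preprocessing steps can be sketched in a few lines of Python. This is an illustration of the described behavior, not CoNet's implementation: the function `filter_and_normalize`, the row label `summed_filtered_rows` and the toy counts are assumptions, and a threshold of 2 is used instead of 20 to keep the example small.

```python
def filter_and_normalize(matrix, min_occurrence, keep_sum=True):
    """Drop rare rows, optionally keep their sum, then normalize columns (sketch).

    matrix: dict mapping taxon name -> list of counts, one entry per sample
    """
    kept = {taxon: list(counts) for taxon, counts in matrix.items()
            if sum(1 for value in counts if value != 0) >= min_occurrence}
    dropped = [counts for taxon, counts in matrix.items() if taxon not in kept]
    if keep_sum and dropped:
        # Keeping the summed discarded rows preserves the original column totals.
        kept["summed_filtered_rows"] = [sum(column) for column in zip(*dropped)]
    column_sums = [sum(column) for column in zip(*kept.values())]
    return {taxon: [value / total if total else 0.0
                    for value, total in zip(counts, column_sums)]
            for taxon, counts in kept.items()}


# Toy counts for three taxa across three samples, invented for illustration.
counts = {"A": [10, 0, 5], "B": [0, 0, 5], "C": [10, 20, 10]}
normalized = filter_and_normalize(counts, min_occurrence=2)
```

Taxon B is non-zero in a single sample only and is therefore removed, but its counts still contribute to the column sums through the summed row.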

In the "Methods" menu, Pearson, Spearman, Bray Curtis, Kullback Leibler and mutual information are selected. Their thresholds can be automatically set such that 1,000 top-scoring and 1,000 bottom-scoring edges (for anti-correlations) are included for each measure in the initial network, by typing "1000" as the value of the edge selection parameter and enabling "Top and bottom" in the "Threshold setting" sub-menu. At this stage, pushing "GO" will result in a multigraph, where microbial taxa are connected by up to five different measure-specific edges.
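
Automatic threshold setting by edge number can be pictured as ranking all candidate edges by score and keeping the extremes. The Python sketch below shows this for a single correlation-like measure; the function name `top_and_bottom`, the parameter `k` and the toy scores are assumptions, and for dissimilarities the interpretation of the two ends of the ranking would differ.

```python
def top_and_bottom(scores, k):
    """Keep the k lowest- and k highest-scoring edges of one measure (sketch)."""
    if k <= 0:
        return {}
    ranked = sorted(scores.items(), key=lambda item: item[1])
    if 2 * k >= len(ranked):
        return dict(ranked)  # fewer edges than requested: keep them all
    return dict(ranked[:k] + ranked[-k:])


# Toy correlation scores, invented for illustration.
scores = {
    ("A", "B"): 0.9,
    ("A", "C"): -0.8,
    ("B", "C"): 0.1,
    ("A", "D"): 0.5,
    ("B", "D"): -0.2,
}
selected = top_and_bottom(scores, k=1)
```

Applying such a selection once per measure and taking the union of the results gives the multigraph described above, with one potential edge per measure between any two taxa.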

Assessment of edge significance

The significance of edges, that is their p-values, is computed in two CoNet launches, the first of which generates the permutation distributions and an intermediate network and the second the bootstrap distributions and the final network.

For the first launch, the user selects the "edgeScores" routine in the "Randomization" menu, with "shuffle_rows" as resampling parameter, and enables "Renormalize" (for details on renormalization, see Faust et al., 2012). The user then specifies a folder and a file name to export permutation scores and enables "Save randomizations" in the "Save" corner of the "Randomization" menu. Pushing "GO" will then launch the computation of edge- and measure-specific permutation distributions. Permutation alone is sufficient to set p-values on the edges, but we found that a combination of permutation and bootstrap is more stringent (Faust et al., 2012). Thus, the network generated in this first step should be considered as an intermediate result.
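
The principle of a permutation test for a single edge can be sketched as follows in Python. This is a simplified, two-sided test on the Pearson correlation with a pseudo-count; it does not reproduce CoNet's "edgeScores" routine or its renormalization step, and the function names, the seed and the toy abundances are assumptions.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(x, y, iterations=100, seed=0):
    """Two-sided permutation p-value for the correlation between x and y (sketch)."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)  # break the sample correspondence of one row
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # The pseudo-count keeps the p-value strictly above zero.
    return (hits + 1) / (iterations + 1)


# Toy abundances of two taxa across eight samples, invented for illustration.
taxon_a = [1, 2, 3, 4, 5, 6, 7, 8]
taxon_b = [2, 4, 6, 8, 10, 12, 14, 16]
p_value = permutation_p_value(taxon_a, taxon_b)
```

With 100 iterations, the smallest p-value this sketch can return is 1/101, which illustrates why the number of iterations bounds the resolution of the resulting p-values.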

In order to compute bootstrap distributions and the final network, the user prepares a second CoNet launch, by selecting the "bootstrap" resampling method and a p-value merging method, for instance "brown" (Brown, 1975), in the "Randomization" menu. P-value merging will unite measure-specific p-values for the same edge into a single edge-specific p-value. "Renormalize" is disabled and "benjaminihochberg" is selected as the multiple testing correction method. In the "Save" corner of the "Randomization" menu, another file name should be specified to store bootstrap distributions in a separate file. P-values of the final network are computed from both permutation and bootstrap distributions, thus previously generated permutation distributions have to be loaded into the CoNet app. This is done by selecting the permutation file generated in the previous step with the "Load null distributions" button. Pushing "GO" will then result in the final network, shown in Figure 2A.
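
To make the multiple testing correction step concrete, the standard Benjamini-Hochberg adjustment can be written in a few lines of Python. This is a textbook sketch of the procedure, not code taken from CoNet, and the example p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Invented edge p-values for illustration.
edge_p_values = [0.01, 0.04, 0.03, 0.005]
adjusted_p_values = benjamini_hochberg(edge_p_values)
```

Edges whose adjusted p-value exceeds the chosen significance level would then be removed from the final network.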

Figure 2. A: Result network obtained for bacterial counts from the arctic soil 16S rDNA example data set, downloaded from the Qiita database. B: Same as A, but with negative edges discarded. The remaining edges form clusters with different microbial composition. C: Neighbors of the pH node form two clusters: one correlated and the other anti-correlated to pH, which reflects the opposite pH preferences of the cluster members.

The CoNet app does not lay out resulting networks, to leave the choice of the (potentially time-consuming) layout algorithm to the user. Here, the "Organic" layout from yFiles was applied and nodes were colored according to their class using Cytoscape's node coloring functionality.

Once permutation and bootstrap distributions have been computed, network generation can be quickly repeated by loading both distributions via the "Load null distributions" and "Load randomization file" buttons, respectively. Figure 2B shows the same network re-generated from pre-computed distributions, but with "positive edges only" enabled in the "Preprocessing and filter" menu. Figure 2C displays the neighbors of the pH node, which were selected and instantiated as a separate network using Cytoscape's node selection function "First neighbors of selected nodes" for undirected networks.
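
Outside of Cytoscape, the selection of a node's first neighbors, split by edge sign, amounts to a simple scan over the edge list. The Python sketch below illustrates this; the function name `first_neighbors`, the node label "pH" and the edge weights are made up for illustration and are not values from the computed network.

```python
def first_neighbors(edges, node):
    """Split the direct neighbors of `node` by the sign of the connecting edge (sketch).

    edges: iterable of (source, target, weight) tuples of an undirected network
    """
    positive, negative = set(), set()
    for a, b, weight in edges:
        if node in (a, b) and a != b:
            other = b if a == node else a
            (positive if weight > 0 else negative).add(other)
    return positive, negative


# Toy edges with made-up weights; only the signs follow the pattern in the text.
edges = [
    ("pH", "Alphaproteobacteria", 0.6),
    ("Solibacteres", "pH", -0.7),
    ("Solibacteres", "Acidobacteria", 0.5),
]
correlated, anti_correlated = first_neighbors(edges, "pH")
```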

The computation of permutation and bootstrap distributions took ~5 minutes each for 100 iterations on a standard laptop.

Input and settings files for the use case can be found in the Supplementary material.

Discussion

Insights into arctic soil microbiota

After removal of negative edges, the arctic soil network forms two prominent clusters (Figure 2B), which are enriched with representatives of different classes, such that one cluster features mostly members of the Solibacteres and Acidobacteria, whereas the other consists mostly of Alphaproteobacteria and Chloracidobacteria. When examining the neighbors of the pH node (Figure 2C), members of the former cluster are found to be anti-correlated to pH, whereas members of the latter are correlated to it. Thus, network analysis helps to identify pH as a major driving factor for microbial soil communities, as has been found previously (Fierer & Jackson, 2006). The correlations with pH have also been described by the authors of the soil study (Chu et al., 2010). However, network analysis adds more details (correlations are computed on lower taxonomic levels) and discovers additional taxonomic groups impacted by pH, e.g. Chloracidobacteria. Furthermore, network inference suggests candidates for cross-feeding. For instance, the neighboring nodes of Bradyrhizobium, a nitrogen fixer that produces ammonium, may represent taxa that depend on ammonium as main nitrogen source.

Related apps

The CoNet app offers mostly similarity-based network inference. Complementary apps that implement various Bayesian network inference algorithms are Cyni Toolbox (http://www.proteomics.fr/Sysbio/CyniProject), bayelviraApp (http://apps.cytoscape.org/apps/bayelviraapp) and MONET (Lee & Lee, 2005). ARACNE (http://apps.cytoscape.org/apps/aracne) exploits mutual information to build networks (Margolin et al., 2006). ExpressionCorrelation (http://www.baderlab.org/Software/ExpressionCorrelation) and MetaNetter (http://apps.cytoscape.org/apps/metanetter) also offer similarity-based network inference techniques; the former is specialized for gene expression data and the latter for metabolomics data. Results from these different network inference approaches could be combined with Cytoscape tools such as Merge Networks.

Conclusion

In this article, we have demonstrated the CoNet app on a typical 16S data set. Alternative use cases are for instance the inference of function networks (i.e. co-occurrence of orthologous gene groups) from metagenomics or metatranscriptomics data or taxon-metabolite networks from 16S and metabolomics data.

We hope that CoNet's integration into Cytoscape will lower the barrier for its employment by users less familiar with the command line version. Due to its flexibility and comprehensiveness, CoNet can be useful in a variety of applications and we thus hope it will find a broad user base.

Software availability

CoNet app page: http://apps.cytoscape.org/apps/conet

CoNet tool web page: http://systemsbiology.vub.ac.be/conet

Latest source code: http://sourceforge.net/projects/conet/

Archived source code as at the time of publication: Zenodo, Biological network inference in Cytoscape, doi: 10.5281/zenodo.55715 (Faust & Raes, 2016)

License: GNU General Public License version 2.0

How to cite this article
Faust K and Raes J. CoNet app: inference of biological association networks using Cytoscape [version 1; peer review: 2 approved with reservations]. F1000Research 2016, 5:1519 (https://doi.org/10.12688/f1000research.9050.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 12 Sep 2016
Alexander Eiler, Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden 
Approved with Reservations
Be more precise here: "The idea behind such an ensemble approach to network inference is to exploit the fact that different methods make different mistakes."

These are different statistical inferences so based on the underlying algorithms results ...
How to cite this report
Eiler A. Reviewer Report For: CoNet app: inference of biological association networks using Cytoscape [version 1; peer review: 2 approved with reservations]. F1000Research 2016, 5:1519 (https://doi.org/10.5256/f1000research.9740.r15254)
  • Author Response 14 Oct 2016
    Karoline Faust, VUB, Belgium
    "These are different statistical inferences so based on the underlying algorithms results will be different."
    Thanks for pointing this out. We have now added an overview table of the strengths ...
Reviewer Report 13 Jul 2016
Paul Wilmes, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg City, Luxembourg 
Anna Heintz-Buschart, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg City, Luxembourg 
Approved with Reservations
The article describes a Cytoscape plugin “CoNet app” designed for the inference of networks from microbial abundance or incidence matrices. The effort combining a versatile network inference tool with a user-friendly and widely used network visualization and analysis framework, such ...
How to cite this report
Wilmes P and Heintz-Buschart A. Reviewer Report For: CoNet app: inference of biological association networks using Cytoscape [version 1; peer review: 2 approved with reservations]. F1000Research 2016, 5:1519 (https://doi.org/10.5256/f1000research.9740.r14620)
  • Author Response 14 Oct 2016
    Karoline Faust, VUB, Belgium
    "The effort combining a versatile network inference tool with a user-friendly and widely used network visualization and analysis framework, such as Cytoscape is very valuable to the community."
    We ...