Differentially correlated genes in co-expression networks control phenotype transitions

Background: Co-expression networks are a tool widely used for analysis of “Big Data” in biology that can range from transcriptomes to proteomes, metabolomes and more recently even microbiomes. Several methods were proposed to answer biological questions interrogating these networks. Differential co-expression analysis is a recent approach that measures how gene interactions change when a biological system transitions from one state to another. Although the importance of differentially co-expressed genes to identify dysregulated pathways has been noted, their role in gene regulation is not well studied. Herein we investigated differentially co-expressed genes in a relatively simple mono-causal process (B lymphocyte deficiency) and in a complex multi-causal system (cervical cancer). Methods: Co-expression networks of B cell deficiency (Control and BcKO) were reconstructed using Pearson correlation coefficient for two mus musculus datasets: B10.A strain (12 normal, 12 BcKO) and BALB/c strain (10 normal, 10 BcKO). Co-expression networks of cervical cancer (normal and cancer) were reconstructed using local partial correlation method for five datasets (total of 64 normal, 148 cancer). Differentially correlated pairs were identified along with the location of their genes in BcKO and in cancer networks. Minimum Shortest Path and Bi-partite Betweenness Centrality where statistically evaluated for differentially co-expressed genes in corresponding networks. Results: We show that in B cell deficiency the differentially co-expressed genes are highly enriched with immunoglobulin genes (causal genes). In cancer we found that differentially co-expressed genes act as “bottlenecks” rather than causal drivers with most flows that come from the key driver genes to the peripheral genes passing through differentially co-expressed genes. Using in vitro knockdown experiments for two out of 14 differentially co-expressed genes found in cervical cancer (FGFR2 and CACYBP), we showed that they play regulatory roles in cancer cell growth. Conclusion: Identifying differentially co-expressed genes in co-expression networks is an important tool in detecting regulatory genes involved in alterations of phenotype.

Recent technological advances have moved the focus of biologists from how to measure biological parameters to how to analyze and interpret tens of thousands of measurements, frequently called omics data. The first solutions for such a problem were limited to hierarchical clustering [1][2][3] and simple comparisons between classes of data through the identification of differentially expressed genes (DEGs) 4,5 . Nowadays, reconstruction and interrogation of biological networks have become a widely used approach to get insights from different types of omics data 6,7 .
After establishing co-expression networks for different states of one biological system, differential co-expression analysis investigates their structural changes when a system goes through a state transition. This analysis, first proposed more than a decade ago 8,9 , identifies the pairs of genes that have their interaction changed during such transition. Several later publications have suggested different algorithms and statistics to determine differential gene co-expression [10][11][12][13][14][15][16][17][18][19][20][21][22][23][24][25][26][27] . Fewer studies, however, attempted to evaluate the biological significance of these changes 18,21 . Also, to the best of our knowledge, there have been no studies that would investigate how this approach performs depending on the type and complexity of the biological system analyzed.
Commonly, a state transition of a biological system is related to perturbation of a set of genes, which propagates through network interactions and affects other genes. Thus, there is a possibility that differentially co-expressed (DC) genes (directly or indirectly) contribute to the propagation of perturbations. In order to investigate the role of DC genes in a state transition of a biological system, we considered two biological processes 28,29 previously analyzed by our group. The first one (B cell deficiency in mice) is a homogenous, one-causal-factor process, while the second one (cervical cancer) represents a heterogeneous multi-causal system.
In this work, a co-expression network is an undirected graph, where the set of nodes consists of a set of DEGs, and a pair of nodes is connected if there is a significant correlation between them. Differential co-expression analysis is done by identifying the pairs of genes that suffer significant changes in correlation between two states. Throughout this paper such pairs are called differentially correlated pairs (DCPs) and the genes forming these pairs are considered DC genes.

B cell deficiency
We started by analyzing the B cell knockout (BcKO) data 28 , which represents a relatively simple experimental model with only one causal factor (B lymphocytes) and homogenous subject groups since this experiment was performed in highly inbred strains of mice.
In order to select the nodes to reconstruct the co-expression networks (BcKO and Control) we compared gene expression in jejunum between BcKO and control mice and found 509 DEGs (Dataset 1). Next, the edges for each network were determined using significantly correlated pairs of DEGs ( Figure 1). To identify DCPs we used the method introduced in 21 which compares correlations in the BcKO group and in the Control group. Eighty DCPs were found (Dataset 2), of which 56 represent correlation gains (edges which were not present in Control network but showed up in BcKO) and 24 represent losses.  Contains information such as "change direction" (whether each pair gained or lost correlation/edge), "sign of local partial correlation" in BcKO data and control data, "regulation" (whether each gene of each pair is up-or down-regulated in BcKO), "number of Ig genes" in each pair.  (Figure 2A). Moreover, we found strong enrichment for Ig genes among DC genes in correlation gain (24% (15 out of 63) of DC genes are Ig genes vs 2.7% (11 out of 415) of other DEGs are Ig genes), while no enrichment was observed for correlation lost as a result of B cell deficiency ( Figure 2B). Thus, these results support the idea that differentially expressed genes that acquire correlations during transition from one biological state to another have a high chance to play causal roles in such transition.

Cervical cancer
Analysis of gene expression data. In order to study differentially co-expressed genes in a more complex biological model we turned to cancer. It is well known that cancers of the same clinically/ morphological type can be very different on molecular levels. One of the most studied causes for such diversity is the different sets of chromosomal aberrations and mutations harbored by tumors otherwise defined as the same cancer. In previous study 29 , we have found 36 cervical cancer driver genes located in multiple chromosomal aberrations (Dataset 4). Thus we decided to use cervical cancer data from 29 for investigation of the role of DCPs in complex biological processes due to its heterogeneity and previously acquired knowledge of essential causal genes. Contains the chromosomal aberration genes considered causal along with annotation and whether they are considered DC genes or not.
We used the DEGs between tumor and normal tissue as the nodes of the co-expression networks. Since the number of samples (five datasets, 148 tumor samples and 67 normal samples) was larger than in BcKO study (two datasets, 22 paired samples), we used the partial correlation coefficient as a measure of co-expression ( Figure 3). The potential advantage of using partial correlation is that it aims to infer edges that are a result of direct regulatory relations 6 . Partial correlations were calculated through the Local Partial Correlation (LCP) method 30 (Material and Methods).
In this study seven DCPs composed of 14 DC genes were found. Interestingly, all DCPs were differential correlations gained in tumors (Table 1). Only one of the 36 key drivers (CEP70) was identified among the 14 DC genes. Accordingly, no enrichment of key driver genes among DC genes was detected in this analysis.
Even though we observed that DCPs are not necessarily formed by key drivers, it is known from literature that most of the DC genes found play regulatory roles in other types of cancer 31-48 . Thus we hypothesized that DCPs are located downstream of key drivers and can be responsible for changes of regulatory chain events coming from the key drivers and spreading throughout the network. In order to verify this hypothesis, we investigated how close DC genes are to key drivers and whether their "signal flow" 49 in the tumor coexpression network is stronger than that of the other genes. In order to verify this hypothesis we investigated two network measures: Minimum Shortest Path and Bi-partite Betweenness Centrality.
First we compared the shortest paths from key driver genes to DC genes and to all other DEGs in the network. We found that DC genes are located statistically closer than the rest of genes in the network to key drivers ( Figure 4A, Wilcoxon test < 0.014 and Permutation test < 0.021). Then we used Bi-partite Betweenness Centrality 6 as a measure of the signal flow from key drivers to peripheral genes (genes with only one edge) 6 . We evaluated this measure for DC genes and remaining DEGs and observed that DC genes had much higher values than other genes in the network. Figure 4B illustrates a comparison of boxplots of bi-partite betweenness centrality between these two groups concerning DCPs and the rest (non DCPs, non-key drivers, nonperipheral). We can observe that the bi-partite betweenness Figure 3. Co-expression networks for cervical cancer data. The nodes are composed by DEGs and the edges represent significant local partial correlation between nodes. A few causal genes (key drivers) and DCP edges are located in the high connectivity region, but scattered throughout the network. Only one key driver is amongst the genes in DCPs.  centralities of DCPs are concentrated in higher values than the rest. Mann-Whitney test gave us a p-value of 7.868 X 10 -5 , which gives us evidence that the distribution of Bi-Partite Betweenness Centrality in DCP genes is higher. For more details see Figure S2. Thus, altogether these results suggest that DC genes might be "bottlenecks", that is, required to transmit a signal from key drivers to other genes in the network, therefore, supplement the hypothesis of a regulatory role of DC genes ( Figure S1).

Knockdown experiments.
In addition, data from other cancers provide indirect support for the idea of regulatory role of DC genes in cervical cancer 31-48 . However, since a role of these DC genes in carcinogenesis was not as straightforward as for immunoglobulin genes in B cell deficiency, we decided to perform experimental tests. Among the DC genes found for cervical cancer, there were seven up-regulated and seven down-regulated in cancer.
Therefore, for validation experiments we chose one down-regulated (FGFR2) and one up-regulated (CACYBP) gene that have not been previously studied in cervical cancer for regulatory properties, but have a potential connection with cell death or proliferation based on their Gene Ontology annotations. In order to test if FGFR2 and CACYBP play critical regulatory roles in cancer pathogenesis, we evaluated the effect on in vitro knockdown of these genes on cell proliferation in a cervical carcinoma cell line.
First, we tested two cervical cancer cell lines (Hela and ME180) and found that only ME180 had detectable expression levels of both genes. In order to perform these tests, we evaluated siRNAs and observed that they were able to knock down expression of both genes by at least 70% ( Figure 5A). CACYBP is up-regulated in tumor tissue, as compared to normal tissue ( Figure 5B). Consequently, if CACYBP has regulatory potential, as predicted by our analysis, it should function as an oncogene promoting cell proliferation. Therefore, the knockdown of this gene should result in a decrease of cell growth/survival. Since FGFR2 was found down-regulated in cervical carcinomas ( Figure 5B) its potential regulatory role would be as a tumor suppressor. Therefore, the knockdown of this gene is expected to increase cell growth. The subsequent analysis of cell proliferation confirmed our predictions for both genes: knockdown of CACYBP led to a decrease of cell growth, while knockdown of FGFR2 induced higher cell proliferation ( Figure 5C). Thus, these results provide additional support to our in silico prediction that DC genes may play a regulatory role in cell proliferation related to tumor growth. Figure 1 http://dx.doi.org/10.5256/f1000research.9708.d14210

Dataset 6. Cytoscape Edges and Nodes tables from network in
The datasets are sufficient to reproduce Figure 4. Raw data for Figure 5A: qRT PCR siRNA test.  Figure 5C: Three xCellingence experiments.

Discussion
In the current study, the differential co-expression analysis 21 was applied to two relatively well-investigated biological systems 28,29 in order to evaluate the potential importance of genes found using differential correlation analyses. Overall, the obtained results support the idea that DC genes play a regulatory role. While in B cell deficiency DCPs were found highly enriched with immunoglobulin genes (i.e. causal genes for alterations in the gut) we did not observe enrichment for key driver genes in cervical cancers. Rather, DCPs of cervical cancer seem to be located downstream of causal genes. Indeed, those DCPs have been found closer to key regulators than other genes in the network, actually representing "bottlenecks" for communication between driver genes previously published in 29 and the rest of the network ( Figure 4). Furthermore, some differentially co-expressed genes in cervical cancer have been previously implicated in processes such as metastasis, oncogenic autophagy and apoptosis. For example, CACYBP has been shown to promote colorectal cancer metastasis 31 , TRPM3 was observed to play a role in oncogenic autophagy in clear cell renal cell carcinoma 32,33 , and AK2 was reported to activate apoptotic pathway 34 . Several genes are investigated for prognostic value for cancers such as myeloma 35 , lymphoma 36 , breast 37-41 and gastrointestinal 42,43 cancers. At least two genes were previously proposed as targets for anti-cancer agents: DHFR 44 and FGFR2 45 . Moreover, CACYBP and ZSCAN18 were also reported as putative tumor suppressor genes in renal cell carcinoma 30,46,47 .
In addition, we have tested two DC genes and confirmed their regulatory role (FGFR2 as a tumor suppressor and CACYBP as a potential oncogene in cervical cancer) by manipulating their expression in vitro. Altogether, published observations and our experimental validation for these two genes support the idea that DC genes revealed in the current study play a regulatory role and can be candidate targets for cervical cancer treatment.
Interestingly, while in the model of B cell deficiency, the DC genes are highly enriched with causal regulatory genes, there was only one key driver in cervical cancer (CEP70), despite the DC genes in this system still seeming to play a regulatory role overall. Such a difference is potentially related to the fact that the mouse system studied in 28 is highly homogeneous (inbred mice) with only one cause of alterations (i.e. absence of B lymphocytes). Cervical cancer, however, is a heterogeneous system with different chromosomal aberrations and consequently turned-on expression of different driver genes in different patients. Therefore, we can speculate that differential correlations point to regulatory genes that are shared by majority of samples. This hypothesis warrants further investigation, especially considering that DCPs could represent common therapeutic targets for tumors that originated as a result of different genomic or epi-genomic events.
In conclusion, this study provided additional evidence for the previously suggested idea 8-27 that genes presenting alterations in correlation patterns between different phenotypes (i.e. states of biological system) play a critical regulatory role in transitions from one state to another. Furthermore, although our results do not allow for full generalization, they indicate that gain and not loss of correlations connects critical genes involved in transitions to new phenotypes. However, further studies are required to understand how changes in correlation patterns can point to genes with critical capacity to guide a biological system into certain state/ phenotype.

Material and methods
Preparation of microarray data BcKO. All microarray data were analyzed using BRB Array-Tools developed by the Biometric Research Branch of the National Cancer Institute under the direction of R. Simon (http://linus.nci. nih.gov/BRB-ArrayTools.html). Array data were filtered to limit analysis to probes with greater than 50% of samples showing spot intensities of >10 and spot sizes >10 pixels, and a median normalization was applied.
Cervical cancer. Same as in cervical cancer 29 . The data were analyzed using BRB Array-Tools using the original normalization used in three studies 50-52 and median normalization over entire the array for the fourth study 53 . For all studies, we only considered genes found in at least 70% of arrays.

Filtering and meta-analysis of microarray data
In every analysis (DEGs, DCPs and networks), filter of direction (same sign of correspondent parameter -difference of mean, difference of correlation, correlation and partial correlation) was required in a fixed number of datasets (2 out of 2 in BcKO and 3 out of 5 in cervical cancer). Then meta-analysis was done through Fisher combined probability test 54 . Next, the pairs with false discovery rate (fdr) 55 lower than a threshold are chosen. At last, only the pairs that pass PUC 56 are considered correlated and therefore represent edges in the network.

Analysis of microarray data
BcKO. DEGs between groups of samples were identified by random variance paired t-test p-value lower than 5% with adjustment for multiple hypotheses by setting the fdr below 10% in BRB Array-Tools. Co-expression networks (BcKO and Control) were inferred through Pearson correlation with p-value < 20% and fdr adjustment below 2.5%. DCPs were calculated for pairs that were initially correlated (p-value < 20%) in at least one state. Then differences of Pearson correlation were tested following 21 with a p-value below 10% and fdr < 2%. At last only the DCPs that showed up in one of the networks were selected.
Cervical cancer. DEGs were retrieved from a cervical cancer paper 29 . Correlation networks and DCPs followed the same procedure and in BcKO but with different p-values (correlation p-value < 10% with fdr < 10 -8 and difference of correlation p-value < 10% with fdr < 0.25%). Partial correlation was computed using local partial correlation method 30 . The initial significance was p-value lower than 40% and then fdr < 5%.
For more details about the thresholds used, see Table S3 and  Table S4.
Local partial correlation network Two aspects of cervical cancer data motivated us to use local partial correlation for this system. First of all, we have more samples throughout five datasets (see Supplementary Table S1 and Supplementary Table S2) which allows us to have more confidence in our results and second we already know that tumors in general present heterogeneous causal factors. The partial correlation approach gives us the alternative to only consider edges that represent direct regulatory relations.
In this paper we used the new approach developed in 30 called local partial correlation. This approach was elaborated specially for cases when there are more variables than samples, which happens regularly in genetics and is a serious problem in classical statistics. First we calculate the correlation network. Then for each significantly correlated pair the inverse method is applied exclusively to the correlation sub-matrix formed only by the closest neighbors of the pair along with the genes forming the pair, Figure 6. If the number of closest neighbors is still higher than the number of samples n, then we decreasingly rank the correlations of the neighbors to either genes in the pair and select the first n/2 neighbors. For each sub-matrix, we only keep the partial correlation value regarding the pair that formed that sub-matrix and then calculate its p-value also based on the sub-matrix. R script for calculation is available in Supplementary Material.
Partial correlations were estimated only for the significant (Pearson) correlations in co-expression network. Thus the same definition of DCPs (by Pearson correlation) can still represent structural changes as long as it remains present in one of the two networks. Figure 3 illustrates the local partial correlation network for cervical cancer using only tumor data. It has 578 connected nodes and 824 edges. Figure 6. Local partial correlation scheme: we calculate the LPC for pair X 2 , X 5 , (red nodes/edge). The neighborhood of this pair is the set of nodes X 3 , X 6 , X 8 , X 9 (black nodes/edges). X 1 , X 4 , X 7 (blue nodes) are significantly correlated with the black nodes (blue edges), but not with the red nodes. Thus the inverse method is applied exclusively to the correlation sub-matrix formed only by the genes X 2 , X 5 , X 3 , X 6 , X 8 , X 9 . In correlation matrices the gray entries are statistically non-significant empirical correlations.

Minimum shortest path
The shortest path is a method that calculates distances between 2 nodes in a network. It consists of the minimum number of edges connecting 2 nodes. In this case we want to know the minimum number of edges connecting one node, either DCP gene or not, to a group of nodes: the key drivers Figure 7. For each gene we calculate the shortest path to all key drivers and get the minimum value. Then we compare the minimum shortest path to key drivers coming from DCP genes and the remaining genes. Figure 4A shows that the minimum shortest path to key drivers tend to be smaller when originated in DCP genes.

Bi-partite betweenness centrality
Betweenness Centrality measures the node's centrality in a network by counting the number of shortest paths from all vertices to all other vertices that pass through that node. A gene with high betweenness centrality has a great influence on the transfer of signal through the network Figure 8.
However we are interested in the signal passing from key drivers throughout the network. For this reason we decided to apply the measure previously developed by our lab 6 called Bi-partite Betweenness Centrality. It measures the amount of shortest path  going from all genes in one group of vertices to all genes in a different group of vertices. In our case, the groups of genes are the key drivers and the peripheral genes (genes connected to only one edge).

Experimental design
FGFR2 and CACYBP knockdown experiment ME180 cells were transfected with FGFR2-, CACYBP-specific siRNA or control siRNA using Lipofectamine RNAiMAX Transfection Reagent. Cell growth rate during 72h after siRNA transfection was measured using xCelligence system as described below.  Table 3.

Evaluation of siRNA efficacy in knocking down
qRT PCR set up: sample was heated to 95°C, followed by 40 cycles of 95°C for 10 sec and 60°C for 30 sec.

Evaluation of cell growth after knock down of gene targets.
CACYBP is up-regulated in tumor tissue, as compared to normal tissue ( Figure 5B). Consequently, if CACYBP has regulatory potential, as predicted by our analysis, it should function as an oncogene promoting cell proliferation. Therefore, the knockdown of this gene should result in a decrease of cell growth/survival. Since FGFR2 was found down-regulated in cervical carcinomas ( Figure 5B) its potential regulatory role would be as a tumor suppressor. Therefore, the knockdown of this gene is expected to increase cell growth.
Cell growth was evaluated using xCelligence system (The RTCA DP Instrument) using manufacturer's protocol. ME180 was cultured in RPMI media with 10% FBS and 1% Penicillin-Streptomycin added. The cells were seeded at density 4000 cells per well (E-Plate 16) in 200 uL of cell culture media.
24 hours after seeding, the experiment was paused for transfecton. Before transfection, 100 uL of media was taken from each well.  Final evaluation of growth was done according to the value of  Inhibition Index: >0 -there is a decrease in growth; 0 -no difference between treated with targeting and treated with non-targeting siRNA; <0 -there is a growth after treating with targeting siRNA.

Data availability
BcKO: Gene expression files containing array data from 28 are available under the GSE23934 superseries in the Gene Expression Omnibus (GEO) data repository. We worked with two groups of samples: B10.A littermates and BALB/C (Table S1).

Cervical cancer:
We have used the same datasets as in previous study 29 available at GEO: GSE7410 50 , GSE6791 51 , GSE7803 52 , GSE9750 53 , GSE26342 29 (Table S21). F1000Research: Dataset 7. Raw data for Figure 5A, C, 10.5256/ f1000research.9708.d142103 63 Author contributions LDT ran the data analysis and DV ran the experimental analysis. LDT, DV, AY and AM conceived the analysis and wrote the paper. NS helped with discussions and revised the first manuscript.

Competing interests
No competing interests were disclosed. The manuscript "Differentially correlated genes in co-expression networks control phenotype transitions" investigates the differentially co-expressed genes in two biological processes, a homogeneous one-causal-factor process (B cell deficiency) and a heterogeneous multi-causal system (cervical cancer). The authors have adopted the Pearson correlation and partial correlation for the inference of networks.

Major revision:
The networks were inferred from local partial correlation method, which is able to identify a linear relationship between two variables X and Y (genes), and this relationship may or may not be mediate by another gene Z. It is not clear why the authors have adopted the Pearson correlation for B cell deficiency analysis and the partial correlation for cervical cancer analysis. Moreover, it would be interesting to highlight the gain obtained by adopting the partial correlation. For instance, what were the relationships inferred with the partial correlation that would not be inferred using Pearson correlation?
Another important issue is that even with partial correlation, only pairwise of relationships are identified. In the study presented at http://dx.doi.org/10.1109/JSTSP.2008.923841, it presents the Intrinsically Multivariate Predictive (IMP) Genes, which are genes that depend on a subset of predictors. How did the authors deal with these IMP genes?
It is not clear how and why the microarray data was filtered. The authors could better describe how the data was filtered and how the parameters were adopted.
The title "Differentially correlated genes in co-expression networks control phenotype transitions" is too rigid leading to the understanding that all correlated genes control the phenotype transitions. I believe that is not true. Authors could provide a more appropriate title.

F1000Research
Overall, I think this study is technically sound and properly executed.
In Figure 2, Ig is underlined as if marked by a spellchecker.
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
No competing interests were disclosed. Competing Interests: