TReNCo: Topologically associating domain (TAD) aware regulatory network construction

Christopher Bennett; Viren Amin; Daehwan Kim; Murat Can Cobanoglu; Venkat Malladi

doi:10.12688/f1000research.110936.1

Research Article

[version 1; peer review: 2 not approved]

Christopher Bennett¹, Viren Amin², Daehwan Kim¹, Murat Can Cobanoglu¹, Venkat Malladi¹

PUBLISHED 14 Apr 2022

¹ Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, 75390, USA
² Baebies, Durham, NC, USA

Christopher Bennett
Roles: Conceptualization, Formal Analysis, Investigation, Methodology, Software, Writing – Original Draft Preparation, Writing – Review & Editing

Viren Amin
Roles: Conceptualization, Investigation, Methodology, Software, Writing – Original Draft Preparation, Writing – Review & Editing

Daehwan Kim
Roles: Supervision, Writing – Original Draft Preparation, Writing – Review & Editing

Murat Can Cobanoglu
Roles: Conceptualization, Methodology, Supervision, Writing – Original Draft Preparation, Writing – Review & Editing

Venkat Malladi
Roles: Conceptualization, Investigation, Methodology, Project Administration, Software, Writing – Original Draft Preparation, Writing – Review & Editing

This article is included in the Cell & Molecular Biology gateway.

This article is included in the Bioinformatics gateway.

Abstract

Introduction: There has long been a desire to understand, describe, and model gene regulatory networks controlling numerous biologically meaningful processes like differentiation. Despite many notable improvements to models over the years, many models do not accurately capture subtle biological and chemical characteristics of the cell such as high-order chromatin domains of the chromosomes.
Methods: Topologically associating domains (TADs) are one such feature: genomic regions that are enriched for contacts within themselves. Here we present TAD-aware Regulatory Network Construction, or TReNCo, a memory-lean method that uses epigenetic marks of enhancer and promoter activity together with gene expression to create context-specific transcription factor (TF)-gene regulatory networks. TReNCo takes data from common assays (ChIP-seq and RNA-seq) and uses TAD boundaries as a hard cutoff, instead of a distance-based cutoff, to efficiently create these networks.
Results: We used TReNCo to define the enhancer landscape and identify transcription factors (TFs) that drive the cardiac development of the mouse.
Conclusion: Our results show that we are able to build specialized adjacency regulatory network graphs containing biologically relevant connections and time dependent dynamics.

Keywords

GRN; gene regulatory network; epigenomics; TAD

Corresponding author: Venkat Malladi

Competing interests: No competing interests were disclosed.

Grant information: This work was supported in part by the National Institute of General Medical Sciences (NIH) under grant R01-GM135341 and by the Cancer Prevention and Research Institute of Texas (CPRIT) under grant RR170068 to D.K.; support for V.S.M. was provided by the Cancer Prevention and Research Institute of Texas (RP150596). All authors read and approved the final manuscript.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2022 Bennett C et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Bennett C, Amin V, Kim D et al. TReNCo: Topologically associating domain (TAD) aware regulatory network construction [version 1; peer review: 2 not approved]. F1000Research 2022, 11:426 (https://doi.org/10.12688/f1000research.110936.1) First published: 14 Apr 2022, 11:426 (https://doi.org/10.12688/f1000research.110936.1) Latest published: 14 Apr 2022, 11:426 (https://doi.org/10.12688/f1000research.110936.1)

Author summary

The regulation of genes is the basis of all biological processes. Gene regulatory networks (GRN) are powerful tools for understanding complex biological connections between interacting genetic elements. They have proven invaluable in understanding the driving forces in normal development and differentiation of cells and the factors involved in cancer progression. Despite the improvements in network construction, we lack a comprehensive method for constructing networks that are highly personalized to the genetic background of the cells used and for validating the findings. Furthermore, as the complexity of the network graphs increases, we need more computationally efficient ways of representing and exploring the networks. To this end, we developed a method that uses epigenetic marks of enhancer/promoter activity, gene expression, and topologically associating domains (TADs) to rapidly construct accurate context-specific GRN. Our results show that we are able to expand current methods for generating regulatory networks to take advantage of topologically associating chromatin domains. The networks produced in this way contain an expanded set of potentially relevant biological connections that can be explored. We believe this method opens the possibility to understand deeper connections and new possibilities for biological discovery.

Introduction

It is of critical importance to understand, model, and describe gene regulatory networks (GRN) that control diverse cellular functions of interest, such as those that drive differentiation or transitions from one developmental stage to another (Lee et al. 2002; DeRisi et al. 1997; Goode et al. 2016). With the advent of next-generation sequencing technologies, it is now commonplace to reconstruct these networks to connect transcription factors (TFs) to the genes they regulate (Karlebach and Shamir 2008). One classic method is integration of cis-regulatory elements, like enhancers, and gene expression via matrix factorization to form network graphs between genes and TFs (Marbach et al. 2016). Generally, this is done using chromatin immunoprecipitation (ChIP) for H3K27ac to identify enhancers and RNA-seq to identify controlled genes. In many cases connections are determined by perturbing upstream components like TFs and observing the resultant changes in downstream expression levels (Gasperini et al. 2019). This method works exceptionally well for certain classes of TF and for closely linked enhancer-gene interactions. However, it commonly uses arbitrary length cutoffs to prevent enhancers from erroneously influencing genes in distant parts of the genome. This can lead to enhancers having shorter or broader ranges of influence than what occurs biologically. As many recent chromosome conformation capture (e.g. 5C, Hi-C and ChIA-PET) experiments have shown, there can be very broad and dynamic interactions between different parts of a chromosome (Branco and Pombo 2007; McCord et al. 2020). Thus, it is more relevant to dynamically limit an enhancer's range of influence to only the topologically linked portion of the genome to which it is confined, also known as a topologically associating domain (TAD). These regions are highly conserved across cell types and are known to limit the influence of cis-regulatory elements by physically separating them (Beagan and Phillips-Cremins 2020). Thus, it is critical that these TAD-based cutoffs are included in the model to fully represent and capture the true biological processes occurring.

Here we present TAD-aware Regulatory Network Construction, or TReNCo, a powerful, memory-efficient tool for constructing regulatory networks from enhancer, promoter, and gene expression data without the need for perturbations. We designed TReNCo to construct a graph of interaction weights between TFs and the genes that they control, using TAD boundaries to dynamically limit the range of enhancer influence. We use dynamic programming to factor matrices within TADs and combine the resulting subnetworks into a full adjacency matrix for a regulatory graph. With this method, we are able to capture biologically relevant interactions between known TFs and their gene targets. We show that this network contains many subtle interactions that could be a treasure trove of novel or uncharacterized interactions. We believe this method opens the possibility for understanding deeper mechanistic connections and new possibilities for identifying biological targets for drug discovery.

Methods

Data preprocessing

TReNCo begins by generating distinct transcription start sites (TSSs) for protein-coding genes from Gencode annotation files (mouse Gencode version 4 by default) using a parsing tool, MakeGencodeTSS. Promoters are then constructed using bedtools (2.29) slop (Quinlan and Hall 2010) to extend 1000 nucleotides upstream and 200 nucleotides downstream of the TSS in a strand-specific manner. Enhancer boundaries are then generated by using bedops (2.4) merge (Neph et al. 2012) to merge the user-defined H3K27ac ChIP peak bed files and excluding overlaps with promoter regions.
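The strand-specific promoter extension can be illustrated with a minimal Python sketch. It mimics the behaviour of a stranded slop on a 1-bp TSS interval in 0-based, half-open BED coordinates; the function name `slop_stranded` is hypothetical and is not part of TReNCo or bedtools, and clipping at chromosome ends is simplified to clipping at zero.

```python
def slop_stranded(start, end, strand, upstream=1000, downstream=200):
    """Extend a BED interval upstream/downstream relative to its strand.

    Coordinates are 0-based, half-open. On the minus strand, "upstream"
    lies at higher genomic coordinates, so the extensions are swapped.
    Illustrative helper only; clipping at the chromosome end is omitted.
    """
    if strand == "+":
        new_start, new_end = start - upstream, end + downstream
    elif strand == "-":
        new_start, new_end = start - downstream, end + upstream
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return max(new_start, 0), new_end


# A TSS at position 5000 on each strand (toy coordinates).
plus_promoter = slop_stranded(5000, 5001, "+")
minus_promoter = slop_stranded(5000, 5001, "-")
```

The two promoters cover different genomic windows even though the TSS coordinate is the same, which is why the extension has to be strand-aware.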

A transcript expression matrix with normalized log₂ transcripts/fragments per kilobase million (TPM) is generated from the provided RNA-seq expression tables with each row corresponding to a gene and the columns corresponding to a sample. The same is done for enhancers with bedtools coverage (Quinlan and Hall 2010) being used to calculate the coverage for each enhancer for each sample using the enhancer regions defined previously.
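As a rough illustration of the normalization step, the sketch below converts raw counts and feature lengths for one sample into log₂-scaled TPM values. The pseudocount of 1 is an assumption made here to avoid taking the log of zero; the text does not state how zeros are handled, and the function name `log2_tpm` is hypothetical.

```python
import math


def log2_tpm(counts, lengths_bp, pseudocount=1.0):
    """Return log2(TPM + pseudocount) for one sample.

    counts: read counts per feature (gene or enhancer).
    lengths_bp: feature lengths in base pairs.
    The pseudocount is an assumption of this sketch.
    """
    # Length-normalize first (reads per kilobase) ...
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    # ... then scale so that the values sum to one million.
    tpm = [r / total * 1e6 if total > 0 else 0.0 for r in rates]
    return [math.log2(value + pseudocount) for value in tpm]


# Toy sample: two features with equal length-normalized coverage.
values = log2_tpm(counts=[10, 20], lengths_bp=[1000, 2000])
```

Applying the same function to enhancer coverage and to gene counts puts both matrices on a comparable scale.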

The log-odds ratio for transcription factor (TF) binding to promoters and enhancers is calculated with FIMO (Grant et al. 2011) from the MEME suite, using the CIS-BP motif database on promoter and enhancer sequences extracted from the genome (mm10 (GRCm38) by default) with bedtools getfasta (Quinlan and Hall 2010) and the bed files generated previously. TF matrices are constructed by reformatting the native output of FIMO and converting TF names to gene symbols.

ENCODE data

Gene expression and H3K27ac ChIP data were selected from ENCODE. A script was used to select mouse heart data from embryonic days 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, and 16.5, postnatal day 0, and 8 weeks of age for which a matched set of gene expression and ChIP-seq data was available. Every time point had at least technical duplicates; the embryonic day 14.5, postnatal day 0, and 8 week time points had 4 gene expression replicates, and embryonic day 14.5 had 4 ChIP-seq replicates. Two of the embryonic day 14.5 gene expression datasets were dropped due to poor correlation (average R² less than 0.7) with the rest of the data set (see S1 Table for accession numbers).

Statistical analysis and data visualization

Plots and graphs were built using seaborn in Python and ggplot2 in R. All heatmaps were built in Python and analyzed with scikit-learn. Gene networks from previous studies were downloaded from the corresponding journals and converted to lists. Gene networks for Gata4, Srf, Mef2a, and Nkx2-5 were subset from our networks, and the targets of these TFs were converted to lists. Venn diagrams of gene list overlaps were built in Python using venn2. Analysis of GO terms was performed in R using enrichGO in clusterProfiler with the org.Mm.eg.db database. Plots for GO terms were built using the built-in dotplot function in clusterProfiler.

Results and discussion

Model design and function

We designed TReNCo around a previously reported core matrix factorization method with a distance-based scoring system, broken down into subunits based on TAD boundaries (Marbach et al. 2016; Cuellar-Partida et al. 2012). In brief, our algorithm uses normalized gene expression count tables from RNA-seq (tsv files), H3K27ac ChIP-seq read alignments (bam files) and peaks (bed files), and TAD boundaries (a bed file) (Figure 1). Though the source of these data can vary, we designed TReNCo with ENCODE uniform processing pipelines in mind. We first generated initial expression matrices for gene expression (G) and enhancer expression (E) by sample. This was accomplished using the count tables from RNA-seq and building count tables for all enhancers merged into non-overlapping segments from the ChIP-seq data. These counts were used to calculate TPM values, which were then log-scaled. A key file, provided by the user, is used to link related files to build a full expression matrix and, secondarily, serves to reduce memory usage by allowing batch processing of data.

Figure 1. TReNCo model.

A) Diagram with required inputs, gene expression, enhancers, and TAD boundaries input into a model using the basic equations shown leading to a gene interaction graph result. ‘+’ and ‘x’ indicate standard matrix additions and multiplications, respectively. The other operations such as ⊙, log, square root, and arctan are all element-wise. B) Pseudo-code for constructing a Gene Interaction Graph.

We next worked to establish TF-gene linkages by identifying TF binding sites in promoters and enhancers using FIMO, a part of the MEME software suite, which reports the log-odds score of TF binding, a major weight needed for establishing interactions (Grant et al. 2011). We designed a simple pipeline to generate promoter and enhancer master bed files and remove any potential overlaps between promoters and enhancers to ensure that TFs are not double counted for a gene. Furthermore, these files contained the union of all promoters and enhancers across the samples in order to streamline the identification of TFs. This was a critical time-saving step, as FIMO cannot be multithreaded and can take upward of 24 hours to run. By using master files, we could run FIMO only once per process, leading to a large performance boost.
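The construction of a master enhancer file can be sketched in plain Python as follows. This is an illustration rather than the TReNCo implementation: the helper names are hypothetical, and the sketch assumes that a merged enhancer region is dropped entirely when it overlaps a promoter, whereas the actual pipeline may instead trim the overlapping portion.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjoining (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def master_enhancers(peaks_by_sample, promoters):
    """Union of all sample peaks, merged, minus promoter-overlapping regions.

    Dropping whole regions is an assumption of this sketch.
    """
    pooled = [iv for peaks in peaks_by_sample.values() for iv in peaks]
    return [
        region
        for region in merge_intervals(pooled)
        if not any(overlaps(region, promoter) for promoter in promoters)
    ]


# Toy peaks for two samples and one promoter (made-up coordinates).
peaks = {
    "e10.5": [("chr1", 100, 200), ("chr1", 500, 600)],
    "P0": [("chr1", 150, 300), ("chr1", 900, 1000)],
}
enhancers = master_enhancers(peaks, promoters=[("chr1", 550, 700)])
```

Because the master set is the union over all samples, motif scanning only needs to be run once on it, and sample-specific enhancers can be selected afterwards.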

With these core datasets, we were able to select sample-specific genes, enhancers, and TF interactions for each sample. To ensure proper TAD boundaries were followed and to improve speed through multithreading, we designed a dynamic programming algorithm to process these datasets by TAD and generate TAD-specific distance weight matrices (D^t) for each set. These matrix subsets were factored with the square root of a TAD-specific interaction matrix, produced via vector multiplication between the gene (g^k,t) and enhancer (e^k,t) expression profiles, resulting in a TAD-weight matrix (W^k,t). To generate enhancer-specific graph edges, the weight matrix was factored with the TAD-specific enhancer-TF by gene matrix (M^t) normalized to the maximum value of the matrix. This was done to set a standard scale of log-odds that was comparable between enhancers and promoters. We designed this component with the assumption that TF binding at enhancers should be similar to TF binding at promoters and should be weighted on a log-odds scale in the network. A promoter-TF by gene specific subnetwork (P_t) was produced in a similar manner to the enhancer-specific network, with weighting done using a TAD-specific gene expression vector since all distances between promoters and genes are 1. Arctangents were applied to both matrices because of the properties of the transformation: larger values approach an asymptote of π/2, while smaller values are scaled approximately linearly. This scaling draws large outlier values into a tighter range without heavily influencing lower values and assumes a maximum impact that a TF can have on a gene. The resulting matrices were added together and further weighted by normalized TF gene expression to lower the influence of lowly expressed TFs while minimally changing the effects of highly expressed TFs. The resulting TAD subgraphs were concatenated into a full network adjacency graph matrix. Since this was a sparse matrix, TReNCo represents it as an adjacency array, allowing us to store the information in much less space than is needed for a full matrix.
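To make the per-TAD computation more concrete, here is a hedged NumPy sketch of one way the steps described above could fit together. The exact equations are given in Figure 1 and are not reproduced in the text, so the matrix shapes, the order of operations, the function names, and the edge-list form of the adjacency array are all assumptions of this sketch rather than the TReNCo implementation.

```python
import numpy as np


def scale_to_max(M):
    """Normalize a log-odds matrix to its maximum value (zeros if none)."""
    peak = M.max() if M.size else 0.0
    return M / peak if peak > 0 else np.zeros_like(M, dtype=float)


def tad_subnetwork(D, e, g, M_enh, M_pro, tf_expr):
    """Assumed TF x gene edge weights for a single TAD and sample.

    D: (n_enh, n_gene) distance weights; e: (n_enh,) enhancer activity;
    g: (n_gene,) gene expression; M_enh: (n_tf, n_enh) TF-enhancer
    log-odds; M_pro: (n_tf, n_gene) TF-promoter log-odds;
    tf_expr: (n_tf,) normalized TF expression.
    """
    # TAD weight matrix: distance weights times sqrt of the outer product.
    W = D * np.sqrt(np.outer(e, g))
    # Enhancer-mediated edges, squashed towards an asymptote of pi/2.
    enh_edges = np.arctan(scale_to_max(M_enh) @ W)
    # Promoter edges: distance is 1, so weight by gene expression only.
    pro_edges = np.arctan(scale_to_max(M_pro) * g)
    # Down-weight lowly expressed TFs (one scaling factor per TF row).
    return tf_expr[:, None] * (enh_edges + pro_edges)


def to_edge_list(A, tf_names, gene_names):
    """Store only non-zero entries as (TF, gene, weight) triples."""
    rows, cols = np.nonzero(A)
    return [(tf_names[i], gene_names[j], float(A[i, j])) for i, j in zip(rows, cols)]


# Toy TAD with 2 enhancers, 2 genes and 2 TFs (made-up numbers).
A = tad_subnetwork(
    D=np.ones((2, 2)),
    e=np.array([4.0, 0.0]),
    g=np.array([1.0, 4.0]),
    M_enh=np.array([[2.0, 0.0], [0.0, 0.0]]),
    M_pro=np.zeros((2, 2)),
    tf_expr=np.array([1.0, 0.5]),
)
edges = to_edge_list(A, ["Gata4", "Srf"], ["geneA", "geneB"])
```

Because each TAD is processed independently, sub-networks such as `A` can be computed in parallel and their edge lists concatenated, which avoids ever materializing the full TF-by-gene matrix.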

Model validation

To validate the model, we used the extensive cohort of matched gene expression and H3K27ac ChIP-seq analyses in ENCODE (Davis et al. 2018) (S1 Table) and TAD boundaries generated from Hi-C data from mouse ES cells (Gorkin et al. 2020; Dixon et al. 2012). We decided to use mouse heart data due to the abundance of well-correlated time point data spanning embryonic day 10.5 to 8 weeks after birth, the highly characterized heart developmental processes, and the availability of previously documented TF-gene networks (Akerberg et al. 2019; Schlesinger et al. 2011) (Figure 2). While the ChIP-seq data are not highly correlated across all the sample types, the gene expression data have an R-squared of at least 0.7 between different biological samples. One e14.5 experiment set had an average R-squared of approximately 0.6 with all other biological samples. To remove this potentially problematic dataset before the larger, more computationally expensive processes occur, we added an optional soft filter in TReNCo that automatically removes any sample whose gene expression data has an average R-squared of less than 0.7 across all samples. We were left with a set of highly correlated data, which led us to conclude that this dataset was sufficient for use with TReNCo.
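A soft filter of this kind could look roughly like the following NumPy sketch. It assumes that R-squared is the squared Pearson correlation between the expression columns of two samples and that a sample's self-correlation is excluded from its average; the function name `filter_samples_by_mean_r2` is hypothetical.

```python
import numpy as np


def filter_samples_by_mean_r2(expr, sample_ids, threshold=0.7):
    """Keep samples whose mean R^2 against all other samples is >= threshold.

    expr: (n_genes, n_samples) matrix of log-scaled expression values.
    Returns the kept sample ids and the per-sample mean R^2.
    """
    r2 = np.corrcoef(expr, rowvar=False) ** 2
    n_samples = r2.shape[0]
    # Subtract the diagonal (self-correlation of 1) before averaging.
    mean_r2 = (r2.sum(axis=1) - 1.0) / (n_samples - 1)
    kept = [sid for sid, value in zip(sample_ids, mean_r2) if value >= threshold]
    return kept, mean_r2


# Toy data: four well-correlated samples and one unrelated sample.
x = np.arange(10.0)
noise = np.array([1.0, -1.0] * 5)
expr = np.column_stack([x, 2 * x + 1, 3 * x, x + 5, noise])
kept, mean_r2 = filter_samples_by_mean_r2(expr, ["s1", "s2", "s3", "s4", "bad"])
```

Note that a single poor sample also lowers the average of every other sample, so with very few samples a strict threshold can remove more than intended.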

Figure 2. ENCODE Datasets.

A) Timeline of basic mouse cardiovascular development with life stage on top and developmental stages on the bottom; B) Number of samples for each timepoint and data type; C) Correlation heatmap between samples and replicates at each time point.

Previous studies of the mouse heart have identified Gata4, Mef2a, Nkx2-5, Tbx5, and Srf as important embryonic lethal TFs critical for development (Gittenberger-De Groot et al. 2005). When looking at the distribution of these TFs over time, we observed many subtle dynamics in how the TFs' weights shift. Gata4, Mef2a, Tbx5, and Nkx2-5 show a multimodal distribution with three major peaks and varying differences between time points, though the distributions mostly overlapped (Figure 3A, S2 Figure, S3 Figure, S4 Figure). We found that the weight distributions followed a similar trend: the dominant population of edge weights lies below 0.1, a second, middle population lies between 0.1 and 0.3, and a final population lies above 0.3 and stretches up to 1. Another TF, Foxs1, demonstrated a more pronounced time point dependent change in addition to a tri-modal edge score (Figure 3B). Interestingly, Srf did not show this trend and tended to have lower weight edges throughout the distributions. To visualize the time point dynamics more clearly, we generated a heatmap of the distributions with inflection points added to determine changes in gene weights that may occur (Figure 3C and D). Inflection points, in this case, are simple differences in weights between each time point and the previous time point. These data are ideal for highlighting changes between each time point. An additional differential heatmap of all weight differences with respect to the embryonic day 10.5 point was generated to visualize change from a central time (S1 Figure). It was clear that these TFs have time dependent dynamics in our model. Gata4, Nkx2-5, and Tbx5 appear to interact with most of their targets constantly throughout early development, as indicated by a mostly yellow (no change) inflection point heatmap until the adult heart. These TFs have been shown to be important in normal cardiac development (Misra et al. 2014) and to act as potential cardiac reprogramming factors from embryonic fibroblasts (Hashimoto et al. 2019). At this time, we observed a net decrease in the Gata4 network weights, as indicated by an increase in negative inflection points and a decrease in positive values. Mef2a showed a similar trend to Gata4, with a minor increase in network weights leading up to birth; Mef2a has previously been shown to be important in postnatal heart development and regulation (Desjardins and Naya 2016). Srf shows a different trend, with most of the weights being relatively low until P0, where there is a minor but noticeable uptick in the network weights. This observation matches the biological importance of Srf in early cardiac development and its critical role in maintaining adult heart function (Mokalled et al. 2015). Foxs1 demonstrates the most profound change over time, with the initial weights being very low and increasing until embryonic day 16.5. After this time the weights begin to decrease into adulthood but never disappear completely. This may be due to the role of Foxs1 as a key factor in vascular development (De Val 2011), which is important in earlier development.
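The two kinds of difference described above can be written down in a few lines of NumPy. This sketch follows the definition given in the text (simple differences between consecutive time points, plus differences relative to the first time point); the function names and the toy weights are illustrative only.

```python
import numpy as np


def stepwise_differences(weights):
    """Difference between each time point and the previous one.

    weights: (n_genes, n_timepoints) TF-gene edge weights, columns
    ordered in time. Returns an (n_genes, n_timepoints - 1) array.
    """
    return np.diff(weights, axis=1)


def baseline_differences(weights, baseline_index=0):
    """Difference between every time point and a reference time point."""
    return weights - weights[:, [baseline_index]]


# Toy edge weights for two target genes across three time points.
weights = np.array([[0.1, 0.1, 0.4],
                    [0.5, 0.3, 0.3]])
steps = stepwise_differences(weights)
from_first = baseline_differences(weights)
```

In `steps`, positive entries mark targets whose weight increased since the previous time point, negative entries mark decreases, and zeros mark no change.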

Figure 3. TF-gene edge weights.

A) Histograms of TF-gene interaction weights for 5 different genes, separated by developmental time points; B) Histogram as in A, for the Foxs1 TF separated into individual developmental time point plots; C) Heatmap of TF-gene interaction weights sorted by time points; D) Heatmap as in C, showing gene inflection points calculated as the log2 ratio of gene weights. Green indicates an increase in gene weight from the last time point, while red indicates a decrease.

Model comparison

There have been a number of studies on mouse cardiac TF regulatory networks, with one study looking at the regulatory networks of Gata4, Mef2a, Nkx2-5, and Srf and providing the interactions it identified (Schlesinger et al. 2011). We extracted the interactions of the aforementioned TFs from our network and compared them with the previously identified interactors (Figure 4A and S8). We found that our networks contain over 10,000 putative interactions (edge weight greater than 0) that were not reported previously. Interestingly, regardless of the time point, our networks captured only about 63% of the Gata4 targets, 61.5% of the Mef2a targets, and 57% of the Nkx2-5 and Srf targets in the previous network, leaving a large portion of the previous networks unique to that analysis (Figure 4B). We speculate there are two likely explanations for the absence of a 100% overlap: 1) the previous network established interactions using the canonical distance-based cutoff, leading to some genes being added or removed erroneously where cutoffs differed from our TAD boundaries; or 2) while our TAD boundaries are more accurate than distance-based cutoffs, the TADs we use are not fully representative of cardiac-specific TADs, leading to the loss of some connections in our network. Regardless of the reason, we wanted to understand whether the main overlap between the networks was due to the previous study finding the strongest interactors of the TFs. To test this, we performed the Kolmogorov–Smirnov test (KS-test) on the cumulative distributions of the overlap edge weights and the full edge weights (Figure 4C and S2 Table). We found that at all time points the overlapping genes have higher mean edge weights than the total (Figure 4D). This implies that we are identifying true, strongly interacting targets along with a broad set of possibly true but weakly interacting targets.
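The comparison above rests on two quantities: the fraction of previously reported targets recovered by the network, and the KS statistic between the edge weights of overlapping targets and all edge weights. A self-contained sketch of both is given below; the function names and toy values are illustrative, and in practice a library routine such as `scipy.stats.ks_2samp` would also provide a p-value.

```python
from bisect import bisect_right


def fraction_recovered(reference_targets, network_targets):
    """Share of reference targets that also appear in the network."""
    reference = set(reference_targets)
    return len(reference & set(network_targets)) / len(reference)


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Made-up example: overlapping targets sit at the high end of the weights.
all_weights = [0.05, 0.1, 0.2, 0.4, 0.5, 0.6]
overlap_weights = [0.4, 0.5, 0.6]
d_stat = ks_statistic(overlap_weights, all_weights)
recovered = fraction_recovered({"A", "B", "C", "D"}, {"A", "B", "X"})
```

A large KS statistic together with a higher mean weight in the overlap set would be consistent with the previously reported targets being the stronger interactors.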

Figure 4. Model comparison.

A) TF-gene interaction Venn diagram overlap between the TReNCo model and the previous study; B) Line plot showing the percentage of TF-gene interactions from the previous study captured by the TReNCo model; C) CDF plot with the weights of overlapping interactions vs background in the TReNCo model; D) Bar plot of mean TF-gene weights for specific/overlapping interactions (red), all other background interactions (green) and all interactions (blue).

To further support the biological relevance of our networks, we selected the full TF network, the overlapping network, and the connections unique to our network, and performed GO-term enrichment analysis (Figure 5, S9 Figure). We see that in the case of Mef2a, there is a similar core of regulatory processes that is maintained between 10.5e and 8w (Figure 5). Of interest, we found that in younger 10.5e hearts there was significant enrichment for development-related genes, as opposed to older 8w hearts, which had T cell activation terms enriched. This makes sense when considering recent studies showing Mef2a involvement in inflammation-related processes and the interaction of T cell activation and inflammation (Skapenko et al. 2005; Xiong et al. 2019). Furthermore, we found that overlapping targets between our data and previous data contained terms enriched for cardiac development, while the full and unique networks showed enrichment for cardiac-related and general biological terms. Thus, it is reasonable to conclude that our network contains true biologically relevant interactions in cardiac tissue throughout development.

Figure 5. GO-term enrichment.

Enrichment for gene terms between embryonic day 10.5 and 8 week old adult heart. Highlighted terms show first difference in term list.

Model discussion

The TReNCo model is powered by explicitly modeling the enhancer landscape for each cell type, detecting enhancer activity changes and TF-enhancer relationships across all cell types, limited by TAD boundaries. Our results show that this method can identify cell type–specific TFs that are biologically relevant while also providing potential candidates for further biological study and validation.

In addition, the model enables further analysis of key TFs using a limited amount of data. This allows the model to be easily applied to any organism, tissue, or cell type with at least H3K27ac ChIP-seq data, RNA-seq data, and TAD boundaries. We recognize that H3K27ac is not the only available enhancer mark and feel that other marks (e.g., H3K4me1, ATAC-seq) (Heintzman et al. 2007; Davie et al. 2015) could be substituted into the model with little change necessary. The integration of additional data types would extend the model and increase its versatility, providing increased confidence in the gene regulatory networks. Genomic data indicating open regions of chromatin (e.g., ATAC-seq, DNase-seq) (Crawford et al. 2006; Davie et al. 2015) could be added to the enhancer and promoter signal (Figure 1) to help increase the dynamic range of the weight score and provide insight into interactions that are poised in earlier time points or cell types versus true interactions. Additionally, integrating other histone modifications (Heintzman et al. 2007; Rajgopal et al. 2014) may provide a filter for enhancer identification, which would reduce false positives. We believe that adding or substituting any of the data described above would broaden the use of the model and improve it, and that the flexibility of the model makes this straightforward.

Conclusions

Our results show that we were able to expand current methods for generating regulatory networks to take advantage of TADs to limit the predicted influence of enhancers. In this way, we were able to produce results highly similar to those reported previously, with the added benefit that the networks contain an expanded set of potentially relevant biological connections that can be explored. Additionally, we have developed a framework that can be applied to a diverse array of species and cell types, requiring only two experimental assays, H3K27ac ChIP-seq and RNA-seq. We believe this method opens the possibility for understanding deeper connections and new possibilities for biological discovery.

Authors’ contributions

C.B., V.A., D.K., M.C., and V.S.M. performed validation analysis and discussed the results of TReNCo. C.B., V.A., M.C., and V.S.M. designed and implemented TReNCo. C.B., V.A., D.K., M.C., and V.S.M. wrote the manuscript.

Data availability

Underlying data

Zenodo: TReNCo: Topologically associating domain (TAD) aware regulatory network construction (extended data). https://doi.org/10.5281/zenodo.6392155

• The ENCODE data are available from the ENCODE data portal:
  ◦ S1 Table. ENCODE Data for Validation.

Extended data

Zenodo: TReNCo: Topologically associating domain (TAD) aware regulatory network construction (extended data). https://doi.org/10.5281/zenodo.6392155 (Bennett et al. 2022)

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Software availability

Project name: TReNCo

Project home page: https://git.biohpc.swmed.edu/BICF/Software/trenco

Archived source code at time of publication: https://doi.org/10.5281/zenodo.6394452 (Bennett et al. 2021)

Operating system(s): Linux, Mac OS X and Windows

Programming language: Python

License: MIT

References

Akerberg BN, Gu F, VanDusen NJ, et al.A reference map of murine cardiac transcription factor chromatin occupancy identifies dynamic and conserved enhancers. Nat. Commun. 2019; 10: 4907. (Accessed July 16, 2020). PubMed Abstract | Publisher Full Text Reference Source
Beagan JA, Phillips-Cremins JE. On the existence and functionality of topologically associating domains. Nat. Genet. 2020; 52: 8–16. (Accessed March 24, 2021). PubMed Abstract | Publisher Full Text
Bennett C, Amin V, Kim D, et al.: TReNCo (1.0.0). Zenodo. 2021. Publisher Full Text
Bennett C, Amin V, Kim D, et al.: TReNCo: Topologically associating domain (TAD) aware regulatory network construction (extended data) [Data set]. Zenodo. 2022. Publisher Full Text
Branco MR, Pombo A. Chromosome organization: new facts, new models. Trends Cell Biol. 2007; 17: 127–134. (Accessed March 24, 2021). PubMed Abstract | Publisher Full Text Reference Source
Crawford GE, Holt IE, Whittle J, et al.: Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res. 2006 Jan; 16(1): 123–131. Publisher Full Text | PubMed Abstract | Free Full Text
Cuellar-Partida G, Buske FA, McLeay RC, et al.: Epigenetic priors for identifying active transcription factor binding sites. Bioinformatics. 2012; 28: 56–62. (Accessed September 5, 2019). PubMed Abstract | Publisher Full Text
Davie K, Jacobs J, Atkins M, et al.: Discovery of transcription factors and regulatory regions driving in vivo tumor development by ATAC-seq and FAIRE-seq open chromatin profiling. PLoS Genet. 2015 Feb 13; 11(2): e1004994. PubMed Abstract | Publisher Full Text | Free Full Text
Davis CA, Hitz BC, Sloan CA, et al.The Encyclopedia of DNA elements (ENCODE): Data portal update. Nucleic Acids Res. 2018; 46: D794–D801. (Accessed March 12, 2021). PubMed Abstract | Publisher Full Text Reference Source
De Val S: Key transcriptional regulators of early vascular development. Arterioscler. Thromb. Vasc. Biol. 2011; 31: 1469–1475. (Accessed March 29, 2021). PubMed Abstract
DeRisi JL, Iyer VR, Brown PO: Exploring the metabolic and genetic control of gene expression on a genomic scale. Science (80-). 1997; 278: 680–686. (Accessed March 24, 2021). Reference Source
Desjardins C, Naya F: The Function of the MEF2 Family of Transcription Factors in Cardiac Development, Cardiogenomics, and Direct Reprogramming. J. Cardiovasc. Dev. Dis. 2016; 3: 26. (Accessed March 29, 2021). PubMed Abstract
Dixon JR, Selvaraj S, Yue F, et al.: Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature. 2012 Apr 11; 485(7398): 376–380. PubMed Abstract | Publisher Full Text | Free Full Text
Gasperini M, Hill AJ, McFaline-Figueroa JL, et al.A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens. Cell. 2019; 176: 377–390.e19. (Accessed November 11, 2019). PubMed Abstract | Publisher Full Text Reference Source
Gittenberger-De Groot AC, Bartelings MM, Deruiter MC, et al.: Basics of cardiac development for the understanding of congenital heart malformations. Pediatr. Res. 2005; 57: 169–176. (Accessed March 24, 2021). PubMed Abstract | Publisher Full Text Reference Source
Goode DK, Obier N, Vijayabaskar MS, et al.: Dynamic Gene Regulatory Networks Drive Hematopoietic Specification and Differentiation. Dev. Cell. 2016; 36: 572–587. PubMed Abstract | Publisher Full Text
Gorkin DU, Barozzi I, Zhao Y, et al.: An atlas of dynamic chromatin landscapes in mouse fetal development. Nature. 2020 Jul; 583(7818): 744–751. Erratum in: Nature. 2020 Oct; 586(7831): E31. Erratum in: Nature. 2021 Jan; 589(7842): E4. PubMed Abstract | Publisher Full Text | Free Full Text
Grant CE, Bailey TL, Noble WS: FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011; 27: 1017–1018. (Accessed June 18, 2019). PubMed Abstract
Hashimoto H, Wang Z, Garry GA, et al.: Cardiac Reprogramming Factors Synergistically Activate Genome-wide Cardiogenic Stage-Specific Enhancers. Cell Stem Cell. 2019; 25: 69–86.e5. (Accessed March 29, 2021). PubMed Abstract
Heintzman ND, Stuart RK, Hon G, et al.: Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 2007 Mar; 39(3): 311–8. Epub 2007 Feb 4. PubMed Abstract | Publisher Full Text
Karlebach G, Shamir R: Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 2008; 9: 770–780. (Accessed March 24, 2021). Publisher Full Text Reference Source
Lee TI, Rinaldi NJ, Robert F, et al.: Transcriptional regulatory networks in Saccharomyces cerevisiae. Science. 2002; 298: 799–804. (Accessed March 24, 2021). PubMed Abstract | Publisher Full Text Reference Source
Marbach D, Lamparter D, Quon G, et al.: Tissue-specific regulatory circuits reveal variable modular perturbations across complex diseases. Nat. Methods. 2016; 13: 366–370. (Accessed September 5, 2019). Reference Source
McCord RP, Kaplan N, Giorgetti L: Chromosome Conformation Capture and Beyond: Toward an Integrative View of Chromosome Structure and Function. Mol. Cell. 2020; 77: 688–708. PubMed Abstract | Publisher Full Text
Misra C, Chang SW, Basu M, et al.: Disruption of myocardial Gata4 and Tbx5 results in defects in cardiomyocyte proliferation and atrioventricular septation. Hum. Mol. Genet. 2014; 23: 5025–5035. (Accessed March 29, 2021). PubMed Abstract
Mokalled MH, Carroll KJ, Cenik BK, et al.: Myocardin-related transcription factors are required for cardiac development and function. Dev. Biol. 2015; 406: 109–116. (Accessed March 29, 2021). PubMed Abstract
Neph S, Kuehn MS, Reynolds AP, et al.: BEDOPS: high-performance genomic feature operations. Bioinformatics. 2012; 28(14): 1919–1920. PubMed Abstract | Publisher Full Text | Free Full Text
Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26(6): 841–842. PubMed Abstract | Publisher Full Text | Free Full Text
Rajagopal N, Ernst J, Ray P, et al.: Distinct and predictive histone lysine acetylation patterns at promoters, enhancers, and gene bodies. G3 (Bethesda). 2014 Aug 12; 4(11): 2051–2063. PubMed Abstract | Publisher Full Text | Free Full Text
Schlesinger J, Schueler M, Grunert M, et al.: The Cardiac Transcription Network Modulated by Gata4, Mef2a, Nkx2.5, Srf, Histone Modifications, and MicroRNAs ed. D. Schübeler. PLoS Genet. 2011; 7: e1001313. (Accessed December 12, 2019). PubMed Abstract | Publisher Full Text
Skapenko A, Leipe J, Lipsky PE, et al.: The role of the T cell in autoimmune inflammation. Arthritis Res. Ther. 2005; 7 Suppl 2: S4–14. (Accessed April 21, 2021). PubMed Abstract | Publisher Full Text
Xiong Y, Wang L, Jiang W, et al.: MEF2A alters the proliferation, inflammation-related gene expression profiles and its silencing induces cellular senescence in human coronary endothelial cells. BMC Mol. Biol. 2019; 20: 8. (Accessed April 21, 2021). PubMed Abstract | Publisher Full Text

Competing interests

No competing interests were disclosed.

Grant information

This work was supported in part by the National Institute of General Medical Sciences (NIH) under grant R01-GM135341 and by the Cancer Prevention and Research Institute of Texas (CPRIT) under grant RR170068 to D.K., and by CPRIT under grant RP150596 to V.S.M. All authors read and approved the final manuscript.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Article Versions (1)

version 1

Published: 14 Apr 2022, 11:426

https://doi.org/10.12688/f1000research.110936.1

Copyright

© 2022 Bennett C et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite this article

Bennett C, Amin V, Kim D et al. TReNCo: Topologically associating domain (TAD) aware regulatory network construction [version 1; peer review: 2 not approved]. F1000Research 2022, 11:426 (https://doi.org/10.12688/f1000research.110936.1)

NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 12 Oct 2023

Sonika Tyagi, Department of Infectious Diseases, The Alfred Hospital and Central Clinical School, Monash University, & School of Computing Technologies, RMIT University, Melbourne, Australia

Not Approved

https://doi.org/10.5256/f1000research.122597.r181405

The authors present a workflow called TAD-aware Regulatory Network Construction, or TReNCo, that combines gene expression data and histone-mark capture of enhancers/promoters (ChIP-seq) with TAD boundary information generated at genome scale, and is claimed to build GRNs. RNA-seq and ChIP-seq data were matched, and separately available genomic TAD data were used.

Existing approaches to defining enhancer and promoter distances are ad hoc. Recently, genome-wide interaction data from assays such as Hi-C have become available and are a more accurate substitute for including dynamic enhancer/promoter interactions with transcriptional signals when studying biological networks. Therefore, the authors fill a significant gap in the field by attempting to streamline this process. However, it should be noted that TADs may change depending on the developmental stage of the cells. Hence, using matched Hi-C data is ideal when performing such integrative analyses. The authors should discuss this in the manuscript.

Known gene annotations are used to identify TSSs and promoters. Histone mark data were used to define enhancer boundaries and coverage. Similarly, RNA-seq data yield the gene expression table. TF motifs were located on promoter and enhancer sequences using the MEME Suite.

Major comments:

In the methods section, versions of the software and databases used are missing; these are required for reproducibility of the analysis.
The workflow has been tested on only 4 heart development TFs. It is also not clear how the TF weights shown in Figure 3C were calculated.
The authors should justify why capture data for TFs were not used instead of making predictions themselves.
How many genes are we looking at here after merging data for TFs and enhancers?
The low overlap of the authors' results with previously published data indicates the limitation of using static TAD data that are not matched. This should be emphasised when discussing the limitations of the approach in the discussion section. Could the authors not find matched RNA-seq, ChIP-seq and Hi-C data? That would be a good way to confirm these speculations about the low mapping of interactions predicted by the TReNCo pipeline.
The title of the paper suggests that the method builds a network at the end, but the authors only generate a list of genes that is then mapped to find pathway enrichments.
The authors have used GO enrichment analysis to associate predicted interactions with biological function. I would highly recommend presenting a GSEA analysis here, since multiple genes may be linked to a GO category and genes within a TAD usually tend to be co-regulated.
The authors have provided the source code via Git and instructions for a conda install of the workflow. The MIT license applies. This page would benefit from a step-by-step demonstration of an analysis of example data. I could not test the software myself because the conda environment setup step resulted in an error ('Solving environment: failed'), which requires further debugging.
The authors should include information on how automated the pipeline is. Some guidance must be provided on preparing the mapping files described in the methods section.
The authors have pointed out that other types of regulatory element capture data can be included in the workflow. I think this will be an important extension to the tool to provide a comprehensive regulatory integration approach. ATAC data, for example, are more readily available than Hi-C data.
How easy is it to include data from a new assay in this integrative approach?
Figure resolutions are poor.

Minor

It's much easier to provide feedback with line numbers in the manuscript.
In the author summary: Transcriptionally Active Domains (TADs) should be “Topologically associated domains”.
In 2023, the term “next gen sequencing” doesn’t really hold true. We are now past the third generation of sequencing. I would recommend the term “high throughput sequencing” instead.
This sentence has an extra “the” in it: “We designed this component with the assumption that enhancer-TF binding should be similar in promoters and should be weighted the as a log-odds scale in the network”

Is the work clearly and accurately presented and does it cite the current literature?
Partly

Is the study design appropriate and is the work technically sound?
Partly

Are sufficient details of methods and analysis provided to allow replication by others?
Partly

If applicable, is the statistical analysis and its interpretation appropriate?
Partly

Are all the source data underlying the results available to ensure full reproducibility?
Partly

Are the conclusions drawn adequately supported by the results?
Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Bioinformatics, Data Science, Machine Learning

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report 05 Aug 2022

Junbai Wang, Department of Clinical Molecular Biology, Institute of Clinical Medicine, University of Oslo, Oslo, Norway

Not Approved

https://doi.org/10.5256/f1000research.122597.r136507

Bennett et al. proposed a method to build gene regulatory networks by integrating gene expression profiles (RNA-seq), an enhancer marker (H3K27ac, ChIP-seq), and TAD boundary information. The authors tested the method on mouse data using public data from ENCODE. The authors claim that their method "makes sense" and that the network target genes overlap by around 60% with those of a previous study.

However, there are four major problems in the manuscript:

The description of their method/algorithm for integrating gene expression, enhancer marker, and TAD boundary information is very poor. It is not clear to me how the final gene regulatory network is built.
The authors used FIMO to predict TF binding on gene promoters. This may not be correct because many TFs share the same binding motifs. How can we distinguish them using such in silico predictions without looking at in vivo TF-DNA binding data such as ChIP-seq?
The presentation of the results also confuses me. In particular, the quality of Figures 3, 4, and 5 is poor and there is very little description of these figures in the main text; therefore, the conclusions are not supported by the results.
The authors have to reorganize their manuscript so that the methods, results, and discussion sections are presented clearly and readers can understand it.

Is the work clearly and accurately presented and does it cite the current literature?
No

Is the study design appropriate and is the work technically sound?
No

Are sufficient details of methods and analysis provided to allow replication by others?
No

If applicable, is the statistical analysis and its interpretation appropriate?
Partly

Are all the source data underlying the results available to ensure full reproducibility?
Partly

Are the conclusions drawn adequately supported by the results?
Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Bioinformatics in gene regulatory networks

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

