Improving prediction of core transcription factors for cell reprogramming and transdifferentiation

Mikhail Raevskiy; Anna Kondrashina; Yulia Medvedeva

doi:10.12688/f1000research.75321.1


Method Article

[version 1; peer review: 2 not approved]

Mikhail Raevskiy¹,², Anna Kondrashina¹, Yulia Medvedeva¹,³,⁴

PUBLISHED 13 Jan 2022

Author details

¹ Department of Biological and Medical Physics, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russian Federation
² Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
³ Institute of Bioengineering, Research Center of Biotechnology, Moscow, 119071, Russian Federation
⁴ National Medical Research Center for Endocrinology, Moscow, 115478, Russian Federation

Mikhail Raevskiy
Roles: Formal Analysis, Methodology, Software, Writing – Original Draft Preparation, Writing – Review & Editing

Anna Kondrashina
Roles: Software, Validation, Visualization

Yulia Medvedeva
Roles: Funding Acquisition, Project Administration, Supervision, Writing – Review & Editing

This article is included in the Cell & Molecular Biology gateway.

This article is included in the Bioinformatics gateway.

Abstract

Identification of transcription factors (TFs) that can induce and direct cell conversion remains a challenge. Although several hundred TFs are usually transcribed in each cell type, the identity of a cell is controlled by, and can be established through ectopic overexpression of, only a small subset of so-called core TFs. Currently, the experimental identification of core TFs for a broad spectrum of cell types remains challenging. Computational solutions to this problem would provide a better understanding of the mechanisms controlling cell identity during natural embryonic or malignant development, and would give a foundation for cell-based therapy. Herein, we propose a computational approach based on over-enrichment of transcription factor binding sites (TFBS) in differentially accessible chromatin regions that could identify potential core TFs for a variety of primary human cells involved in hematopoiesis. Our approach integrates transcriptomic (single-cell RNA sequencing, scRNA-seq) and epigenomic (single-cell assay for transposase-accessible chromatin using sequencing, scATAC-seq) data at single-cell resolution to search for core TFs, and can be scaled to predict subsets of core TFs and their roles in a given conversion between cells.

Keywords

cell conversion, scRNA-seq, scATAC-seq, transcription factors, epigenetics

Corresponding authors: Mikhail Raevskiy, Yulia Medvedeva

Competing interests: No competing interests were disclosed.

Grant information: This study was supported by Ministry of Science and Higher Education of the Russian Federation (agreement no. 075-15-2020-899).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2022 Raevskiy M et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Raevskiy M, Kondrashina A and Medvedeva Y. Improving prediction of core transcription factors for cell reprogramming and transdifferentiation [version 1; peer review: 2 not approved]. F1000Research 2022, 11:38 (https://doi.org/10.12688/f1000research.75321.1) First published: 13 Jan 2022, 11:38 (https://doi.org/10.12688/f1000research.75321.1) Latest published: 13 Jan 2022, 11:38 (https://doi.org/10.12688/f1000research.75321.1)

Introduction

Cell identity is largely controlled by transcription factors (TFs). TFs regulate gene expression by binding DNA in a sequence-specific manner, targeting short sequences called transcription factor binding sites (TFBS). Although almost half of all TFs are expressed in a particular cell type,²¹ only a minor share of these TFs, the so-called core TFs, are sufficient to maintain cell identity by defining the corresponding gene expression programs.¹¹,²²,²³ The identification of core TFs for a large number of cell types would be a valuable addition to an atlas of transcription regulators supplementing the Encyclopedia of Regulatory DNA Elements (ENCODE).¹⁶ Such an atlas, in turn, would facilitate systematic investigation of regulatory networks and contribute to establishing and refining direct cell conversion protocols for clinically relevant cell types.⁶,⁷

Systematic determination of core TFs controlling individual cell type identity has previously been attempted. Initial efforts focused mainly on experimental screening of TFs presumed to regulate the differentially expressed genes (DEGs) between a query cell type and a small number of alternative cell types that could serve as starting points for conversion. Some of these TFs could act as regulators controlling cellular identity. For example, studies showed that over-expression of MyoD1 in fibroblasts leads to their conversion into muscle cells,¹⁹ while inhibition of Oct4 resulted in the suppression of the pluripotent stem cell population during mammalian embryo development.¹² More recently, TF over-expression experiments that convert cells into another cell type have been used as a stringent test of the potential of specific TFs to establish and maintain cell identity.¹¹,²²,²³ Nonetheless, while they provide illustrative validation for each TF, such experiments are still time- and labor-consuming, and the resulting observations are limited to specific cell types.

The growth of genome-wide sequencing technologies has enabled the development of computational systems capable of predicting candidate core TFs.²,⁹,¹⁴,¹⁷ However, although broad in scope and easily scalable, these methods infer predictions mostly from bulk RNA sequencing (RNA-seq) data, which estimate the average gene expression level across hundreds of thousands to millions of cells. As a result, they are insufficient for the analysis of heterogeneous systems, such as early embryonic populations or complex tissues, including brain or bone marrow.

Here we propose an approach that uses single-cell expression and DNA accessibility data to select core TFs for cell differentiation or directed conversion. A distinct feature of the approach is that it incorporates not only TF expression levels in the original and target cell types, but also (1) the chromatin state of gene regulatory elements and (2) putative TF binding sites. Thus, the method simultaneously takes into account the accessibility and expression profiles of the initial and terminal cell types involved in the conversion. Additionally, our method uses a modified gene set enrichment analysis (GSEA)¹⁸ for the selection of core TFs, thus reducing the number of arbitrary thresholds in the pipeline.

Results

To validate our method, we applied it to hematopoietic differentiation datasets,¹,⁵ since this process has been extensively studied. As an example, we report the TFs predicted for the differentiation of hematopoietic stem cells (HSC) into CD4(+) cells (Table 1). The detected TFs are critical for HSC-to-CD4(+) cell differentiation. The top-ranked TF, TCF7, is a transcription activator involved in T-lymphocyte differentiation and is necessary for the survival of immature CD4(+) and CD8(+) thymocytes.¹⁰,¹³ RORA plays a crucial role in the regulation of embryonic development, differentiation and immunity.⁴ TBX21 is a lineage-defining TF that initiates Th1 (CD4(+)) lineage development from naive T helper (CD4(+)) precursor cells.¹⁰,²⁴ LEF1 binds with high affinity to a functionally important site in the T-cell receptor alpha enhancer, and its presence in this region increases the activity of the enhancer.³

Table 1. A predicted list of transcription factors for HSC to CD4(+) lymphocytes differentiation.

HGNC gene	GSEA p-value	GSEA q-value
TCF7	$4.36 \times 10^{-33}$	$1.50 \times 10^{-31}$
RORA	$6.98 \times 10^{-31}$	$2.17 \times 10^{-29}$
NR1D1	$1.83 \times 10^{-29}$	$5.29 \times 10^{-28}$
TBX21	$1.57 \times 10^{-8}$	$6.20 \times 10^{-8}$
LEF1	$2.53 \times 10^{-7}$	$8.24 \times 10^{-7}$

Methods

The proposed approach (Figure 1) consists of the following steps. First, for two given cell types involved in cell differentiation or conversion pathways, the minimal spanning tree (MST) is reconstructed based on open chromatin in regulatory regions (Figures 2, 3). Then, a differential accessibility analysis (DAA) between the initial and final cell types is performed to retrieve a list of genomic regions (ATAC-seq peaks) ranked by the statistical significance of the change in chromatin accessibility for the given cell conversion (Figures 4, 5). Next, the sequences corresponding to each of the ranked regions are functionally annotated with TFBS. Finally, the TF ranking is inferred by GSEA,¹⁸ adjusted to estimate the tendency of the TFBS of each TF under investigation to be over-represented among the most statistically significant genomic regions for the given cell differentiation or conversion.

Figure 1. Schematic overview of the proposed approach within the typical pipeline of TFs selection.

Figure 2. The Minimal Spanning Tree (MST), reconstructed on scATAC-seq data (GSE74912) for the system of the 8 hematopoietic cell types.

HSC, hematopoietic stem cells; MPP, multipotent progenitor; LMPP, lymphoid-primed multipotent progenitor; CLP, common lymphoid progenitor; NK, natural killer cells.

Figure 3. UMAP clustering of (A) scATAC-seq and (B) scRNA-seq of the 13 primary hematopoietic cell types (GSE74912).

Figure 4. Heatmap of the most differentially accessible scATAC-seq regions between hematopoietic stem cells (HSC) and CD4(+) T helper cells (CD4Tcell) (GSE74912).

Figure 5. Heatmap of the most differentially expressed genes from scRNA-seq data between hematopoietic stem cells (HSC) and CD4(+) T helper cells (CD4Tcell) (GSE74912).

Reconstruction of cell trajectories with scATAC-seq data

scATAC-seq data (GEO: GSE96769, GSE111586) were used to reconstruct the minimal spanning tree (MST) of hematopoietic cell types, whose hierarchy was aligned along pseudo-time, reflecting the degree of pluripotency of the cells observed in the single-cell assay for transposase-accessible chromatin (scATAC-seq) dataset.¹⁵ The resulting MST thus represents a collection of possible cell trajectories among the analyzed cell types.
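
As an illustration of the idea only, the following minimal sketch builds an MST over cell types with Prim's algorithm. It assumes that each cell type is summarized by a mean accessibility vector (a centroid) and that Euclidean distance between centroids is the edge weight; the article does not specify these inputs, so the function names, the toy values and the distance measure are all hypothetical and this is not the authors' implementation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length accessibility vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minimum_spanning_tree(centroids, root):
    """Prim's algorithm over the complete graph of cell-type centroids.

    centroids: dict mapping cell-type name -> list of mean peak accessibilities
    root: cell type to start from (for example the stem-cell population)
    Returns a list of (parent, child, distance) edges in the order they were added.
    """
    in_tree = {root}
    edges = []
    while len(in_tree) < len(centroids):
        best = None
        # Pick the cheapest edge that connects the tree to a new cell type.
        for u in in_tree:
            for v in centroids:
                if v in in_tree:
                    continue
                d = euclidean(centroids[u], centroids[v])
                if best is None or d < best[2]:
                    best = (u, v, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges

# Toy centroids over three hypothetical peaks (values are made up).
toy_centroids = {
    "HSC": [1.0, 0.0, 0.0],
    "MPP": [0.8, 0.2, 0.0],
    "LMPP": [0.5, 0.5, 0.1],
    "CLP": [0.2, 0.8, 0.3],
}
tree = minimum_spanning_tree(toy_centroids, root="HSC")
for parent, child, dist in tree:
    print(f"{parent} -> {child} ({dist:.2f})")
```

Rooting the tree at the stem-cell population makes each path from the root readable as a candidate trajectory, which is how the MST is used in the text.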

Differential accessibility analysis

Similarly to DEG analysis,²⁰ a differential accessibility analysis (DAA) of genomic regions was performed between two given cell types on the cell trajectory using Slingshot v2.3 (https://www.bioconductor.org/packages/devel/bioc/manuals/slingshot/man/slingshot.pdf). Accordingly, for each cell population on the MST, a subset of regions ranked by p-value can be obtained that discriminates the given cell population from the others.
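
To make the ranking step concrete, here is a minimal sketch of a per-peak differential accessibility test between two cell populations. It assumes binarized per-cell accessibility (a peak is open or closed in each cell) and uses a two-sided Fisher's exact test as a stand-in; this is not the statistical model used by Slingshot or Monocle2, and the peak identifiers and counts are invented.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a, b: open / closed cell counts in the initial cell type
    c, d: open / closed cell counts in the final cell type
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x open cells in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)

def rank_peaks(open_counts, n_initial, n_final):
    """Rank peaks by p-value, most significant first.

    open_counts: dict peak_id -> (cells open in initial, cells open in final)
    """
    results = []
    for peak, (k_init, k_final) in open_counts.items():
        p = fisher_exact_two_sided(k_init, n_initial - k_init,
                                   k_final, n_final - k_final)
        results.append((peak, p))
    return sorted(results, key=lambda item: item[1])

# Toy counts for 20 cells per population (made-up numbers).
toy_counts = {
    "chr1:100-600": (2, 18),
    "chr2:500-1000": (9, 11),
    "chr3:50-550": (5, 15),
}
ranked_peaks = rank_peaks(toy_counts, n_initial=20, n_final=20)
for peak, p in ranked_peaks:
    print(peak, f"{p:.3g}")
```

The output of this step is the p-value-ordered region list that the later annotation and enrichment steps consume.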

TFs filtration and TFBS annotation

We excluded from the downstream analysis TFs that either had a near-zero median expression (below the 5th percentile) in the final cell type or had higher expression in the original cell type, based on scRNA-seq data (GEO: GSE74912). Thus, only TFs preferentially expressed in the final cell population were considered.
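
The filter described above could be sketched as follows. The text does not say over which distribution the 5th percentile is taken, so this sketch assumes it is the distribution of per-TF median expression in the final cell type; that reading, the helper names and the toy expression values are all assumptions made for illustration.

```python
from statistics import median

def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def filter_tfs(expr_initial, expr_final, q=5):
    """Keep TFs that are expressed in the final cell type and not higher in the initial one.

    expr_initial, expr_final: dict TF -> list of per-cell expression values
    q: percentile of the per-TF medians below which a TF counts as near-zero (assumed)
    """
    med_final = {tf: median(v) for tf, v in expr_final.items()}
    med_initial = {tf: median(v) for tf, v in expr_initial.items()}
    cutoff = percentile(list(med_final.values()), q)
    return [tf for tf in med_final
            if med_final[tf] >= cutoff and med_initial[tf] <= med_final[tf]]

# Toy per-cell expression values (made up; gene symbols are only labels here).
toy_initial = {"TCF7": [0, 1, 0, 1], "GATA2": [5, 6, 7, 6],
               "LEF1": [0, 0, 1, 0], "MYOD1": [0, 0, 0, 0]}
toy_final = {"TCF7": [4, 5, 6, 5], "GATA2": [1, 1, 2, 1],
             "LEF1": [3, 2, 3, 4], "MYOD1": [0, 0, 0, 0]}
kept = filter_tfs(toy_initial, toy_final)
print(kept)
```

In this toy input one TF is dropped for near-zero expression in the final cell type and one for higher expression in the initial cell type, mirroring the two exclusion criteria.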

Genomic regions (scATAC-seq peaks from GSE74912) were listed and ranked based on the significance of DAA (p-value < 0.01) performed by Monocle2, and used for functional annotation by TFBS using position weight matrices (PWM, p-value < 0.0001) from the HOCOMOCO database.⁸
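
For readers unfamiliar with PWM-based annotation, the sketch below scores every window of a sequence on both strands against a log-odds matrix and reports windows above a threshold. It is a simplified illustration: the motif is a hypothetical 4-bp count matrix rather than a HOCOMOCO model, a uniform background is assumed, and the cutoff is a raw score chosen by hand, whereas the article applies a p-value cutoff of 0.0001 (the conversion from p-value to score threshold is not shown here).

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert a position count matrix (list of {base: count}) to log2-odds scores."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2(((column[b] + pseudocount) / total) / background)
                    for b in BASES})
    return pwm

def score_window(pwm, window):
    """Sum of per-position log-odds scores for a window of the motif's width."""
    return sum(col[base] for col, base in zip(pwm, window))

def scan_sequence(pwm, sequence, threshold):
    """Return (position, strand, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguous symbols
        revcomp = window.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", window), ("-", revcomp)):
            score = score_window(pwm, site)
            if score >= threshold:
                hits.append((i, strand, score))
    return hits

# Hypothetical 4-bp motif that prefers "CAAA" (counts are made up).
toy_motif_counts = [
    {"A": 1, "C": 17, "G": 1, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]
toy_pwm = log_odds_pwm(toy_motif_counts)
hits = scan_sequence(toy_pwm, "GGCAAAGGTTTGCC", threshold=5.0)
for position, strand, score in hits:
    print(position, strand, round(score, 2))
```

Running the scan over each ranked region yields, per TF, the set of regions carrying at least one putative binding site, which is the annotation the enrichment step relies on.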

TFs ranking via GSEA-like enrichment analysis

GSEA¹⁸ was modified to perform the TF ranking according to their significance for a given cell conversion.

Since sequence preferences, and therefore the number of TFBS, differ between TFs, TF annotations are very unevenly represented in the region ranking. GSEA was therefore used to infer the degree to which the TFBS of each TF are concentrated at the top of the region ranking for a given conversion.

Consequently, for GSEA, the TFBS-annotated ranking of genomic regions was taken as the pre-ranked list, and each individual TF as a signature gene set. The final TF ranking obtained from GSEA thus represents the significance of distinct TFs for cell differentiation or conversion.
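
The enrichment idea can be sketched with a classic unweighted running-sum statistic and a permutation p-value. This is a generic pre-ranked enrichment calculation under stated assumptions, not the authors' modified GSEA: the signature set of a TF is taken to be the set of regions annotated with its TFBS, the statistic is unweighted, significance is one-sided (enrichment at the top of the ranking), and all names and data are hypothetical.

```python
import random

def enrichment_score(ranked_regions, tf_regions):
    """Unweighted (Kolmogorov-Smirnov-like) running-sum enrichment score.

    ranked_regions: list of region ids, most significant first
    tf_regions: set of region ids annotated with TFBS of one TF
    """
    hits = [region in tf_regions for region in ranked_regions]
    n_hits = sum(hits)
    n_miss = len(hits) - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best

def permutation_p_value(ranked_regions, tf_regions, n_perm=1000, seed=0):
    """One-sided permutation p-value for enrichment at the top of the ranking."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_regions, tf_regions)
    n_hits = sum(region in tf_regions for region in ranked_regions)
    exceed = 0
    for _ in range(n_perm):
        shuffled = set(rng.sample(ranked_regions, n_hits))
        if enrichment_score(ranked_regions, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Toy ranking of 20 regions; TF_A sites sit at the top, TF_B sites are spread out.
toy_ranking = [f"r{i}" for i in range(20)]
tf_a_regions = {"r0", "r1", "r2", "r3"}
tf_b_regions = {"r2", "r9", "r14", "r19"}
es_a, p_a = permutation_p_value(toy_ranking, tf_a_regions)
es_b, p_b = permutation_p_value(toy_ranking, tf_b_regions)
print(f"TF_A: ES={es_a:.2f}, p={p_a:.4f}")
print(f"TF_B: ES={es_b:.2f}, p={p_b:.4f}")
```

Because the score is normalized by the number of annotated regions per TF, factors with many and with few binding sites are compared on the same scale, which is the motivation given above for using an enrichment statistic rather than raw site counts.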

Discussion

The proposed pipeline utilizes both transcriptomic and epigenomic data at single-cell resolution to search for core TFs that enable cell differentiation and conversion within the human hematopoietic system. The TF rankings obtained (Table 1) suggest that the current approach is capable of predicting subsets of core TFs as well as reflecting their importance for cell differentiation and conversion between cells.

Conclusions

Herein, we described a method for integrating single-cell chromatin accessibility and gene expression data that can successfully select core TFs for cell differentiation and conversion in silico.

Data availability

Underlying data

Gene Expression Omnibus: A Single-Cell Atlas of in vivo Mammalian Chromatin Accessibility, https://identifiers.org/geo:GSE111586

Gene Expression Omnibus: Single-cell epigenomics maps the continuous regulatory landscape of human hematopoietic differentiation [scATAC-Seq], https://identifiers.org/geo:GSE96769

Gene Expression Omnibus: ATAC-seq data, https://identifiers.org/geo:GSE74912

Extended data

Analysis code available from: https://github.com/annykay/transFactorsPrediction

Archived analysis code as at time of publication: https://doi.org/10.5281/zenodo.5799254

License: MIT

Competing interests

No competing interests were disclosed.

Grant information

The study was supported by Ministry of Science and Higher Education of the Russian Federation (agreement no. 075-15-2020-899).

References

1. Buenrostro JD, Corces MR, Lareau CA, et al.: Integrated Single-Cell Analysis Maps the Continuous Regulatory Landscape of Human Hematopoietic Differentiation. Cell. 2018; 173(6): 1535–1548.e16.
2. Cahan P, Li H, Morris SA, et al.: CellNet: Network biology applied to stem cell engineering. Cell. 2014; 158: 903–915.
3. Choi YS, Gullicksrud JA, Xing S, et al.: LEF-1 and TCF-1 orchestrate TFH differentiation by regulating differentiation circuits upstream of the transcriptional repressor Bcl6. Nat. Immunol. 2015; 16: 980–990.
4. Cook DN, Kang HS, Jetten AM: Retinoic Acid-Related Orphan Receptors (RORs): Regulatory Functions in Immunity, Development, Circadian Rhythm, and Metabolism. Nuclear Receptor Research. 2015; 2.
5. Corces MR, Buenrostro JD, Wu B, et al.: Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 2016; 48: 1193–1203.
6. Henriques T, Gilchrist DA, Nechaev S, et al.: Stable pausing by RNA polymerase II provides an opportunity to target and integrate regulatory signals. Mol. Cell. 2013; 52: 517–528.
7. Iwafuchi-Doi M, Zaret KS: Pioneer transcription factors in cell reprogramming. 2014.
8. Kulakovskiy IV, Vorontsov IE, Yevshin IS, et al.: HOCOMOCO: Towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res. 2018; 46: D252–D259.
9. Lang AH, Li H, Collins JJ, et al.: Epigenetic Landscapes Explain Partially Reprogrammed Cells and Identify Key Reprogramming Genes. PLoS Comput. Biol. 2014; 10: e1003734.
10. Luckheeram RV, Zhou R, Verma AD, et al.: CD4+ T cells: Differentiation and functions. 2012.
11. Morris SA, Daley GQ: A blueprint for engineering cell fate: Current technologies to reprogram cell identity. 2013.
12. Nichols J, Zevnik B, Anastassiadis K, et al.: Formation of pluripotent stem cells in the mammalian embryo depends on the POU transcription factor Oct4. Cell. 1998; 95: 379–391.
13. Nish SA, Zens KD, Kratchmarov R, et al.: CD4+ T cell effector commitment coupled to self-renewal by asymmetric cell divisions. J. Exp. Med. 2017; 214: 39–47.
14. Rackham OJ, Firas J, Fang H, et al.: A predictive computational framework for direct reprogramming between human cell types. Nat. Genet. 2016; 48: 331–335.
15. Reid JE, Wernisch L: Pseudotime estimation: Deconfounding single cell time series. Bioinformatics. 2016; 32: 2973–2980.
16. Rivera CM, Ren B: Mapping human epigenomes. 2013.
17. Roost MS, Van Iperen L, Ariyurek Y, et al.: KeyGenes, a Tool to Probe Tissue Differentiation Using a Human Fetal Transcriptional Atlas. Stem Cell Reports. 2015; 4: 1112–1124.
18. Subramanian A, Tamayo P, Mootha VK, et al.: Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 2005; 102: 15545–15550.
19. Tapscott SJ, Davis RL, Thayer MJ, et al.: MyoD1: a nuclear phosphoprotein requiring a Myc homology region to convert fibroblasts to myoblasts. Science. 1988; 242: 405–411.
20. Tarazona S: Differential Expression in RNA-Seq. Gene Expr. 2011.
21. Vaquerizas JM, Kummerfeld SK, Teichmann SA, et al.: A census of human transcription factors: Function, expression and evolution. 2009.
22. Vierbuchen T, Wernig M: Molecular Roadblocks for Cellular Reprogramming. 2012.
23. Yamanaka S: Induced pluripotent stem cells: Past, present, and future. 2012.
24. Zhu J, Yamane H, Paul WE: Differentiation of Effector CD4 T Cell Populations. Annu. Rev. Immunol. 2010; 28: 445–489.


Open Peer Review


Reviewer Report 31 Jan 2022

Erdem B. Dashinimaev, Koltzov Institute of Developmental Biology, Russian Academy of Sciences, Moscow, Russian Federation; Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russian Federation

Not Approved

https://doi.org/10.5256/f1000research.79178.r119675

Major comments:

On the whole, the article is unclear. I have no doubt that some thoughtful and valuable work has been done in which interesting data have been obtained, but in this manner, the entire work is presented incomprehensibly.
The abstract and introduction chapters are well written.
The Results chapter is poorly written and unclear, especially since the Methods chapter, which describes the proposed pipeline, has moved behind the Results chapter. Considering that the main point of the article is the publication of the pipeline under development, perhaps it makes sense to eliminate the Methods chapter and move it to the Results chapter, taking into account the logic of the narrative. For example, the fact that the main result of the applied method was Table 1 becomes clear only at the end of the article from the Discussion chapter.
The Discussion chapter was written extremely poorly. I suggest that this chapter should include references to similar published works (pipelines) from other teams in the field, and discuss their similarities and differences from your work (pipelines). It is also necessary to discuss the results in terms of the value of the data obtained, their potential applications in different fields of science and biotechnology. It is also necessary to highlight the disadvantages and limitations of your method and possible ways to solve them.

Minor comments:

Fig.2. - missing the transcription of the abbreviations of cell type names.
Perhaps, based on the logic of the narrative, swap Fig. 2 and Fig. 3.
The text contains a lot of abbreviations that are not explained in any way - for example TFBS, ATAC-seq, scATAC-seq, GSEA, DEG, Monocle2, PWM, HOCOMOCO, etc.

Of course, for narrow specialists these acronyms make sense, but one of the tasks of scientific publications is to convey information to a wider audience in the most accessible way possible. All the more so given the multidisciplinary nature of F1000Research.
"Pluripotency" is better replaced with "stemness". ¹

Conclusion:

I believe that the article requires significant revisions and rewrites.

Is the rationale for developing the new method (or application) clearly explained?

Partly
Is the description of the method technically sound?

Partly
Are sufficient details provided to allow replication of the method development and its use by others?

Partly
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

No
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

No

References

1. Melton D: ‘Stemness’. 2014. 7-17

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Cell biology, human cell reprogramming, regenerative biomedicine

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.


Reviewer Report 27 Jan 2022

Valentiva Boeva, Department of Computer Science, Institute for Machine Learning, ETH Zurich, Zurich, Switzerland

Samuel Gunz, Department of Computer Science, Institute for Machine Learning, ETH Zurich, Zurich, Switzerland

Not Approved

https://doi.org/10.5256/f1000research.79178.r119678

Rationale:

The rationale of developing this method is clearly stated. The authors identify experimental work to be the bottleneck in identifying core transcription factors (TFs) and therefore suggest a computational approach to solve this challenge. However, the authors do not state if there are existing methods in the field that address this problem. This information should be included, also in case there are similar methods that are related to the problem of identifying core TFs.

Methods:

The described methodology is overall understandable and seems technically sound. Nevertheless, parts of the described methods are too short and should be further specified. It includes

Details on the construction of the MST (what were exactly the input data? Some details on the algorithm or a reference would be useful too).
More details on how GSEA was adjusted in this method. What is the input into GSEA? Why is it applied?
Details on the parsing of the obtained TF rankings (e.g. if a threshold was used).

No information is given about data pre-processing requirements (scRNA-seq, scATAC-seq) to use the presented method. If the authors want their method to be used on other datasets this information should be included.

Data used in this study includes ‘A Single-Cell Atlas of in vivo Mammalian Chromatin Accessibility’ (GSE111586) which was obtained from mice. An explanation why this data was used should be included.

Reproducibility:

The analysis of this manuscript cannot be reproduced for a number of reasons. First, the code cannot be found from the link of the GitHub repository. (I assume that the correct link should be https://github.com/annykay/transFactorsPrediction- ) Second, major parts of the analysis are missing in the GitHub repository, as well as in the archived code repository. The missing parts include the construction of the MST, TF filtration and TFBS annotation and TF ranking via GSEA-like enrichment analysis and parsing of the obtained TF rankings.

In addition, there were a number of issues to reproduce the figures using the scripts that were available in the archived code repository.

Data referenced for Figure 3 (A) (GSE74912) do not correspond to scATAC-seq but to bulk ATAC-seq data.
The data referenced for Figure 3 (B) scRNA-seq are not retrievable from the accession number the authors specify. Another accession number was listed in the code (GSE74246) which seems to correspond to bulk RNA-seq (not scRNA-seq) data.
A random seed for the UMAP algorithm and other parameters should be specified in the scripts such that the results are reproducible.

Results and discussion:

The main result of this study is a table of important TFs in hematopoietic cell differentiation. It is not clear why only these 5 TFs are reported (see comment on parsing of the obtained TF rankings). Moreover, the result is not clearly put into context in the discussion. Questions that have to be discussed include:

Are the reported TFs all TFs involved in hematopoietic cell differentiation known from the literature?
Are TFs involved in hematopoietic differentiation absent in the final enrichment list, if yes how many and what could be the reason that the method did not identify them?
Were some core TFs newly discovered using this method?
Can alternative methods, if they exist, retrieve these TFs?

Given the title, the authors claim that they can identify core TFs for cell reprogramming. It is clear why the same idea could in principle be used in both cell reprogramming and differentiation. However, the method is not applied to identify core TFs in cell reprogramming in the manuscript. While the authors claim that their method can determine core TFs for cell differentiation, this does not directly imply that the method can also be used to determine core TFs for cell reprogramming. If this result should be included in the title and the conclusion, results supporting this claim must be shown in the manuscript.

Other comments:

There is a typo in referencing slingshot in paragraph Differential accessibility analysis.
The links to the Gene Expression Omnibus do not always work.

Is the rationale for developing the new method (or application) clearly explained?

Yes
Is the description of the method technically sound?

Yes
Are sufficient details provided to allow replication of the method development and its use by others?

Partly
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

No
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Epigenetics, transcriptional control, bioinformatics

We confirm that we have read this submission and believe that we have an appropriate level of expertise to state that we do not consider it to be of an acceptable scientific standard, for reasons outlined above.

CITE

Report a concern

Respond or Comment

Comments on this article Comments (0)

Version 1

VERSION 1 PUBLISHED 13 Jan 2022

Open Peer Review

Reviewer Status

Reviewer Reports

	Invited Reviewers
	1	2
Version 1 13 Jan 22	read	read

Valentiva Boeva, ETH Zurich, Zurich, Switzerland

Samuel Gunz, ETH Zurich, Zurich, Switzerland
Erdem B. Dashinimaev, Russian Academy of Sciences, Moscow, Russian Federation; Pirogov Russian National Research Medical University, Moscow, Russian Federation

Comments on this article

All Comments(0)

Add a comment

Sign up for content alerts

Browse by related subjects

Back to all reports

Reviewer Report

18 Views

31 Jan 2022 | for Version 1

Erdem B. Dashinimaev, Koltzov Institute of Developmental Biology, Russian Academy of Sciences, Moscow, Russian Federation; Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russian Federation

18 Views Cite this report Responses(0)

Not Approved

Major comments:

On the whole, the article is unclear. I have no doubt that some thoughtful and valuable work has been done in which interesting data have been obtained, but in this manner, the entire work is presented incomprehensibly.
The abstract and introduction chapters are well written.
The Results chapter is poorly written and unclear, especially since the Methods chapter, which describes the proposed Pipeline, has moved behind the Results chapter. Considering that the main point of the article is the publication of the pipeline under development, perhaps it makes sense to eliminate the Methods chapter and move it to the Results chapter, taking into account the logic of the narrative. For example, the fact that the main result of the applied method was Table 1 becomes clear only at the end of the article from the Discussion chapter
The Discussion chapter was written extremely poorly. I suggest that this chapter should include references to similar published works (pipelines) from other teams in the field, and discuss their similarities and differences from your work (pipelines). It is also necessary to discuss the results in terms of the value of the data obtained, their potential applications in different fields of science and biotechnology. It is also necessary to highlight the disadvantages and limitations of your method and possible ways to solve them.

Minor comments:

Fig. 2: the abbreviations of the cell type names are not expanded.
Perhaps, based on the logic of the narrative, Fig. 2 and Fig. 3 should be swapped.
The text contains many abbreviations that are never explained, for example TFBS, ATAC-seq, scATAC-seq, GSEA, DEG, Monocle2, PWM, HOCOMOCO, etc.

Of course, these acronyms make sense to narrow specialists, but one of the tasks of a scientific publication is to convey information to a wider audience in the most accessible way possible, all the more so given the multidisciplinary nature of F1000Research.
"Pluripotency" is better replaced with "stemness". ¹

Conclusion:

I believe that the article requires significant revisions and rewrites.

Is the rationale for developing the new method (or application) clearly explained?
Partly

Is the description of the method technically sound?
Partly

Are sufficient details provided to allow replication of the method development and its use by others?
Partly

If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
No

Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
No

References

1. Melton D: ‘Stemness’. 2014. 7-17.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Cell biology, human cell reprogramming, regenerative biomedicine

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report

27 Jan 2022 | for Version 1

Valentina Boeva, Department of Computer Science, Institute for Machine Learning, ETH Zurich, Zurich, Switzerland

Samuel Gunz, Department of Computer Science, Institute for Machine Learning, ETH Zurich, Zurich, Switzerland

Not Approved

Rationale:

The rationale for developing this method is clearly stated. The authors identify experimental work as the bottleneck in identifying core transcription factors (TFs) and therefore suggest a computational approach to address this challenge. However, the authors do not state whether existing methods in the field address this problem. This information should be included, along with any similar methods that relate to the problem of identifying core TFs.

Methods:

The described methodology is understandable overall and seems technically sound. Nevertheless, parts of the described methods are too brief and should be specified further. These include:

Details on the construction of the MST (what exactly were the input data? Some details on the algorithm, or a reference, would be useful too).
More details on how GSEA was adjusted in this method. What is the input to GSEA? Why is it applied?
Details on the parsing of the obtained TF rankings (e.g. whether a threshold was used).
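For readers unfamiliar with the statistic in question, the following is a minimal sketch of the standard GSEA running-sum enrichment score (Subramanian et al., 2005). It is not the authors' adjusted version, which the manuscript does not specify. The sketch assumes, purely for illustration, that the ranked items are chromatin regions ordered by a differential accessibility score and that the target set contains the regions carrying a binding site for one TF; the function name, region identifiers, and scores are all hypothetical.

```python
def enrichment_score(ranked_ids, scores, target_set, p=1.0):
    """Standard GSEA-style running sum over a ranked list (illustrative only).

    ranked_ids : item identifiers, already sorted by score (best first)
    scores     : ranking scores aligned with ranked_ids
    target_set : identifiers belonging to the set being tested
    p          : weight exponent applied to the scores of hits
    """
    hits = [item in target_set for item in ranked_ids]
    n_hits = sum(hits)
    n_miss = len(ranked_ids) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("target set must contain some, but not all, ranked items")

    # Normalising constant: total weight carried by the hits.
    hit_weight_total = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_weight_total == 0:
        raise ValueError("hits must carry non-zero total weight")

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** p / hit_weight_total  # step up on a hit
        else:
            running -= 1.0 / n_miss                    # step down on a miss
        if abs(running) > abs(best):
            best = running                             # keep the largest deviation from zero
    return best


# Hypothetical regions ranked by a hypothetical differential accessibility score.
regions = ["r1", "r2", "r3", "r4", "r5", "r6"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]

es_top = enrichment_score(regions, scores, {"r1", "r2"})     # set concentrated at the top
es_bottom = enrichment_score(regions, scores, {"r5", "r6"})  # set concentrated at the bottom
```

A positive score indicates that the set is concentrated near the top of the ranking and a negative score that it is concentrated near the bottom. How such scores are converted into a final TF list (for example, by a threshold) is exactly the detail requested above.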

No information is given about the data pre-processing (scRNA-seq, scATAC-seq) required to use the presented method. If the authors want their method to be used on other datasets, this information should be included.

The data used in this study include ‘A Single-Cell Atlas of in vivo Mammalian Chromatin Accessibility’ (GSE111586), which was obtained from mice. An explanation of why these data were used should be included.

Reproducibility:

The analysis in this manuscript cannot be reproduced for a number of reasons. First, the code cannot be found at the GitHub repository link given in the article. (I assume that the correct link should be https://github.com/annykay/transFactorsPrediction- ) Second, major parts of the analysis are missing from both the GitHub repository and the archived code repository. The missing parts include the construction of the MST, TF filtration and TFBS annotation, TF ranking via GSEA-like enrichment analysis, and parsing of the obtained TF rankings.

In addition, there were a number of issues in reproducing the figures using the scripts that were available in the archived code repository.

The data referenced for Figure 3 (A) (GSE74912) do not correspond to scATAC-seq but to bulk ATAC-seq data.
The scRNA-seq data referenced for Figure 3 (B) are not retrievable from the accession number the authors specify. Another accession number is listed in the code (GSE74246), which seems to correspond to bulk RNA-seq (not scRNA-seq) data.
A random seed for the UMAP algorithm, along with the other parameters, should be specified in the scripts so that the results are reproducible.
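The point about seeding can be illustrated with a small sketch. The function below is a toy stand-in for a stochastic embedding (a random linear projection, not UMAP); its name and the input points are hypothetical. It only shows that passing an explicit seed to a private random generator makes repeated runs identical.

```python
import random


def embed_2d(points, seed):
    """Toy stochastic embedding: project points onto two random axes.

    This is NOT UMAP; it only mimics an algorithm whose output depends
    on random initialisation.
    """
    rng = random.Random(seed)  # private generator: output depends only on `seed`
    dim = len(points[0])
    # Draw two random projection axes from the seeded generator.
    axes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2)]
    return [
        tuple(sum(a * x for a, x in zip(axis, point)) for axis in axes)
        for point in points
    ]


points = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0], [2.0, 2.0, 1.0]]

run_a = embed_2d(points, seed=42)
run_b = embed_2d(points, seed=42)  # same seed: identical coordinates
run_c = embed_2d(points, seed=7)   # different seed: different coordinates
```

If the scripts rely on the umap-learn Python package, the analogous setting is the `random_state` argument of `umap.UMAP`; whether the authors' scripts use that package is an assumption here.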

Results and discussion:

The main result of this study is a table of important TFs in hematopoietic cell differentiation. It is not clear why only these 5 TFs are reported (see the comment on parsing of the obtained TF rankings). Moreover, the result is not clearly put into context in the discussion. Questions that have to be discussed include:

Are the reported TFs all TFs involved in hematopoietic cell differentiation known from the literature?
Are any TFs known to be involved in hematopoietic differentiation absent from the final enrichment list? If so, how many, and what could be the reason that the method did not identify them?
Were some core TFs newly discovered using this method?
Can alternative methods, if they exist, retrieve these TFs?

Given the title, the authors claim that they can identify core TFs for cell reprogramming. It is clear why the same idea could in principle be used for both cell reprogramming and differentiation. However, the method is not applied to identify core TFs for cell reprogramming in the manuscript. While the authors claim that their method can determine core TFs for cell differentiation, this does not directly imply that the method can also be used to determine core TFs for cell reprogramming. If this claim is to remain in the title and the conclusion, results supporting it must be shown in the manuscript.

Other comments:

There is a typo in the reference to slingshot in the paragraph ‘Differential accessibility analysis’.
The links to the Gene Expression Omnibus do not always work.

Is the rationale for developing the new method (or application) clearly explained?
Yes

Is the description of the method technically sound?
Yes

Are sufficient details provided to allow replication of the method development and its use by others?
Partly

If any results are presented, are all the source data underlying the results available to ensure full reproducibility?
No

Are the conclusions about the method and its performance adequately supported by the findings presented in the article?
Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Epigenetics, transcriptional control, bioinformatics

We confirm that we have read this submission and believe that we have an appropriate level of expertise to state that we do not consider it to be of an acceptable scientific standard, for reasons outlined above.

1. Buenrostro JD, Corces MR, Lareau CA, et al.: Integrated Single-Cell Analysis Maps the Continuous Regulatory Landscape of Human Hematopoietic Differentiation. Cell. 2018; 173(6): 1535–1548.e16.

2. Cahan P, Li H, Morris SA, et al.: CellNet: Network biology applied to stem cell engineering. Cell. 2014; 158: 903–915.

3. Choi YS, Gullicksrud JA, Xing S, et al.: LEF-1 and TCF-1 orchestrate TFH differentiation by regulating differentiation circuits upstream of the transcriptional repressor Bcl6. Nat. Immunol. 2015; 16: 980–990.

4. Cook DN, Kang HS, Jetten AM: Retinoic Acid-Related Orphan Receptors (RORs): Regulatory Functions in Immunity, Development, Circadian Rhythm, and Metabolism. Nuclear Receptor Research. 2015; 2.

5. Corces MR, Buenrostro JD, Wu B, et al.: Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 2016; 48: 1193–1203.

6. Henriques T, Gilchrist DA, Nechaev S, et al.: Stable pausing by RNA polymerase II provides an opportunity to target and integrate regulatory signals. Mol. Cell. 2013; 52: 517–528.

7. Iwafuchi-Doi M, Zaret KS: Pioneer transcription factors in cell reprogramming. 2014.

8. Kulakovskiy IV, Vorontsov IE, Yevshin IS, et al.: HOCOMOCO: Towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis. Nucleic Acids Res. 2018; 46: D252–D259.

9. Lang AH, Li H, Collins JJ, et al.: Epigenetic Landscapes Explain Partially Reprogrammed Cells and Identify Key Reprogramming Genes. PLoS Comput. Biol. 2014; 10: e1003734.

10. Luckheeram RV, Zhou R, Verma AD, et al.: CD4+ T cells: Differentiation and functions. 2012.

11. Morris SA, Daley GQ: A blueprint for engineering cell fate: Current technologies to reprogram cell identity. 2013.

12. Nichols J, Zevnik B, Anastassiadis K, et al.: Formation of pluripotent stem cells in the mammalian embryo depends on the POU transcription factor Oct4. Cell. 1998; 95: 379–391.

13. Nish SA, Zens KD, Kratchmarov R, et al.: CD4+ T cell effector commitment coupled to self-renewal by asymmetric cell divisions. J. Exp. Med. 2017; 214: 39–47.

14. Rackham OJ, Firas J, Fang H, et al.: A predictive computational framework for direct reprogramming between human cell types. Nat. Genet. 2016; 48: 331–335.

15. Reid JE, Wernisch L: Pseudotime estimation: Deconfounding single cell time series. Bioinformatics. 2016; 32: 2973–2980.

16. Rivera CM, Ren B: Mapping human epigenomes. 2013.

17. Roost MS, Van Iperen L, Ariyurek Y, et al.: KeyGenes, a Tool to Probe Tissue Differentiation Using a Human Fetal Transcriptional Atlas. Stem Cell Reports. 2015; 4: 1112–1124.

18. Subramanian A, Tamayo P, Mootha VK, et al.: Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 2005; 102: 15545–15550.

19. Tapscott SJ, Davis RL, Thayer MJ, et al.: MyoD1: a nuclear phosphoprotein requiring a Myc homology region to convert fibroblasts to myoblasts. Adv. Sci. 2010.

20. Tarazona S: Differential Expression in RNA-Seq. Gene Expr. 2011.

21. Vaquerizas JM, Kummerfeld SK, Teichmann SA, et al.: A census of human transcription factors: Function, expression and evolution. 2009.

22. Vierbuchen T, Wernig M: Molecular Roadblocks for Cellular Reprogramming. 2012.

23. Yamanaka S: Induced pluripotent stem cells: Past, present, and future. 2012.

24. Zhu J, Yamane H, Paul WE: Differentiation of Effector CD4 T Cell Populations. Annu. Rev. Immunol. 2010; 28: 445–489.

Improving prediction of core transcription factors for cell reprogramming and transdifferentiation

Abstract

Keywords

Introduction

Results

Table 1. A predicted list of transcription factors for HSC to CD4(+) lymphocytes differentiation.

Methods

Figure 1. Schematic overview of the proposed approach within the typical pipeline of TFs selection.

Figure 2. The Minimal Spanning Tree (MST), reconstructed on scATAC-seq data (GSE74912) for the system of the 8 hematopoietic cell types.

Figure 3. UMAP clustering of (A) scATAC-seq and (B) scRNA-seq of the 13 primary hematopoietic cell types (GSE74912).

Figure 4. Heatmap of the most differentially accessible scATAC-seq regions between hematopoietic stem cell (HSC) and CD4(+) T helper cells (CD4Tcell) cells (GSE74912).

Figure 5. Heatmap of the most differentially expressed genes from scRNA-seq data between hematopoietic stem cell (HSC) and CD4(+) T helper cells (CD4Tcell) cells (GSE74912).

Reconstruction of cell trajectories with scATAC-seq data

Differential accessibility analysis

TFs filtration and TFBS annotation

TFs ranking via GSEA-like enrichment analysis

Discussion

Conclusions

Data availability

Underlying data

Extended data

Competing interests

Grant information

References
