<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.75321.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Method Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Improving prediction of core transcription factors for cell reprogramming and transdifferentiation</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 not approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Raevskiy</surname>
                        <given-names>Mikhail</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-6218-5480</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Kondrashina</surname>
                        <given-names>Anna</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Medvedeva</surname>
                        <given-names>Yulia</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-7587-1666</uri>
                    <xref ref-type="corresp" rid="c2">b</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Department of Biological and Medical Physics, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, 141701, Russian Federation</aff>
                <aff id="a2">
                    <label>2</label>Institute of Bioengineering, School of Life Sciences, &#x00c9;cole Polytechnique F&#x00e9;d&#x00e9;rale de Lausanne, Lausanne, Switzerland</aff>
                <aff id="a3">
                    <label>3</label>Institute of Bioengineering, Research Center of Biotechnology, Moscow, 119071, Russian Federation</aff>
                <aff id="a4">
                    <label>4</label>National Medical Research Center for Endocrinology, Moscow, 115478, Russian Federation</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:raevskymichail@gmail.com">raevskymichail@gmail.com</email>
                </corresp>
                <corresp id="c2">
                    <label>b</label>
                    <email xlink:href="mailto:ju.medvedeva@gmail.com">ju.medvedeva@gmail.com</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>13</day>
                <month>1</month>
                <year>2022</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2022</year>
            </pub-date>
            <volume>11</volume>
            <elocation-id>38</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>23</day>
                    <month>12</month>
                    <year>2021</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Raevskiy M et al.</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/11-38/pdf"/>
            <abstract>
                <p>Identification of transcription factors (TFs) that can induce and direct cell conversion remains a challenge. Although several hundred TFs are usually transcribed in each cell type, cell identity is controlled by, and can be induced through the ectopic overexpression of, only a small subset of so-called core TFs. Experimental identification of core TFs for a broad spectrum of cell types remains difficult. Computational solutions to this problem would improve our understanding of the mechanisms controlling cell identity during normal embryonic or malignant development and would provide a foundation for cell-based therapy. Herein, we propose a computational approach based on the over-representation of transcription factor binding sites (TFBS) in differentially accessible chromatin regions that can identify potential core TFs for a variety of primary human cells involved in hematopoiesis. Our approach integrates transcriptomic (single-cell RNA sequencing, scRNA-seq) and epigenomic (single-cell assay for transposase-accessible chromatin, scATAC-seq) data at single-cell resolution to search for core TFs, and can be scaled to predict subsets of core TFs and their roles in a given conversion between cell types.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>cell conversion</kwd>
                <kwd>scRNA-seq</kwd>
                <kwd>scATAC-seq</kwd>
                <kwd>transcription factors</kwd>
                <kwd>epigenetics</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/501100012190">
                    <funding-source>Ministry of Science and Higher Education of the Russian Federation</funding-source>
                    <award-id>075-15-2020-899</award-id>
                </award-group>
                <funding-statement>This study was supported by the Ministry of Science and Higher Education of the Russian Federation (agreement no. 075-15-2020-899).</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <p>Cell identity is largely controlled by transcription factors (TFs). TFs regulate gene expression by binding DNA in a sequence-specific manner at short sequences called transcription factor binding sites (TFBS). Although almost half of all TFs are expressed in any particular cell type,
                <sup>
                    <xref ref-type="bibr" rid="ref21">21</xref>
                </sup> only a small subset of them, the so-called core TFs, is sufficient to maintain cell identity by defining the corresponding gene expression programs.
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref22">22</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref23">23</xref>
                </sup> Identifying core TFs for a large number of cell types would be a valuable addition to an atlas of transcriptional regulators supplementing the Encyclopedia of DNA Elements (ENCODE, Ref. 
                <xref ref-type="bibr" rid="ref16">16</xref>). Such an atlas, in turn, would facilitate systematic investigation of regulatory networks and contribute to establishing and refining direct cell conversion protocols for clinically relevant cell types.
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>
                </sup>
            </p>
            <p>Systematic determination of the core TFs controlling the identity of individual cell types has been attempted previously. Initial efforts focused mainly on the experimental screening of TFs presumed to regulate the differentially expressed genes (DEGs) between the query cell type and a small number of alternative cell types that could serve as starting points for conversion. Some of these TFs may act as regulators of cell identity. For example, studies have shown that over-expression of 
                <italic toggle="yes">MyoD1</italic> in fibroblasts converts them into muscle cells,
                <sup>
                    <xref ref-type="bibr" rid="ref19">19</xref>
                </sup> while inhibition of 
                <italic toggle="yes">Oct4</italic> suppresses the pluripotent stem cell population in the developing mammalian embryo.
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup> More recently, TF over-expression experiments that convert cells into another cell type have been used as a stringent test of the ability of specific TFs to establish and maintain cell identity.
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref22">22</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref23">23</xref>
                </sup> Nonetheless, although such experiments provide direct validation for each TF, they remain time- and labor-intensive, and their results are limited to specific cell types.</p>
            <p>The growth of genome-wide sequencing technologies has enabled the development of computational methods for predicting candidate core TFs.
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref9">9</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref14">14</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref17">17</xref>
                </sup> However, although broad in scope and easily scalable, these methods rely mainly on bulk RNA sequencing (RNA-seq) data, which estimate the average gene expression level across hundreds of thousands to millions of cells. As a result, they are poorly suited to the analysis of heterogeneous systems, such as early embryonic populations or complex tissues, including the brain and bone marrow.</p>
            <p>Here we propose an approach that uses single-cell expression and DNA accessibility data to select core TFs for cell differentiation or directed conversion. A distinctive feature of the approach is that it incorporates not only TF expression levels in the original and target cell types but also (1) the chromatin state of gene regulatory elements and (2) putative TF binding sites. Thus, the method simultaneously takes into account the accessibility and expression profiles of the initial and terminal cell types involved in the conversion. Additionally, our method uses a modified gene set enrichment analysis (GSEA)
                <sup>
                    <xref ref-type="bibr" rid="ref18">18</xref>
                </sup> for the selection of core TFs, thus reducing the number of arbitrary thresholds in the pipeline.</p>
        </sec>
        <sec id="sec2" sec-type="results">
            <title>Results</title>
            <p>To validate our method, we applied it to hematopoietic differentiation datasets,
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup> because this process has been extensively studied. As an example, we list the TFs predicted for the differentiation of hematopoietic stem cells (HSC) into CD4(+) cells (
                <xref ref-type="table" rid="T1">Table 1</xref>). The detected TFs are known to be important for HSC-to-CD4(+) cell differentiation. The top-ranked TF, TCF7, is a transcriptional activator involved in T-lymphocyte differentiation and is necessary for the survival of immature CD4(+) and CD8(+) thymocytes.
                <sup>
                    <xref ref-type="bibr" rid="ref13">13</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup> The RORA gene plays a crucial role in the regulation of embryonic development, differentiation and immunity.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> TBX21 is a lineage-defining TF that initiates Th1 (CD4(+)) lineage development from naive CD4(+) T helper precursor cells.
                <sup>
                    <xref ref-type="bibr" rid="ref24">24</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup> LEF1 binds with high affinity to a functionally important site in the T-cell receptor alpha enhancer, and its binding at this site increases enhancer activity.
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup>
            </p>
            <table-wrap id="T1" orientation="portrait" position="float">
                <label>Table 1. </label>
                <caption>
                    <title>Transcription factors predicted for 
                        <italic toggle="yes">HSC</italic> to 
                        <italic toggle="yes">CD4(+) lymphocyte</italic> differentiation.</title>
                </caption>
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1" valign="top">HGNC gene</th>
                            <th align="left" colspan="1" rowspan="1" valign="top">GSEA 
                                <italic toggle="yes">p</italic>-value
                            </th>
                            <th align="left" colspan="1" rowspan="1" valign="top">GSEA 
                                <italic toggle="yes">q</italic>-value
                            </th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">TCF7</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mn>4.36</mml:mn>
                                        <mml:mo>&#x00d7;</mml:mo>
                                        <mml:msup>
                                            <mml:mn>10</mml:mn>
                                            <mml:mrow>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:mn>33</mml:mn>
                                            </mml:mrow>
                                        </mml:msup>
                                    </mml:math>
                                </inline-formula>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mn>1.50</mml:mn>
                                        <mml:mo>&#x00d7;</mml:mo>
                                        <mml:msup>
                                            <mml:mn>10</mml:mn>
                                            <mml:mrow>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:mn>31</mml:mn>
                                            </mml:mrow>
                                        </mml:msup>
                                    </mml:math>
                                </inline-formula>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">RORA</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mn>6.98</mml:mn>
                                        <mml:mo>&#x00d7;</mml:mo>
                                        <mml:msup>
                                            <mml:mn>10</mml:mn>
                                            <mml:mrow>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:mn>31</mml:mn>
                                            </mml:mrow>
                                        </mml:msup>
                                    </mml:math>
                                </inline-formula>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mn>2.17</mml:mn>
                                        <mml:mo>&#x00d7;</mml:mo>
                                        <mml:msup>
                                            <mml:mn>10</mml:mn>
                                            <mml:mrow>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:mn>29</mml:mn>
                                            </mml:mrow>
                                        </mml:msup>
                                    </mml:math>
                                </inline-formula>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">NR1D1</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mn>1.83</mml:mn>
                                        <mml:mo>&#x00d7;</mml:mo>
                                        <mml:msup>
                                            <mml:mn>10</mml:mn>
                                            <mml:mrow>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:mn>29</mml:mn>
                                            </mml:mrow>
                                        </mml:msup>
                                    </mml:math>
                                </inline-formula>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mn>5.29</mml:mn>
                                        <mml:mo>&#x00d7;</mml:mo>
                                        <mml:msup>
                                            <mml:mn>10</mml:mn>
                                            <mml:mrow>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:mn>28</mml:mn>
                                            </mml:mrow>
                                        </mml:msup>
                                    </mml:math>
                                </inline-formula>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">TBX21</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mn>1.57</mml:mn>
                                        <mml:mo>&#x00d7;</mml:mo>
                                        <mml:msup>
                                            <mml:mn>10</mml:mn>
                                            <mml:mrow>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:mn>8</mml:mn>
                                            </mml:mrow>
                                        </mml:msup>
                                    </mml:math>
                                </inline-formula>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mn>6.20</mml:mn>
                                        <mml:mo>&#x00d7;</mml:mo>
                                        <mml:msup>
                                            <mml:mn>10</mml:mn>
                                            <mml:mrow>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:mn>8</mml:mn>
                                            </mml:mrow>
                                        </mml:msup>
                                    </mml:math>
                                </inline-formula>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1" valign="top">LEF1</td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mn>2.53</mml:mn>
                                        <mml:mo>&#x00d7;</mml:mo>
                                        <mml:msup>
                                            <mml:mn>10</mml:mn>
                                            <mml:mrow>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:mn>7</mml:mn>
                                            </mml:mrow>
                                        </mml:msup>
                                    </mml:math>
                                </inline-formula>
                            </td>
                            <td align="left" colspan="1" rowspan="1" valign="top">
                                <inline-formula>
                                    <mml:math display="inline">
                                        <mml:mn>8.24</mml:mn>
                                        <mml:mo>&#x00d7;</mml:mo>
                                        <mml:msup>
                                            <mml:mn>10</mml:mn>
                                            <mml:mrow>
                                                <mml:mo>&#x2212;</mml:mo>
                                                <mml:mn>7</mml:mn>
                                            </mml:mrow>
                                        </mml:msup>
                                    </mml:math>
                                </inline-formula>
                            </td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
        </sec>
        <sec id="sec3" sec-type="methods">
            <title>Methods</title>
            <p>The proposed approach (
                <xref ref-type="fig" rid="f1">Figure 1</xref>) consists of the following steps. First, to place the two given cell types on a differentiation or conversion pathway, a minimal spanning tree (MST) of cell types is reconstructed from chromatin accessibility in regulatory regions (
                <xref ref-type="fig" rid="f2">Figure 2</xref>, 
                <xref ref-type="fig" rid="f3">Figure 3</xref>). Then, differential accessibility analysis (DAA) between the initial and final cell types is performed to retrieve a list of genomic regions (ATAC-seq peaks) ranked by the statistical significance of the change in chromatin accessibility for the given cell conversion (
                <xref ref-type="fig" rid="f4">Figures 4</xref>, 
                <xref ref-type="fig" rid="f5">5</xref>). Next, the sequence of each ranked region is annotated with putative TFBS. Finally, TFs are ranked using GSEA,
                <sup>
                    <xref ref-type="bibr" rid="ref18">18</xref>
                </sup> adapted to estimate whether the TFBS of the TF under investigation are over-represented among the most significantly changed genomic regions for the given cell differentiation or conversion.</p>
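            <p>The GSEA step can be illustrated with a minimal Python sketch. It assumes the standard weighted running-sum statistic of GSEA applied to genomic regions instead of genes: regions are sorted by a DAA score (assumed here to be the -log10 p-value), a region counts as a hit if it carries a TFBS of the TF, and significance is estimated by shuffling which regions are hits. The function names and the permutation scheme are hypothetical, and the authors' modified GSEA may differ in its details.</p>
            <preformat preformat-type="code">
```python
import numpy as np


def enrichment_score(scores, hits, weight=1.0):
    """GSEA-style running-sum enrichment score over ranked regions.

    scores: per-region DAA statistics sorted in decreasing order
            (assumed here to be -log10 p-values).
    hits:   boolean array; True if the region carries a TFBS of the TF.
    weight: exponent on |score| for hit steps (1.0 as in weighted GSEA).
    """
    scores = np.asarray(scores, dtype=float)
    hits = np.asarray(hits, dtype=bool)
    n, n_hits = hits.size, int(hits.sum())
    if n_hits == 0 or n_hits == n:
        raise ValueError("TFBS hits must be a non-empty proper subset of regions")
    # Hits step up in proportion to |score|**weight (normalized to sum to 1);
    # misses step down by a constant, so the walk ends at zero.
    hit_w = np.abs(scores) ** weight * hits
    step = np.where(hits, hit_w / hit_w.sum(), -1.0 / (n - n_hits))
    running = np.cumsum(step)
    # The enrichment score is the largest deviation of the walk from zero.
    return float(running[np.argmax(np.abs(running))])


def permutation_pvalue(scores, hits, n_perm=1000, seed=0):
    """One-sided p-value for enrichment at the top of the ranking,
    estimated by shuffling which regions count as TFBS hits."""
    rng = np.random.default_rng(seed)
    observed = enrichment_score(scores, hits)
    null = np.array([enrichment_score(scores, rng.permutation(hits))
                     for _ in range(n_perm)])
    return observed, (1 + int(np.sum(null >= observed))) / (1 + n_perm)
```
            </preformat>
            <p>A TF whose binding sites cluster among the top-ranked regions yields an enrichment score close to 1. Shuffling region labels is a simplification: it ignores correlation between neighboring regions, so the resulting p-values are likely to be optimistic.</p>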
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>Figure 1. </label>
                <caption>
                    <title>Schematic overview of the proposed approach within a typical TF selection pipeline.</title>
                </caption>
                <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/79178/89cc6955-9712-4313-8024-0d6a8edc0830_figure1.gif"/>
            </fig>
            <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                <label>Figure 2. </label>
                <caption>
                    <title>The minimal spanning tree (MST) reconstructed from scATAC-seq data (GSE74912) for eight hematopoietic cell types.</title>
                    <p>HSC, hematopoietic stem cells; MPP, multipotent progenitor; LMPP, lymphoid-primed multipotent progenitor; CLP, common lymphoid progenitor; NK, natural killer cells.</p>
                </caption>
                <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/79178/89cc6955-9712-4313-8024-0d6a8edc0830_figure2.gif"/>
            </fig>
            <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                <label>Figure 3. </label>
                <caption>
                    <title>UMAP projection of (A) scATAC-seq and (B) scRNA-seq data for the 13 primary hematopoietic cell types (GSE74912).</title>
                </caption>
                <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/79178/89cc6955-9712-4313-8024-0d6a8edc0830_figure3.gif"/>
            </fig>
            <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                <label>Figure 4. </label>
                <caption>
                    <title>Heatmap of the most differentially accessible scATAC-seq regions between hematopoietic stem cells (HSC) and 
                        <italic toggle="yes">CD4(+)</italic> T helper cells (CD4Tcell) (GSE74912).</title>
                </caption>
                <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/79178/89cc6955-9712-4313-8024-0d6a8edc0830_figure4.gif"/>
            </fig>
            <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                <label>Figure 5. </label>
                <caption>
                    <title>Heatmap of the most differentially expressed genes from scRNA-seq data between hematopoietic stem cells (HSC) and 
                        <italic toggle="yes">CD4(+)</italic> T helper cells (CD4Tcell) (GSE74912).</title>
                </caption>
                <graphic id="gr5" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/79178/89cc6955-9712-4313-8024-0d6a8edc0830_figure5.gif"/>
            </fig>
            <sec id="sec4">
                <title>Reconstruction of cell trajectories with scATAC-seq data</title>
                <p>scATAC-seq data (GEO: GSE96769, GSE111586) were used to reconstruct a minimum spanning tree (MST) of hematopoietic cell types, whose hierarchy was ordered along pseudotime to reflect the differentiation potential of the cells profiled by the single-cell assay for transposase-accessible chromatin (scATAC-seq).
                    <sup>
                        <xref ref-type="bibr" rid="ref15">15</xref>
                    </sup> The resulting MST thus represents a set of possible cell trajectories between the analyzed cell types.</p>
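                <p>As an illustration only, the MST step can be sketched in Python under simplifying assumptions that the text does not state: each cell type is summarized by the centroid of its cells in a reduced space, edges are weighted by Euclidean distance, the tree is built with Prim's algorithm, and pseudotime is taken as the path length along the tree from a chosen root cell type (for example, HSC). The function names are hypothetical and do not correspond to the authors' code or to any particular trajectory package.</p>
                <preformat preformat-type="code">
```python
import math
from collections import deque


def minimum_spanning_tree(centroids):
    """Prim's algorithm on the complete Euclidean graph of cell-type centroids.

    centroids: dict mapping cell-type name -> coordinate tuple
    (e.g. the mean position of that type's cells in a reduced space).
    Returns adjacency lists: cell type -> list of (neighbor, edge length).
    """
    names = list(centroids)
    tree = {name: [] for name in names}
    in_tree = {names[0]}
    while len(in_tree) != len(names):
        # Cheapest edge joining the current tree to a cell type outside it.
        dist, u, v = min(
            (math.dist(centroids[a], centroids[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        tree[u].append((v, dist))
        tree[v].append((u, dist))
        in_tree.add(v)
    return tree


def pseudotime(tree, root):
    """Path length from the root to every cell type along the tree (BFS)."""
    times = {root: 0.0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor, length in tree[node]:
            if neighbor not in times:
                times[neighbor] = times[node] + length
                queue.append(neighbor)
    return times
```
                </preformat>
                <p>Under this reading, cell types with larger pseudotime lie further from the root along the tree, and each root-to-leaf path corresponds to one candidate trajectory.</p>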
            </sec>
            <sec id="sec5">
                <title>Differential accessibility analysis</title>
                <p>Analogously to DEG analysis,
                    <sup>
                        <xref ref-type="bibr" rid="ref20">20</xref>
                    </sup> a differential accessibility analysis (DAA) of genomic regions was performed between pairs of cell types on the cell trajectory using 
                    <ext-link ext-link-type="uri" xlink:href="http://www.bioconductor.org/packages/devel/bioc/manuals/slingshot/man/slingshot.pdf">Slingshot</ext-link> v2.3. Accordingly, for each cell population on the MST, a set of regions ranked by p-value can be obtained that discriminates that population from the others.</p>
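                <p>The text does not specify the statistical test behind the DAA. A minimal sketch, assuming binarized accessibility (a cell counts as open in a region if it has at least one Tn5 insertion there) and a pooled two-proportion z-test as a stand-in for the test actually used, shows how a p-value-ranked region list for one pair of cell types could be produced. All names are hypothetical.</p>
                <preformat preformat-type="code">
```python
import math


def two_proportion_pvalue(k1, n1, k2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # no variability at all: no evidence of a difference
    z = (k1 / n1 - k2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))


def rank_regions(open_counts_a, n_a, open_counts_b, n_b):
    """Rank regions by differential accessibility between cell types A and B.

    open_counts_a / open_counts_b: dict region -> number of cells of that type
    with at least one insertion in the region; n_a, n_b: cells per type.
    Returns (region, p-value, log2 ratio of open fractions), most significant first.
    """
    results = []
    for region in open_counts_a:
        ka, kb = open_counts_a[region], open_counts_b[region]
        p = two_proportion_pvalue(ka, n_a, kb, n_b)
        # Small offset keeps the ratio finite when a region is closed in one type.
        lfc = math.log2((ka / n_a + 1e-3) / (kb / n_b + 1e-3))
        results.append((region, p, lfc))
    results.sort(key=lambda item: item[1])
    return results
```
                </preformat>
                <p>Applying a significance cutoff to the returned list yields a ranked set of regions that separates the two cell types, which is the input assumed by the subsequent annotation step.</p>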
            </sec>
            <sec id="sec6">
                <title>TFs filtration and TFBS annotation</title>
                <p>Based on scRNA-seq data (GEO: GSE74912), we excluded from downstream analysis TFs that either had near-zero median expression (below the 5th percentile) in the final cell type or were more highly expressed in the original cell types. Thus, only TFs preferentially expressed in the final cell population were considered.</p>
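                <p>The filtering rule can be sketched as follows, assuming that the 5th-percentile cutoff is computed over per-TF median expression in the final cell type and that the comparison with the original cell type also uses medians. The text states neither detail, and the function name is hypothetical.</p>
                <preformat preformat-type="code">
```python
import statistics


def filter_tfs(expr_final, expr_original, percentile=5):
    """Keep TFs expressed in the final cell type and not higher in the original one.

    expr_final / expr_original: dict TF -> list of per-cell expression values
    (e.g. normalized scRNA-seq counts) in the final and original cell types.
    """
    med_final = {tf: statistics.median(v) for tf, v in expr_final.items()}
    med_orig = {tf: statistics.median(v) for tf, v in expr_original.items()}
    # Assumed cutoff: the given percentile of per-TF medians in the final cell type.
    cutoffs = statistics.quantiles(med_final.values(), n=100, method="inclusive")
    threshold = cutoffs[percentile - 1]
    return sorted(
        tf
        for tf in med_final
        # drop near-zero TFs and TFs with higher median expression in the original type
        if med_final[tf] >= threshold and med_final[tf] > med_orig.get(tf, 0.0)
    )
```
                </preformat>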
                <p>Genomic regions (scATAC-seq peaks from GSE74912) were ranked by the significance of the DAA performed with Monocle2 (p-value &lt; 0.01), and the significant regions were annotated with TFBS predicted using position weight matrices (PWMs; p-value &lt; 0.0001) from the HOCOMOCO database.
                    <sup>
                        <xref ref-type="bibr" rid="ref8">8</xref>
                    </sup>
                </p>
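                <p>TFBS annotation with a PWM amounts to sliding the matrix along each region on both strands and keeping windows whose log-odds score reaches a threshold. In this sketch the threshold is an input: a score cutoff corresponding to p &lt; 0.0001 is assumed to have been computed separately for each HOCOMOCO matrix, and a uniform nucleotide background with a pseudocount of 0.25 is assumed. Parsing of the HOCOMOCO files is not shown, and the helper names are hypothetical.</p>
                <preformat preformat-type="code">
```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pfm_to_pwm(pfm, pseudocount=0.25, background=0.25):
    """Convert a position frequency matrix into a log2-odds PWM.

    pfm: list of dicts (one per motif position) mapping base -> count.
    A uniform background is assumed for simplicity.
    """
    pwm = []
    for column in pfm:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, strand, score) for every window scoring at or above threshold."""
    width = len(pwm)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        reverse = window.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", window), ("-", reverse)):
            score = sum(column[base] for column, base in zip(pwm, site))
            if score >= threshold:
                hits.append((start, strand, score))
    return hits
```
                </preformat>
                <p>For example, a four-position matrix built from counts favouring G, A, T and A reports both a forward-strand GATA and a reverse-strand TATC when scanning the sequence TTGATATTTATCAA with a cutoff of 7.0.</p>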
            </sec>
            <sec id="sec7">
                <title>TFs ranking via GSEA-like enrichment analysis</title>
                <p>GSEA
                    <sup>
                        <xref ref-type="bibr" rid="ref18">18</xref>
                    </sup> was adapted to rank TFs according to their significance for a given cell conversion.</p>
                <p>Because TFs differ in their sequence preferences, and hence in the number of predicted TFBS, TF annotations are distributed highly unevenly across the region ranking. GSEA was therefore used to assess the degree to which the TFBS of each TF are concentrated at the top of the region ranking for a given conversion.</p>
                <p>Accordingly, the TFBS-annotated region ranking was supplied to GSEA as the pre-ranked list, and the set of regions bearing binding sites of each individual TF was treated as a separate signature gene set. The final TF ranking obtained from GSEA thus reflects the significance of individual TFs for cell differentiation or conversion.</p>
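                <p>The core of a GSEA-style ranking is the running-sum enrichment score of Subramanian et al., computed here over regions instead of genes: moving down the region ranking, the sum increases at regions carrying a TFBS of the TF and decreases otherwise. The sketch below is a simplified stand-in, not the modified GSEA used in this work. The weighting by -log10 p-values, the gene-set permutation null, and the omission of normalized enrichment scores and FDR correction are all assumptions, and the function names are hypothetical.</p>
                <preformat preformat-type="code">
```python
import random


def enrichment_score(ranked_regions, region_scores, hit_set, weight=1.0):
    """Running-sum enrichment score in the style of GSEA (Subramanian et al., 2005).

    ranked_regions: region ids ordered from most to least differentially accessible.
    region_scores: ranking metric for each region in the same order (e.g. -log10 p).
    hit_set: ids of regions that carry at least one TFBS of the TF under test.
    weight: exponent applied to |score|; 0 gives the unweighted KS-like statistic.
    """
    hits = [region in hit_set for region in ranked_regions]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    hit_norm = sum(abs(s) ** weight for s, h in zip(region_scores, hits) if h)
    running = best = 0.0
    for score, is_hit in zip(region_scores, hits):
        running += abs(score) ** weight / hit_norm if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best  # maximum deviation from zero, keeping its sign


def rank_tfs(ranked_regions, region_scores, tfbs_by_tf, n_perm=1000, seed=0):
    """Rank TFs by enrichment score, with a permutation p-value per TF.

    tfbs_by_tf: dict TF name -> set of region ids containing its binding sites.
    The null shuffles which regions count as hits (a gene-set permutation).
    """
    rng = random.Random(seed)
    universe = set(ranked_regions)
    results = []
    for tf, regions in tfbs_by_tf.items():
        observed = enrichment_score(ranked_regions, region_scores, regions)
        size = len(universe.intersection(regions))
        null = [
            enrichment_score(ranked_regions, region_scores,
                             set(rng.sample(ranked_regions, size)))
            for _ in range(n_perm)
        ]
        # One-sided test in the direction of the observed score.
        if observed >= 0:
            extreme = sum(1 for x in null if x >= observed)
        else:
            extreme = sum(1 for x in null if observed >= x)
        results.append((tf, observed, (extreme + 1) / (n_perm + 1)))
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```
                </preformat>
                <p>TFs at the top of the returned list have binding sites concentrated among the most differentially accessible regions, which is the property the TF ranking is intended to capture.</p>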
            </sec>
        </sec>
        <sec id="sec8" sec-type="discussion">
            <title>Discussion</title>
            <p>The proposed pipeline uses both transcriptomic and epigenomic data at single-cell resolution to search for core TFs that enable cell differentiation and conversion within the human hematopoietic system. The resulting TF rankings (
                <xref ref-type="table" rid="T1">Table 1</xref>) suggest that the approach can predict subsets of core TFs and reflect their relative importance for cell differentiation and conversion.</p>
        </sec>
        <sec id="sec9" sec-type="conclusions">
            <title>Conclusions</title>
            <p>Herein, we described a method for integrating single-cell chromatin accessibility and gene expression data that can successfully select core TFs for cell differentiation and conversion 
                <italic toggle="yes">in silico</italic>.
            </p>
        </sec>
        <sec id="sec10">
            <title>Data availability</title>
            <sec id="sec11">
                <title>Underlying data</title>
                <p>Gene Expression Omnibus: A Single-Cell Atlas of in vivo Mammalian Chromatin Accessibility, 
                    <ext-link ext-link-type="uri" xlink:href="https://identifiers.org/geo:GSE111586">https://identifiers.org/geo:GSE111586</ext-link></p>
                <p>Gene Expression Omnibus: Single-cell epigenomics maps the continuous regulatory landscape of human hematopoietic differentiation [scATAC-Seq], 
                    <ext-link ext-link-type="uri" xlink:href="https://identifiers.org/geo:GSE96769">https://identifiers.org/geo:GSE96769</ext-link></p>
                <p>Gene Expression Omnibus: ATAC-seq data, 
                    <ext-link ext-link-type="uri" xlink:href="https://identifiers.org/geo:GSE74912">https://identifiers.org/geo:GSE74912</ext-link></p>
            </sec>
            <sec id="sec12">
                <title>Extended data</title>
                <p>Analysis code available from: 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/annykay/transFactorsPrediction">https://github.com/annykay/transFactorsPrediction</ext-link>
                </p>
                <p>Archived analysis code as at time of publication: 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.5799254">https://doi.org/10.5281/zenodo.5799254</ext-link>
                </p>
                <p>License: MIT</p>
            </sec>
        </sec>
        <sec id="sec13">
            <title>Competing interests</title>
            <p>No competing interests were disclosed.</p>
        </sec>
        <sec id="sec14">
            <title>Grant information</title>
            <p>The study was supported by the Ministry of Science and Higher Education of the Russian Federation (agreement no. 075-15-2020-899).</p>
        </sec>
    </body>
    <back>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Buenrostro</surname>
                            <given-names>JD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Corces</surname>
                            <given-names>MR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lareau</surname>
                            <given-names>CA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Integrated Single-Cell Analysis Maps the Continuous Regulatory Landscape of Human Hematopoietic Differentiation.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2018</year>;<volume>173</volume>(<issue>6</issue>):<fpage>1535</fpage>&#x2013;<lpage>1548.e16</lpage>.
                    <pub-id pub-id-type="pmid">29706549</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2018.03.074</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cahan</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Morris</surname>
                            <given-names>SA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>CellNet: Network biology applied to stem cell engineering.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2014</year>;<volume>158</volume>:<fpage>903</fpage>&#x2013;<lpage>915</lpage>.
                    <pub-id pub-id-type="pmid">25126793</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2014.07.020</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Choi</surname>
                            <given-names>YS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gullicksrud</surname>
                            <given-names>JA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Xing</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>LEF-1 and TCF-1 orchestrate TFH differentiation by regulating differentiation circuits upstream of the transcriptional repressor Bcl6.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Immunol.</italic>
</source>
                    <year>2015</year>;<volume>16</volume>:<fpage>980</fpage>&#x2013;<lpage>990</lpage>.
                    <pub-id pub-id-type="pmid">26214741</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ni.3226</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cook</surname>
                            <given-names>DN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kang</surname>
                            <given-names>HS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jetten</surname>
                            <given-names>AM</given-names>
                        </name>
</person-group>:
                    <article-title>Retinoic Acid-Related Orphan Receptors (RORs): Regulatory Functions in Immunity, Development, Circadian Rhythm, and Metabolism.</article-title>
                    <source>

                        <italic toggle="yes">Nuclear Receptor Research.</italic>
</source>
                    <year>2015</year>;<volume>2</volume>.
                    <pub-id pub-id-type="pmid">26878025</pub-id>
                    <pub-id pub-id-type="doi">10.11131/2015/101185</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Corces</surname>
                            <given-names>MR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Buenrostro</surname>
                            <given-names>JD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wu</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Genet.</italic>
</source>
                    <year>2016</year>;<volume>48</volume>:<fpage>1193</fpage>&#x2013;<lpage>1203</lpage>.
                    <pub-id pub-id-type="pmid">27526324</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ng.3646</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Henriques</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gilchrist</surname>
                            <given-names>DA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nechaev</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Stable pausing by RNA polymerase II provides an opportunity to target and integrate regulatory signals.</article-title>
                    <source>

                        <italic toggle="yes">Mol. Cell.</italic>
</source>
                    <year>2013</year>;<volume>52</volume>:<fpage>517</fpage>&#x2013;<lpage>528</lpage>.
                    <pub-id pub-id-type="pmid">24184211</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.molcel.2013.10.001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Iwafuchi-Doi</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zaret</surname>
                            <given-names>KS</given-names>
                        </name>
</person-group>:
                    <article-title>Pioneer transcription factors in cell reprogramming.</article-title>
                    <year>2014</year>.</mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kulakovskiy</surname>
                            <given-names>IV</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vorontsov</surname>
                            <given-names>IE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yevshin</surname>
                            <given-names>IS</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>HOCOMOCO: Towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2018</year>;<volume>46</volume>:<fpage>D252</fpage>&#x2013;<lpage>D259</lpage>.
                    <pub-id pub-id-type="pmid">29140464</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkx1106</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lang</surname>
                            <given-names>AH</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Collins</surname>
                            <given-names>JJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Epigenetic Landscapes Explain Partially Reprogrammed Cells and Identify Key Reprogramming Genes.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Comput. Biol.</italic>
</source>
                    <year>2014</year>;<volume>10</volume>:<fpage>e1003734</fpage>.
                    <pub-id pub-id-type="pmid">25122086</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003734</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Luckheeram</surname>
                            <given-names>RV</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhou</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Verma</surname>
                            <given-names>AD</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>CD4+ T cells: Differentiation and functions.</article-title>
                    <year>2012</year>.</mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Morris</surname>
                            <given-names>SA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Daley</surname>
                            <given-names>GQ</given-names>
                        </name>
</person-group>:
                    <article-title>A blueprint for engineering cell fate: Current technologies to reprogram cell identity.</article-title>
                    <year>2013</year>.</mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nichols</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zevnik</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Anastassiadis</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Formation of pluripotent stem cells in the mammalian embryo depends on the POU transcription factor Oct4.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>1998</year>;<volume>95</volume>:<fpage>379</fpage>&#x2013;<lpage>391</lpage>.
                    <pub-id pub-id-type="pmid">9814708</pub-id>
                    <pub-id pub-id-type="doi">10.1016/S0092-8674(00)81769-9</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nish</surname>
                            <given-names>SA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zens</surname>
                            <given-names>KD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kratchmarov</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>CD4+ T cell effector commitment coupled to self-renewal by asymmetric cell divisions.</article-title>
                    <source>

                        <italic toggle="yes">J. Exp. Med.</italic>
</source>
                    <year>2017</year>;<volume>214</volume>:<fpage>39</fpage>&#x2013;<lpage>47</lpage>.
                    <pub-id pub-id-type="pmid">27923906</pub-id>
                    <pub-id pub-id-type="doi">10.1084/jem.20161046</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rackham</surname>
                            <given-names>OJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Firas</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fang</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A predictive computational framework for direct reprogramming between human cell types.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Genet.</italic>
</source>
                    <year>2016</year>;<volume>48</volume>:<fpage>331</fpage>&#x2013;<lpage>335</lpage>.
                    <pub-id pub-id-type="pmid">26780608</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ng.3487</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Reid</surname>
                            <given-names>JE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wernisch</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>Pseudotime estimation: Deconfounding single cell time series.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2016</year>;<volume>32</volume>:<fpage>2973</fpage>&#x2013;<lpage>2980</lpage>.
                    <pub-id pub-id-type="pmid">27318198</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btw372</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>16</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rivera</surname>
                            <given-names>CM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ren</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>Mapping human epigenomes.</article-title>
                    <year>2013</year>.</mixed-citation>
            </ref>
            <ref id="ref17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Roost</surname>
                            <given-names>MS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Van Iperen</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ariyurek</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>KeyGenes, a Tool to Probe Tissue Differentiation Using a Human Fetal Transcriptional Atlas.</article-title>
                    <source>

                        <italic toggle="yes">Stem Cell Reports.</italic>
</source>
                    <year>2015</year>;<volume>4</volume>:<fpage>1112</fpage>&#x2013;<lpage>1124</lpage>.
                    <pub-id pub-id-type="pmid">26028532</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.stemcr.2015.05.002</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Subramanian</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tamayo</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mootha</surname>
                            <given-names>VK</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles.</article-title>
                    <source>

                        <italic toggle="yes">Proc. Natl. Acad. Sci. U. S. A.</italic>
</source>
                    <year>2005</year>;<volume>102</volume>:<fpage>15545</fpage>&#x2013;<lpage>15550</lpage>.
                    <pub-id pub-id-type="pmid">16199517</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.0506580102</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tapscott</surname>
                            <given-names>SJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Davis</surname>
                            <given-names>RL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Thayer</surname>
                            <given-names>MJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>MyoD1: a nuclear phosphoprotein requiring a Myc homology region to convert fibroblasts to myoblasts.</article-title>
                    <source>

                        <italic toggle="yes">Adv. Sci.</italic>
</source>
                    <year>2010</year>.</mixed-citation>
            </ref>
            <ref id="ref20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tarazona</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Differential Expression in RNA-Seq.</article-title>
                    <source>

                        <italic toggle="yes">Gene Expr.</italic>
</source>
                    <year>2011</year>.</mixed-citation>
            </ref>
            <ref id="ref21">
                <label>21</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vaquerizas</surname>
                            <given-names>JM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kummerfeld</surname>
                            <given-names>SK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Teichmann</surname>
                            <given-names>SA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A census of human transcription factors: Function, expression and evolution.</article-title>
                    <year>2009</year>.</mixed-citation>
            </ref>
            <ref id="ref22">
                <label>22</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vierbuchen</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wernig</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Molecular Roadblocks for Cellular Reprogramming.</article-title>
                    <year>2012</year>.</mixed-citation>
            </ref>
            <ref id="ref23">
                <label>23</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yamanaka</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Induced pluripotent stem cells: Past, present, and future.</article-title>
                    <year>2012</year>.</mixed-citation>
            </ref>
            <ref id="ref24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhu</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yamane</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Paul</surname>
                            <given-names>WE</given-names>
                        </name>
</person-group>:
                    <article-title>Differentiation of Effector CD4 T Cell Populations.</article-title>
                    <source>

                        <italic toggle="yes">Annu. Rev. Immunol.</italic>
</source>
                    <year>2010</year>;<volume>28</volume>:<fpage>445</fpage>&#x2013;<lpage>489</lpage>.
                    <pub-id pub-id-type="pmid">20192806</pub-id>
                    <pub-id pub-id-type="doi">10.1146/annurev-immunol-030409-101212</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report119675">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.79178.r119675</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Dashinimaev</surname>
                        <given-names>Erdem B.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r119675a1">1</xref>
                    <xref ref-type="aff" rid="r119675a2">2</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-5640-7139</uri>
                </contrib>
                <aff id="r119675a1">
                    <label>1</label>Koltzov Institute of Developmental Biology, Russian Academy of Sciences, Moscow, Russian Federation</aff>
                <aff id="r119675a2">
                    <label>2</label>Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Pirogov Russian National Research Medical University, Moscow, Russian Federation</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>31</day>
                <month>1</month>
                <year>2022</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Dashinimaev EB</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport119675" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.75321.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <bold>Major comments:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>On the whole, the article is unclear. I have no doubt that thoughtful and valuable work has been done and that interesting data have been obtained, but in its current form the work is presented incomprehensibly.</p>
                    </list-item>
                    <list-item>
                        <p>The abstract and introduction chapters are well written.&#x00a0;</p>
                    </list-item>
                    <list-item>
                        <p>The Results chapter is poorly written and unclear, especially since the Methods chapter, which describes the proposed pipeline, has been placed after the Results chapter. Considering that the main point of the article is the publication of the pipeline under development, it may make sense to merge the Methods chapter into the Results chapter, following the logic of the narrative. For example, the fact that Table 1 is the main result of the applied method becomes clear only at the end of the article, in the Discussion chapter.</p>
                    </list-item>
                    <list-item>
                        <p>The Discussion chapter is very poorly written. I suggest that this chapter include references to similar published works (pipelines) from other teams in the field and discuss how they resemble and differ from your work. It is also necessary to discuss the value of the data obtained and their potential applications in different fields of science and biotechnology, and to highlight the disadvantages and limitations of your method and possible ways to address them.</p>
                    </list-item>
                </list> </p>
            <p> 
                <bold>Minor comments:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Fig. 2: the abbreviations of the cell type names are not expanded.</p>
                    </list-item>
                    <list-item>
                        <p>Given the logic of the narrative, consider swapping Fig. 2 and Fig. 3.</p>
                    </list-item>
                    <list-item>
                        <p>The text contains many abbreviations that are never explained, for example TFBS, ATAC-seq, scATAC-seq, GSEA, DEG, Monocle2, PWM, HOCOMOCO, etc.</p>
                        <p>These acronyms are meaningful to specialists, but one of the tasks of scientific publications is to convey information to a wider audience as accessibly as possible, all the more so given the multidisciplinary nature of F1000Research.</p>
                    </list-item>
                    <list-item>
                        <p>"Pluripotency" is better replaced with "stemness". 
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-119675-1">1</xref>
                            </sup>&#x00a0;</p>
                    </list-item>
                </list> </p>
            <p> 
                <bold>Conclusion:</bold>&#x00a0; 
                <list list-type="bullet">
                    <list-item>
                        <p>I&#x00a0;believe that the article requires significant revisions and rewrites.&#x00a0;</p>
                    </list-item>
                </list>
            </p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the method technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>No</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>No</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Cell biology, human cell reprogramming, regenerative biomedicine</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-119675-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>&#x2018;Stemness&#x2019;</article-title>. <year>2014</year>;<fpage>7</fpage>&#x2013;<lpage>17</lpage>.
                        <pub-id pub-id-type="doi">10.1016/B978-0-12-409503-8.00002-0</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report119678">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.79178.r119678</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Boeva</surname>
                        <given-names>Valentina</given-names>
                    </name>
                    <xref ref-type="aff" rid="r119678a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Gunz</surname>
                        <given-names>Samuel</given-names>
                    </name>
                    <xref ref-type="aff" rid="r119678a1">1</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r119678a1">
                    <label>1</label>Department of Computer Science, Institute for Machine Learning, ETH Zurich, Zurich, Switzerland</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>27</day>
                <month>1</month>
                <year>2022</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2022 Boeva V and Gunz S</copyright-statement>
                <copyright-year>2022</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport119678" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.75321.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <bold>Rationale:</bold>
            </p>
            <p>The rationale for developing this method is clearly stated. The authors identify experimental work as the bottleneck in identifying core transcription factors (TFs) and therefore suggest a computational approach to this challenge. However, the authors do not state whether existing methods in the field address this problem. This information should be included, as should any similar methods related to the problem of identifying core TFs.</p>
            <p> 
                <bold>Methods:</bold>
            </p>
            <p>The described methodology is overall understandable and seems technically sound. Nevertheless, some parts of the method description are too brief and should be expanded. These include:
                <list list-type="order">
                    <list-item>
                        <p>Details on the construction of the minimum spanning tree (MST): what exactly were the input data? Some details on the algorithm, or a reference, would also be useful.</p>
                    </list-item>
                    <list-item>
                        <p>More details on how GSEA was adjusted in this method. What is the input into GSEA? Why is it applied?</p>
                    </list-item>
                    <list-item>
                        <p>Details on the parsing of the obtained TF rankings (e.g. if a threshold was used).</p>
                    </list-item>
                </list> </p>
            <p>No information is given about the pre-processing that the scRNA-seq and scATAC-seq data require before the presented method can be used. If the authors want their method to be applied to other datasets, this information should be included.</p>
            <p>The data used in this study include &#x2018;A Single-Cell Atlas of in vivo Mammalian Chromatin Accessibility&#x2019; (GSE111586), which was obtained from mice. An explanation of why these data were used should be included.</p>
            <p> 
                <bold>Reproducibility:</bold>
            </p>
            <p>The analysis in this manuscript cannot be reproduced, for several reasons. First, the code cannot be found at the GitHub link given in the manuscript. (I assume that the correct link is 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/annykay/transFactorsPrediction-">https://github.com/annykay/transFactorsPrediction-</ext-link>). Second, major parts of the analysis are missing from both the GitHub repository and the 
                <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.5799254">archived code repository</ext-link>. The missing parts include the construction of the MST, TF filtration and TFBS annotation, TF ranking via GSEA-like enrichment analysis, and parsing of the obtained TF rankings.</p>
            <p>In addition, we encountered several issues when trying to reproduce the figures with the scripts available in the 
                <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.5799254">archived code repository</ext-link>. 
                <list list-type="order">
                    <list-item>
                        <p>The data referenced for Figure 3 (A) (GSE74912) correspond not to scATAC-seq but to bulk ATAC-seq data.</p>
                    </list-item>
                    <list-item>
                        <p>The scRNA-seq data referenced for Figure 3 (B) cannot be retrieved from the accession number the authors specify. Another accession number listed in the code (GSE74246) seems to correspond to bulk RNA-seq (not scRNA-seq) data.</p>
                    </list-item>
                    <list-item>
                        <p>A random seed for the UMAP algorithm and other parameters should be specified in the scripts such that the results are reproducible.</p>
                    </list-item>
                </list> </p>
            <p> 
                <bold>Results and discussion:</bold>
            </p>
            <p> The main result of this study is a table of important TFs in hematopoietic cell differentiation. It is not clear why only these 5 TFs are reported (see comment on parsing of the obtained TF rankings). Moreover, the result is not clearly put into context in the discussion. Questions that have to be discussed include: 
                <list list-type="order">
                    <list-item>
                        <p>Are all the reported TFs known from the literature to be involved in hematopoietic cell differentiation?</p>
                    </list-item>
                    <list-item>
                        <p>Are any TFs known to be involved in hematopoietic differentiation absent from the final enrichment list? If so, how many, and why might the method have failed to identify them?</p>
                    </list-item>
                    <list-item>
                        <p>Were some core TFs newly discovered using this method?</p>
                    </list-item>
                    <list-item>
                        <p>Can alternative methods, if they exist, retrieve these TFs?</p>
                    </list-item>
                </list> </p>
            <p>Given the title, the authors claim that they can identify core TFs for cell reprogramming. It is clear why the same idea could in principle be applied to both cell reprogramming and differentiation. However, the method is never applied in the manuscript to identify core TFs in cell reprogramming. Showing that the method can determine core TFs for cell differentiation does not directly imply that it can also determine core TFs for cell reprogramming. If this claim is to remain in the title and the conclusion, results supporting it must be shown in the manuscript.</p>
            <p> 
                <bold>Other comments:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>There is a typo in the reference to slingshot in the paragraph &#x2018;Differential accessibility analysis&#x2019;.</p>
                    </list-item>
                    <list-item>
                        <p>The links to the Gene Expression Omnibus do not always work.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the method technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>No</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Epigenetics, transcriptional control, bioinformatics</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to state that we do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
    </sub-article>
</article>
