<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.17976.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>TFutils: Data structures for transcription factor bioinformatics</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 2 approved, 1 not approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Stubbs</surname>
                        <given-names>Benjamin J.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Gopaulakrishnan</surname>
                        <given-names>Shweta</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Glass</surname>
                        <given-names>Kimberly</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Pochet</surname>
                        <given-names>Nathalie</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-1820-5656</uri>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Everaert</surname>
                        <given-names>Celine</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-7772-4259</uri>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Raby</surname>
                        <given-names>Benjamin</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Carey</surname>
                        <given-names>Vincent</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-4046-0063</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA</aff>
                <aff id="a2">
                    <label>2</label>Broad Institute, Cambridge, MA, 02142, USA</aff>
                <aff id="a3">
                    <label>3</label>Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA</aff>
                <aff id="a4">
                    <label>4</label>Pulmonary Genetics Center, Children's Hospital Boston, Boston, MA, 02115, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:stvjc@channing.harvard.edu">stvjc@channing.harvard.edu</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>17</day>
                <month>5</month>
                <year>2019</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2019</year>
            </pub-date>
            <volume>8</volume>
            <elocation-id>152</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>8</day>
                    <month>5</month>
                    <year>2019</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Stubbs BJ et al.</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/8-152/pdf"/>
            <abstract>
                <p>DNA transcription is intrinsically complex. Bioinformatic work with transcription factors (TFs) is complicated by a multiplicity of data resources and annotations. The Bioconductor package TFutils includes data structures and functions to enhance the precision and utility of integrative analyses that involve TFs. TFutils provides catalogs of human TFs from three reference sources (CISBP, HOCOMOCO, and GO), a catalog of TF targets derived from MSigDb, and multiple approaches to enumerating TF binding sites, including an interface to results of 690 ENCODE experiments. Examples explore aspects of integrating TF binding patterns with genome-wide association study results.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Transcription factors</kwd>
                <kwd>Gene expression</kwd>
                <kwd>Gene regulation</kwd>
                <kwd>Bioconductor</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/100000060">
                    <funding-source>National Institute of Allergy and Infectious Diseases</funding-source>
                    <award-id>R03AI131066</award-id>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/100000050">
                    <funding-source>National Heart, Lung, and Blood Institute</funding-source>
                    <award-id>R01HL118455</award-id>
                </award-group>
                <award-group id="fund-3" xlink:href="http://dx.doi.org/10.13039/100000054">
                    <funding-source>National Cancer Institute</funding-source>
                    <award-id>U01CA214846</award-id>
                    <award-id>U24CA180996</award-id>
                    <award-id>R21CA209940</award-id>
                </award-group>
                <funding-statement>Support for the development of this software was provided by the National Institutes of Health [U01 CA214846 to VC, U24 CA180996], the Chan Zuckerberg Initiative [DAF 2018-183436 to VC, R01 NHLBI HL118455 to BR] and NIH/NCI/ITCR R21 CA209940, NIH/NIAID R03 AI131066, U01 CA214846 collaborative set aside to NP.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>The two reviewer reports raised valid concerns related to the clarity of presentation. One reviewer noted that Figure 3 is difficult to interpret, and we concur. Additional work on the relationship between sequence-based and in vitro evidence of TF binding, specifically with respect to the combinatorial aspects of binding suggested by Figure 3, is warranted. Figure 5 is new; it uses motifStack and MotifDb to demonstrate the interplay of TFutils with existing Bioconductor tools.</p>
            </sec>
        </notes>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>A central concern of genome biology is improving our understanding of gene transcription. In simple terms, transcription factors (TFs) are proteins that bind to DNA, typically near gene promoter regions. The role of TFs in gene expression variation is of great interest. Progress in deciphering genetic and epigenetic processes that affect TF abundance and function will be essential in clarifying and interpreting gene expression variation patterns and their effects on phenotype. Difficulties in identifying functional binding of TFs, and opportunities for using information on TF binding in systems biology contexts, are reviewed in Lambert 
                <italic toggle="yes">et al</italic>.
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup> and Weirauch 
                <italic toggle="yes">et al</italic>.
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup>.</p>
            <p>This paper describes an R/Bioconductor package called 
                <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/release/bioc/html/TFutils.html">TFutils</ext-link>, which assembles various resources intended to clarify and unify approaches to working with TF concepts in bioinformatic analysis. Computations described in this paper can be carried out with 
                <ext-link ext-link-type="uri" xlink:href="https://www.bioconductor.org/">Bioconductor</ext-link> version 3.8. The package can be installed with</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
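                    <styled-content style="font-size:15px;color:#000000;"># install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# install TFutils from Bioconductor
BiocManager::install("TFutils")</styled-content>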
                </preformat>
            </p>
            <p>In the next section we describe the basic concepts of enumerating and classifying TFs, enumerating TF targets, and representing genome-wide quantification of TF binding affinity. This is followed by a review of the key data structures and functions provided in the package, and an example in cancer informatics.</p>
            <p>The present paper does not deal directly with the manipulation or interpretation of sequence motifs. An excellent Bioconductor package that synthesizes many approaches to these tasks is 
                <italic toggle="yes">
                    <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/bioc/html/universalmotif.html">universalmotif</ext-link>
                </italic>.</p>
            <p>A complete reference manual enumerating all functions and data sets in the package is available at: 
                <ext-link ext-link-type="uri" xlink:href="http://bioconductor.org/packages/release/bioc/manuals/TFutils/man/TFutils.pdf">http://bioconductor.org/packages/release/bioc/manuals/TFutils/man/TFutils.pdf</ext-link>
            </p>
        </sec>
        <sec>
            <title>Basic concepts of transcription factor bioinformatics</title>
            <sec>
                <title>Enumerating transcription factors</title>
                <p>Given the importance of the topic, it is not surprising that a number of bioinformatic research groups have published catalogs of transcription factors along with metadata about their features. Standard nomenclature for TFs has yet to be established. Gene symbols, motif sequences, and position-weight matrix catalog entries have all been used as TF identifiers.</p>
                <p>In TFutils we have gathered information from four widely used resources, focusing specifically on human TFs: 
                    <ext-link ext-link-type="uri" xlink:href="http://www.geneontology.org/">Gene Ontology</ext-link> (GO, Ashburner 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-3">3</xref>
                    </sup>, in which 
                    <monospace>GO:0003700</monospace> is the tag for the molecular function concept &#x201c;DNA binding transcription factor activity&#x201d;), 
                    <ext-link ext-link-type="uri" xlink:href="http://cisbp.ccbr.utoronto.ca/">CISBP</ext-link> (Catalog of Inferred Sequence Binding Preferences) (Weirauch 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-2">2</xref>
                    </sup>), 
                    <ext-link ext-link-type="uri" xlink:href="http://hocomoco11.autosome.ru/">HOCOMOCO</ext-link> (Homo sapiens Comprehensive Model Collection) (Kulakovskiy 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-4">4</xref>
                    </sup>), and the &#x201c;c3 TFT (transcription factor target)&#x201d; signature set of 
                    <ext-link ext-link-type="uri" xlink:href="http://software.broadinstitute.org/gsea/msigdb">MSigDb</ext-link> (Molecular Signatures Database) (Subramanian 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>). 
                    <xref ref-type="fig" rid="f1">Figure 1</xref> depicts the sizes of these catalogs, measured using counts of unique HGNC gene symbols. The enumeration for GO uses Bioconductor&#x2019;s 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/data/annotation/html/org.Hs.eg.db.html">org.Hs.eg.db</ext-link>
                    </italic> (version 3.7.0) package to find direct associations from 
                    <monospace>GO:0003700</monospace> to HGNC symbols. The enumeration for MSigDb is heuristic and involves parsing the gene set identifiers used in MSigDb for exact or close matches to HGNC symbols. For CISBP and HOCOMOCO, the associated web servers provide easily parsed tabular catalogs.</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Sizes of transcription factor (TF) catalogs and of intersections based on HGNC (HUGO Gene Nomenclature Committee) symbols for TFs.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/21137/2eaee588-dcbf-45a5-a5aa-b41d899b4b6a_figure1.gif"/>
                </fig>
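                <p>The catalog sizes and overlaps summarized in 
                    <xref ref-type="fig" rid="f1">Figure 1</xref> reduce to set operations on vectors of gene symbols. The following is a minimal sketch in base R, not the code used to produce the figure. The symbol vectors are small hypothetical stand-ins for the real catalog contents, and the object names (
                    <monospace>catalogs</monospace>, 
                    <monospace>sizes</monospace>, 
                    <monospace>shared</monospace>, 
                    <monospace>pairwise</monospace>) are illustrative only.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;color:#000000;"># Hypothetical toy symbol vectors; NOT the real catalog contents
catalogs = list(
  GO       = c("STAT1", "FOXP3", "MYC", "TP53"),
  CISBP    = c("STAT1", "FOXP3", "MYC", "YY1", "SOX2"),
  HOCOMOCO = c("STAT1", "MYC", "YY1")
)

# Remove duplicate symbols, then count unique symbols per catalog
catalogs = lapply(catalogs, unique)
sizes = vapply(catalogs, length, integer(1))

# Symbols present in every catalog
shared = Reduce(intersect, catalogs)

# Matrix of pairwise intersection sizes (diagonal = catalog sizes)
pairwise = outer(names(catalogs), names(catalogs),
  Vectorize(function(a, b) length(intersect(catalogs[[a]], catalogs[[b]]))))
dimnames(pairwise) = list(names(catalogs), names(catalogs))

sizes
shared
pairwise</styled-content>
                    </preformat>
                </p>
                <p>With real data, each list element would hold the HGNC symbols extracted from the corresponding resource. Because symbols are compared as exact strings, aliases and outdated symbols would need to be reconciled before the counts are interpreted.</p>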
            </sec>
            <sec>
                <title>Classification of transcription factors</title>
                <p>As noted by Weirauch 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-2">2</xref>
                    </sup>, interpretation of the &#x201c;function and evolution of DNA sequences&#x201d; is dependent on the analysis of sequence-specific DNA binding domains. These domains are dynamic and cell-type specific (Gertz 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-6">6</xref>
                    </sup>). Classifying TFs according to features of the binding domain is an ongoing process of increasing intricacy. 
                    <xref ref-type="fig" rid="f2">Figure 2</xref> shows excerpts of hierarchies of terms related to TF type derived from GO (on the left) and 
                    <ext-link ext-link-type="uri" xlink:href="http://tfclass.bioinf.med.uni-goettingen.de/">TFClass</ext-link> (on the right; Wingender 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup>). Our GO-based enumeration of TFs in 
                    <xref ref-type="fig" rid="f1">Figure 1</xref> differs from the 1919 shown in AmiGO because the latter includes a broader collection of receptor activities.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Screenshots of AmiGO and TFClass hierarchy excerpts.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/21137/2eaee588-dcbf-45a5-a5aa-b41d899b4b6a_figure2.gif"/>
                </fig>
                <p>
                    <xref ref-type="table" rid="T1">Table 1</xref> provides examples of frequently encountered TF classifications in the CISBP and HOCOMOCO catalogs. The numerical components of the HOCOMOCO classes correspond to TFClass subfamilies (Wingender 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup>).</p>
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>Most frequently represented transcription factor (TF) classes in CISBP and HOCOMOCO. </title>
                        <p>The number of unique human TF_Name entries in CISBP is 1734. The number of unique Transcription factor entries in HOCOMOCO (Sept. 2018 version) is 678. Entries in columns Nc and Nh are the numbers of distinct TFs annotated to the classes in columns CISBP and HOCOMOCO, respectively. Entries are ordered top to bottom by frequency of occurrence. There is no substantive correspondence between entries on a given row. Harmonization of class terminology is beyond the scope of this paper.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1">CISBP</th>
                                <th align="right" colspan="1" rowspan="1">Nc</th>
                                <th align="left" colspan="1" rowspan="1">HOCOMOCO</th>
                                <th align="right" colspan="1" rowspan="1">Nh</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">C2H2 ZF</td>
                                <td align="right" colspan="1" rowspan="1">655</td>
                                <td align="left" colspan="1" rowspan="1">More than 3 adjacent zinc finger factors{2.3.3}</td>
                                <td align="right" colspan="1" rowspan="1">106</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Homeodomain</td>
                                <td align="right" colspan="1" rowspan="1">199</td>
                                <td align="left" colspan="1" rowspan="1">HOX-related factors{3.1.1}</td>
                                <td align="right" colspan="1" rowspan="1">41</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">bHLH</td>
                                <td align="right" colspan="1" rowspan="1">104</td>
                                <td align="left" colspan="1" rowspan="1">NK-related factors{3.1.2}</td>
                                <td align="right" colspan="1" rowspan="1">36</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">bZIP</td>
                                <td align="right" colspan="1" rowspan="1">66</td>
                                <td align="left" colspan="1" rowspan="1">Paired-related HD factors{3.1.3}</td>
                                <td align="right" colspan="1" rowspan="1">35</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Unknown</td>
                                <td align="right" colspan="1" rowspan="1">49</td>
                                <td align="left" colspan="1" rowspan="1">Factors with multiple dispersed zinc fingers{2.3.4}</td>
                                <td align="right" colspan="1" rowspan="1">30</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Forkhead</td>
                                <td align="right" colspan="1" rowspan="1">48</td>
                                <td align="left" colspan="1" rowspan="1">Forkhead box (FOX) factors{3.3.1}</td>
                                <td align="right" colspan="1" rowspan="1">27</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Sox</td>
                                <td align="right" colspan="1" rowspan="1">48</td>
                                <td align="left" colspan="1" rowspan="1">Ets-related factors{3.5.2}</td>
                                <td align="right" colspan="1" rowspan="1">25</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Nuclear receptor</td>
                                <td align="right" colspan="1" rowspan="1">46</td>
                                <td align="left" colspan="1" rowspan="1">Three-zinc finger Krueppel-related factors{2.3.1}</td>
                                <td align="right" colspan="1" rowspan="1">20</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Myb/SANT</td>
                                <td align="right" colspan="1" rowspan="1">30</td>
                                <td align="left" colspan="1" rowspan="1">POU domain factors{3.1.10}</td>
                                <td align="right" colspan="1" rowspan="1">18</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Ets</td>
                                <td align="right" colspan="1" rowspan="1">27</td>
                                <td align="left" colspan="1" rowspan="1">Tal-related factors{1.2.3}</td>
                                <td align="right" colspan="1" rowspan="1">18</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
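                <p>Counts such as those in 
                    <xref ref-type="table" rid="T1">Table 1</xref> can be obtained by tabulating distinct TF-to-class pairs. The sketch below uses a small hypothetical data frame rather than the real catalogs. The column name 
                    <monospace>Family_Name</monospace> is an assumption made for illustration, and this is not necessarily how the table was produced.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;color:#000000;"># Hypothetical excerpt of a catalog; a TF may appear on several rows
# (for example, one row per motif). Column names are assumptions.
tfs = data.frame(
  TF_Name     = c("ZNF143", "ZNF143", "ZNF263", "ZNF24", "HOXA1", "MYC", "MAX"),
  Family_Name = c("C2H2 ZF", "C2H2 ZF", "C2H2 ZF", "C2H2 ZF",
                  "Homeodomain", "bHLH", "bHLH"),
  stringsAsFactors = FALSE
)

# Keep one row per distinct TF/class pair so repeated TFs count once
distinct_pairs = unique(tfs[, c("TF_Name", "Family_Name")])

# Number of distinct TFs per class, most frequent class first
class_counts = sort(table(distinct_pairs$Family_Name), decreasing = TRUE)
head(class_counts, 10)</styled-content>
                    </preformat>
                </p>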
            </sec>
            <sec>
                <title>Enumerating TF targets</title>
                <p>The Broad Institute MSigDb (Subramanian 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>) includes a gene set collection devoted to cataloging TF targets. We have used Bioconductor&#x2019;s 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/bioc/html/GSEABase.html">GSEABase</ext-link>
                    </italic> package (version 1.45.0) to import and serialize the 
                    <monospace>gmt</monospace> representation of this collection.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;color:#000000;">TFutils</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">::</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">tftColl

## GeneSetCollection
##   names: AAANWWTGC_UNKNOWN, AAAYRNCTG_UNKNOWN, ..., GCCATNTTG_YY1_Q6 (615 total)
##   unique identifiers: 4208, 481, ..., 56903 (12774 total)
##   types in collection:
##     geneIdType: EntrezIdentifier (1 total)
##     collectionType: NullCollection (1 total)</styled-content>
                    </preformat>
                </p>
                <p>Names of TFs for which target sets are assembled are encoded in a systematic way, with underscores separating substrings describing motifs, genes, and versions. Some peculiarities in the nomenclature of the MSigDb labels can be observed:</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;color:#214A87;">grep</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(</styled-content>
                        <styled-content style="font-size:15px;color:#4F9905;">"NFK"</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">,</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">names</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(TFutils</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">::</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">tftColl),</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">value=</styled-content>
                        <styled-content style="font-size:15px;color:#8F5903;">TRUE</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">)</styled-content>


                        <styled-content style="font-size:15px;color:#000000;">## [1] "NFKAPPAB65_01"         "NFKAPPAB_01"          "NFKB_Q6"
## [4] "NFKB_C"                "NFKB_Q6_01"           "GGGNNTTTCC_NFKB_Q6_01"</styled-content>
                    </preformat>
                </p>
                <p>Manual curation will be needed to improve the precision with which MSigDb TF target sets can be associated with specific TFs or motifs.</p>
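                <p>As a minimal sketch of why curation is needed, the labels shown above can be split on underscores with base R. The leading-motif test is a heuristic introduced here for illustration (it is not part of TFutils): it assumes that a first token spelled only with IUPAC nucleotide codes is a motif string.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Labels copied from the grep() output shown above
labs = c("NFKAPPAB65_01", "NFKAPPAB_01", "NFKB_Q6", "NFKB_C",
         "NFKB_Q6_01", "GGGNNTTTCC_NFKB_Q6_01")

# Split each label on underscores
toks = strsplit(labs, "_", fixed = TRUE)

# Number of tokens per label; the count varies, so token position alone
# does not identify the motif, gene, or version component
ntok = lengths(toks)

# Heuristic (an assumption of this sketch): a leading token spelled only
# with IUPAC nucleotide codes is treated as a motif string
isMotif = grepl("^[ACGTRYSWKMBDHVN]+$", vapply(toks, `[`, "", 1))

data.frame(label = labs, ntok = ntok, leadingMotif = isMotif)</preformat>
                </p>
                <p>The token counts differ among labels that appear to refer to the same factor, so a purely positional parse would not reliably recover the TF name.</p>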
            </sec>
            <sec>
                <title>Quantitative predictions of TF binding affinities</title>
                <p>In this subsection we address representation of putative binding sites. First we illustrate how to represent sequence-based affinity measures and the binding site locations implied by these. We then discuss use of results of ChIP-seq experiments for cell-type-specific binding site enumeration.</p>
                <p>
                    <bold>Affinity scores based on reference sequence.</bold> The 
                    <ext-link ext-link-type="uri" xlink:href="http://meme-suite.org/tools/fimo">FIMO</ext-link> algorithm of the MEME suite (Grant 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>) was used to score the human reference genome for TF binding affinity, using 689 motif matrices with which genes are associated. Full details of the execution of FIMO are provided in Sonawane 
                    <italic toggle="yes">et al</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup>. Sixteen tabix-indexed BED files are hosted in an AWS S3 bucket for illustration purposes.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;color:#214A87;">library</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(GenomicFiles)</styled-content>

                        <styled-content style="font-size:15px;color:#214A87;">data</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(fimo16)</styled-content>

                        <styled-content style="font-size:15px;color:#000000;">fimo16

## GenomicFiles object with 0 ranges and 16 files:
## files: M0635_1.02sort.bed.gz, M3433_1.02sort.bed.gz, ..., M6159_1.02sort.bed.gz, M6497_1.02sort.bed.
## detail: use files(), rowRanges(), colData(), ...</styled-content>
                    </preformat>
</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;color:#214A87;">head</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">colData</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(fimo16))

## DataFrame with 6 rows and 2 columns
##          Mtag        HGNC
##   &lt;character&gt; &lt;character&gt;
## 1     M0635_1      DMRTC2
## 2     M3433_1       HOXA3
## 3     M3467_1        IRF1
## 4     M3675_1      POU2F1
## 5     M3698_1        TP53
## 6     M3966_1       STAT1</styled-content>
                    </preformat>
                </p>
                <p>We harvest scores in a genomic interval of interest (bound to 
                    <monospace>fimo16</monospace> in the 
                    <monospace>rowRanges</monospace> assignment below) using 
                    <monospace>reduceByFile</monospace>. This yields a list with one element per file. Each such element holds a list of 
                    <monospace>scanTabix</monospace> results, one per query range.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;color:#214A87;">library</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(BiocParallel)</styled-content>

                        <styled-content style="font-size:15px;color:#214A87;">register</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">SerialParam</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">())</styled-content> 
                        <styled-content style="font-size:15px;color:#8F5903;"># important for macosx?</styled-content>

                        <styled-content style="font-size:15px;color:#214A87;">rowRanges</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(fimo16) =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">GRanges</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(</styled-content>
                        <styled-content style="font-size:15px;color:#4F9905;">"chr17"</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">,</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">IRanges</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(</styled-content>
                        <styled-content style="font-size:15px;color:#0000CF;">38.077e6</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">,</styled-content> 
                        <styled-content style="font-size:15px;color:#0000CF;">38.084e6</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">))
rr = GenomicFiles</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">::</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">reduceByFile</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(fimo16,</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">MAP=function</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(r,f)</styled-content>
  
                        <styled-content style="font-size:15px;color:#214A87;">scanTabix</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(f,</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">param=</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">r))</styled-content>
                    </preformat>
                </p>
                <p>
                    <monospace>scanTabix</monospace> produces a list of vectors of text strings, which we parse with 
                    <monospace>data.table::fread</monospace>. The resulting tables are then reduced to a genomic location and -log10 of the p-value derived from the binding affinity statistic of FIMO in the vicinity of that location.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <styled-content style="font-size:15px;color:#000000;">asdf =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">function</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(x) data.table</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">::</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">fread</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">paste0</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(x,</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">collapse=</styled-content>
                        <styled-content style="font-size:15px;color:#4F9905;">"\n"</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">),</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">header=</styled-content>
                        <styled-content style="font-size:15px;color:#8F5903;">FALSE</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">)
gg =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">lapply</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(rr,</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">function</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(x) {</styled-content>
        
                        <styled-content style="font-size:15px;color:#000000;">tmp =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">asdf</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(x[[</styled-content>
                        <styled-content style="font-size:15px;color:#0000CF;">1</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">]][[</styled-content>
                        <styled-content style="font-size:15px;color:#0000CF;">1</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">]])</styled-content>
        
                        <styled-content style="font-size:15px;color:#214A87;">data.frame</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">loc=</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">tmp</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">$</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">V2,</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">score=</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">-</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">log10</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(tmp</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">$</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">V7))</styled-content>
      
                        <styled-content style="font-size:15px;color:#000000;">})</styled-content>

                        <styled-content style="font-size:15px;color:#214A87;">for</styled-content> 
                        <styled-content style="font-size:15px;color:#000000;">(i</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">in</styled-content> 
                        <styled-content style="font-size:15px;color:#0000CF;">1</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">:</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">length</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(gg))  gg[[i]]</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">$</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">tf =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">colData</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(fimo16)[i,</styled-content>
                        <styled-content style="font-size:15px;color:#0000CF;">2</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">]</styled-content>
                    </preformat>
                </p>
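                <p>The parsing step can be sketched without network access. The two records below are invented for illustration and are not FIMO output; the seven-column layout is an assumption, apart from the use of column 2 as the location and column 7 as the p-value, which follows the code above. Base R&#x2019;s 
                    <monospace>read.delim</monospace> stands in for 
                    <monospace>data.table::fread</monospace>.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Hypothetical tab-delimited records imitating one scanTabix() result;
# the values are invented for illustration only
recs = c("chr17\t38077300\t38077310\tM0000_0\t10.5\t+\t1e-04",
         "chr17\t38080150\t38080160\tM0000_0\t12.1\t-\t1e-06")

# Parse the text strings into a table with columns V1, ..., V7
tmp = read.delim(text = paste(recs, collapse = "\n"), header = FALSE,
                 stringsAsFactors = FALSE)

# Keep the location (column 2) and -log10 of the p-value (column 7)
sc = data.frame(loc = tmp$V2, score = -log10(tmp$V7))
sc</preformat>
                </p>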
                <p>There are too many distinct TFs to display their names individually, so we label the scores with the names of the associated TF families as defined in CISBP.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                        <styled-content style="font-size:15px;color:#000000;">matchcis =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">match</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">colData</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(fimo16)[,</styled-content>
                        <styled-content style="font-size:15px;color:#0000CF;">2</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">], cisbpTFcat[,</styled-content>
                        <styled-content style="font-size:15px;color:#0000CF;">2</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">])
famn = cisbpTFcat[matchcis,]</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">$</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">Family_Name</styled-content>

                        <styled-content style="font-size:15px;color:#214A87;">for</styled-content> 
                        <styled-content style="font-size:15px;color:#000000;">(i</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">in</styled-content> 
                        <styled-content style="font-size:15px;color:#0000CF;">1</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">:</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">length</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(gg))  gg[[i]]</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">$</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">tffam = famn[i]
nn =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">do.call</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(rbind, gg)</styled-content>
                    </preformat>
                </p>
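                <p>The family lookup relies on the behavior of 
                    <monospace>match</monospace>, which returns the index of the first matching row only and 
                    <monospace>NA</monospace> when no row matches. The following sketch uses a small hypothetical stand-in for 
                    <monospace>cisbpTFcat</monospace>; the object names and rows are assumptions made for illustration.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Toy stand-in for cisbpTFcat; rows and values are hypothetical
toycat = data.frame(HGNC = c("TP53", "STAT1", "STAT1", "IRF1"),
                    Family_Name = c("p53", "STAT", "STAT", "IRF"),
                    stringsAsFactors = FALSE)
query = c("STAT1", "IRF1", "NOTATF")

# match() gives the index of the first matching row, or NA if there is none
idx = match(query, toycat$HGNC)

# Index the family names with the match positions; NA propagates
fam = toycat$Family_Name[idx]
data.frame(query = query, row = idx, family = fam)</preformat>
                </p>
                <p>Because a gene symbol can occupy several catalog rows, this approach implicitly assumes that all rows for a symbol carry the same family name.</p>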
                <p>A simple display of 
                    <italic toggle="yes">predicted</italic> TF binding affinity near the gene 
                    <italic toggle="yes">ORMDL3</italic> is provided in 
                    <xref ref-type="fig" rid="f3">Figure 3</xref>.</p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>TF binding in the vicinity of gene 
                            <italic toggle="yes">ORMDL3</italic>.</title>
                        <p>Points are -log10-transformed FIMO-based p-values, colored according to TF family as annotated in CISBP. Segments at the bottom of the plot are transcribed regions of 
                            <italic toggle="yes">ORMDL3</italic> according to UCSC gene models in build hg19.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/21137/2eaee588-dcbf-45a5-a5aa-b41d899b4b6a_figure3.gif"/>
                </fig>
                <p>
                    <bold>TF binding predictions based on ChIP-seq data from ENCODE.</bold> The ENCODE project provides BED-formatted reports on ChIP-seq experiments for many combinations of cell type and DNA-binding factors. TFutils includes a table 
                    <monospace>encode690</monospace> that gives information on 690 experiments involving pairs formed from 91 cell lines and 161 TFs for which results have been recorded as GRanges instances that can be acquired with the 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/bioc/html/AnnotationHub.html">AnnotationHub</ext-link>
                    </italic> (version 2.15.4) package. Positional relationships between cell-type specific binding sites and genomic features can be investigated. An illustration is given in 
                    <xref ref-type="fig" rid="f4">Figure 4</xref>, in which it is suggested that in HepG2 cells, CEBPB exhibits a distinctive pattern of binding in the vicinity of 
                    <italic toggle="yes">ORMDL3</italic>.</p>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Binding of CEBPB in the vicinity of 
                            <italic toggle="yes">ORMDL3</italic> derived from ChIP-seq experiments in four cell lines reported by ENCODE.</title>
                        <p>Colored rectangles at top are regions identified as narrow binding peaks; arrows in the bottom half are exons in 
                            <italic toggle="yes">ORMDL3</italic>. Arrows sharing a common vertical position are members of the same transcript as cataloged in Ensembl version 75.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/21137/2eaee588-dcbf-45a5-a5aa-b41d899b4b6a_figure4.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Visualization of motif relationships in a family of transcription factors</title>
                <p>Inspired by a referee&#x2019;s suggestion, we created functions that couple the HOCOMOCO TFclass enumeration with Bioconductor&#x2019;s MotifDb
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup> and motifStack
                    <sup>
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup> package resources. 
                    <xref ref-type="fig" rid="f5">Figure 5</xref> is the output of <monospace>example(tffamCirc.plot)</monospace>, available in version 1.5.1 of TFutils.</p>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>Figure 5. </label>
                    <caption>
                        <title>A circos display of motifs of transcription factors in the TFclass 3.1.3 (paired-related homeodomain factors).</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/21137/2eaee588-dcbf-45a5-a5aa-b41d899b4b6a_figure5.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Summary</title>
                <p>We have compared enumerations of human transcription factors by different projects, provided access to two forms of binding domain classification, and illustrated the use of cloud-resident genome-wide binding predictions. In the next section we review selected details of data structures and methods of the 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/bioc/html/TFutils.html">TFutils</ext-link>
                    </italic> package.</p>
            </sec>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>Implementation</title>
                <p>The TFutils package is designed to lower barriers to usage of key findings of TF biology in human genome research. TFutils is supplied as a conventional R package distributed with, and making use of, the Bioconductor software ecosystem. TFutils includes ready-to-use reference data, tools for visualizing binding sites, and tools that simplify integrative use of TF binding information with GWAS findings. A complete enumeration of functions and data available in the package is provided in the reference manual at 
                    <ext-link ext-link-type="uri" xlink:href="http://bioconductor.org/packages/release/bioc/manuals/TFutils/man/TFutils.pdf">http://bioconductor.org/packages/release/bioc/manuals/TFutils/man/TFutils.pdf</ext-link>.
                </p>
            </sec>
            <sec>
                <title>Data resources</title>
                <p>
                    <bold>Catalogs.</bold> Two reference resources have been collected into the TFutils package as data.frame instances. These are 
                    <monospace>cisbpTFcat</monospace> (CISBP: 7592 x 28), and 
                    <monospace>hocomoco.mono.sep2018</monospace> (mononucleotide models, full catalog, 769 x 9). These data.frames are snapshots of the CISBP and HOCOMOCO catalogs.</p>
                <p>
                    <bold>Indexed BED in AWS S3.</bold> As described above, 
                    <monospace>fimo16</monospace> provides programmatic access to FIMO scores for 16 TFs, using the 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/GenomicFiles">GenomicFiles</ext-link>
                    </italic> (version 1.19.0) protocol.</p>
                <p>
                    <bold>Annotated reference to ENCODE ChIP-seq results.</bold> 
                    <monospace>encode690</monospace> simplifies programmatic access to TF:cell-line combinations available in Bioconductor 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/AnnotationHub">AnnotationHub</ext-link>
                    </italic> (version 2.15.4).</p>
                <p>
                    <bold>TF targets enumerated in MSigDb.</bold> The c3-TFT (TF targets) subset from MSigDb is provided as a GeneSetCollection instance as defined in 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/GSEABase">GSEABase</ext-link>
                    </italic>.</p>
                <p>
                    <bold>Illustrative GWAS records.</bold> The full EBI/EMBL GWAS catalog is available in the 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/gwascat">gwascat</ext-link>
                    </italic> package (version 2.15.0); for convenience, an excerpt focusing on chromosome 17 is supplied with TFutils as 
                    <monospace>gwascat_hg19_chr17</monospace>.</p>
            </sec>
            <sec>
                <title>Infrastructure for interacting with components of TFutils</title>
                <p>
                    <bold>Interactive enumeration of TF targets implicated in GWAS.</bold> The 
                    <monospace>TFtargs</monospace> function runs a shiny app that permits selection of a TF in the nomenclature of the MSigDb c3/TFT gene set collection. The app will search an object provided by the 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/gwascat">gwascat</ext-link>
                    </italic> package for references in the 
                    <monospace>MAPPED_GENE</monospace> field that match the targets of the selected TF. 
                    <xref ref-type="fig" rid="f6">Figure 6</xref> gives an illustration.</p>
                <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                    <label>Figure 6. </label>
                    <caption>
                        <title>TFtargs screenshot.</title>
                        <p>This example reports on recent EBI GWAS catalog hits on chromosome 17 only.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/21137/2eaee588-dcbf-45a5-a5aa-b41d899b4b6a_figure6.gif"/>
                </fig>
                <p>
                    <bold>The TFCatalog S4 class</bold>. Reference catalogs for TF biology are structured with the 
                    <monospace>TFCatalog</monospace> S4 class. Two essential components for managing a catalog are the native TF identifier for the catalog and the HGNC gene symbol typically used to name the TF. The 
                    <monospace>TFCatalog</monospace> class includes a name field to name the catalog, and a character vector whose elements are the native identifiers for cataloged TFs.</p>
                <p>For example, CISBP uses 
                    <monospace>T004843_1.02</monospace> to refer to motifs associated with gene TFAP2B. There are five such motifs: three derived from SELEX, one from Transfac, and one from HOCOMOCO.</p>
                <p>The HGNC mapping is supplied as a 
                    <monospace>data.frame</monospace> instance that must have a column named &#x2018;HGNC&#x2019; and can include any other fields that offer metadata about the TFs in the specified catalog. Here is how we construct and view a TFCatalog object using the CISBP reference data.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;color:#214A87;">data</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(cisbpTFcat)</styled-content>

                        <styled-content style="font-size:15px;color:#000000;">TFs_CISBP =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">TFCatalog</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">name=</styled-content>
                        <styled-content style="font-size:15px;color:#4F9905;">"CISBP.info"</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">,</styled-content>
   
                        <styled-content style="font-size:15px;color:#214A87;">nativeIds=</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">cisbpTFcat[,</styled-content>
                        <styled-content style="font-size:15px;color:#0000CF;">1</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">],</styled-content>
   
                        <styled-content style="font-size:15px;color:#214A87;">HGNCmap =</styled-content> 
                        <styled-content style="font-size:15px;color:#000000;">cisbpTFcat)
TFs_CISBP

## TFutils TFCatalog instance CISBP.info
##  7592 native Ids, including
##    T004843_1.02 ... T153733_1.02
##  1551 unique HGNC tags, including
##    TFAP2B TFAP2B ... ZNF10 ZNF350</styled-content>
                    </preformat>
                </p>
            </sec>
            <sec>
                <title>Operation: Installation</title>
                <p>The TFutils package can be installed in any version of R subsequent to 3.5.0, and is usable on Unix, Windows, or Mac platforms. The preferred method of installation employs the CRAN package BiocManager, through the R command <monospace>BiocManager::install("TFutils")</monospace>. All necessary dependencies will be installed through this process.</p>
            </sec>
            <sec>
                <title>Operation: Use cases</title>
                <p>In this section we consider applications of the tools in genetic epidemiology. First we look for TFs that may harbor variants associated with traits in the EBI GWAS catalog. Then we show how to enumerate traits associated with targets of a selected TF.</p>
                <p>
                    <bold>Find TFs that are direct GWAS hits for a given trait.</bold> 
                    <monospace>directHitsInCISBP</monospace> accepts a string naming a trait, and returns a data.frame of TFs identified as &#x201c;mapped genes&#x201d; for the trait, with their TF &#x201c;family name&#x201d;.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;color:#214A87;">library</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(dplyr)</styled-content>

                        <styled-content style="font-size:15px;color:#214A87;">library</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(magrittr)</styled-content>

                        <styled-content style="font-size:15px;color:#214A87;">library</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(gwascat)</styled-content>

                        <styled-content style="font-size:15px;color:#214A87;">data</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(ebicat37)</styled-content>

                        <styled-content style="font-size:15px;color:#214A87;">directHitsInCISBP</styled-content>(
                        <styled-content style="font-size:15px;color:#4F9905;">"Rheumatoid arthritis"</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">, ebicat37)

## Joining, by = "HGNC"

##      HGNC Family_Name
## 1  ARID5B ARID/BRIGHT
## 7   EOMES       T-box
## 15  GATA3        GATA
## 35  JAZF1     C2H2 ZF
## 37  MECP2         MBD
## 45   MTF1     C2H2 ZF
## 57    REL         Rel
## 65  STAT4        STAT
## 79   AIRE        SAND
## 82   IRF5         IRF</styled-content>
                    </preformat>
                </p>
                <p>
                    <bold>Retrieve traits mapped to genes that are targets of a given TF.</bold> 
                    <monospace>topTraitsOfTargets</monospace> will acquire the targets of a selected TF, check for hits in these genes in a given GWAS catalog instance, and tabulate the most commonly reported traits.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;color:#000000;">tt =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">topTraitsOfTargets</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(</styled-content>
                        <styled-content style="font-size:15px;color:#4F9905;">"MTF1"</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">, TFutils</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00;">::</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">tftColl, ebicat37)

## remapping identifiers of input GeneSetCollection to Symbol...

## done</styled-content>


                        <styled-content style="font-size:15px;color:#214A87;">head</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(tt)

##                              DISEASE.TRAIT MAPPED_GENE       SNPS CHR_ID
## 1                        Atopic dermatitis        TNXB rs41268896      6
## 2                        Atopic dermatitis        TNXB rs12153855      6
## 3                        Atopic dermatitis       KIF3A  rs2897442      5
## 4 Attention deficit hyperactivity disorder      SEMA3A   rs797820      7
## 5 Attention deficit hyperactivity disorder        DNM1  rs2502731      9
## 6 Attention deficit hyperactivity disorder        GPC6  rs7995215     13
##     CHR_POS
## 1  32102292
## 2  32107027
## 3 132713335
## 4  83979723
## 5 128214278
## 6  93756253</styled-content>


                        <styled-content style="font-size:15px;color:#214A87;">table</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">(tt[,</styled-content>
                        <styled-content style="font-size:15px;color:#0000CF;">1</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">])

##
##                        Atopic dermatitis
##                                        3
## Attention deficit hyperactivity disorder
##                                        3
##                                   Height
##                                        7
##                  Menarche (age at onset)
##                                        4
##                   Obesity-related traits
##                                       11
##                     Rheumatoid arthritis
##                                        3</styled-content>
                    </preformat>
                </p>
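                <p>The <monospace>table</monospace> output above appears to be ordered by trait name rather than by count. As a minimal sketch in base R, the counts can be sorted so that the most frequently reported traits come first. The data frame <monospace>tt_toy</monospace> and its gene labels are hypothetical stand-ins introduced only for illustration; only the column names <monospace>DISEASE.TRAIT</monospace> and <monospace>MAPPED_GENE</monospace> are taken from the output shown above.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Hypothetical toy stand-in for the data.frame returned by topTraitsOfTargets;
# the rows are invented for illustration and are not real GWAS catalog records.
tt_toy = data.frame(
  DISEASE.TRAIT = c("Height", "Atopic dermatitis", "Height",
                    "Rheumatoid arthritis", "Height", "Atopic dermatitis"),
  MAPPED_GENE = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF"),
  stringsAsFactors = FALSE
)

# Count rows per trait, then order the counts from most to least frequent.
trait_counts = sort(table(tt_toy$DISEASE.TRAIT), decreasing = TRUE)

# Names of the two most frequently reported traits in the toy data.
top_traits = names(trait_counts)[1:2]
print(trait_counts)</preformat>
                </p>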
            </sec>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <p>Sources and consequences of variations in DNA transcription are fundamental problems for cell biology, and the projects we have made use of for cataloging transcription factors are at the boundaries of current knowledge.</p>
            <p>It is noteworthy that the four resources used for 
                <xref ref-type="fig" rid="f1">Figure 1</xref> agree on the names of only 119 TFs. Why CISBP lists 475 TFs that are not identified in any other source deserves further investigation. We observe that the ascription of TF status to AHRR is based on its sharing motifs with AHR (see 
                <monospace>
                    <ext-link ext-link-type="uri" xlink:href="http://cisbp.ccbr.utoronto.ca/TFreport.php?searchTF=T014165_1.02">http://cisbp.ccbr.utoronto.ca/TFreport.php?searchTF=T014165_1.02</ext-link>
                </monospace>).</p>
            <p>
                <xref ref-type="fig" rid="f2">Figure 2</xref> and 
                <xref ref-type="table" rid="T1">Table 1</xref> show that the classification of TFs is now fairly elaborate. Use of the precise terminology of the TFClass system to label TFs of interest at present relies on associations provided with the HOCOMOCO catalog.</p>
            <p>As population studies in genomic and genetic epidemiology grow in size and scope, principles for organizing and prioritizing loci associated with phenotypes of interest are urgently needed. 
                <xref ref-type="fig" rid="f6">Figure 6</xref> shows that loci associated with phenotypes related to kidney function, lung function, and IL-8 levels are potentially unified through the fact that the GWAS hits are connected with genes identified as targets of VDR (vitamin D receptor). This example limited attention to hits on chromosome 17; the 
                <monospace>TFtargs</monospace> tool permits 
                <italic toggle="yes">ad libitum</italic> exploration of phenotype-locus-gene-TF associations. Our hope is that the tools and resources collected in TFutils will foster systematic development of evidence-based mechanistic network models for transcription regulation in human disease contexts, thereby contributing to the development of personalized genomic medicine.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <p>With the exception of the FIMO scoring data (
                <monospace>fimo16</monospace>), all data underlying the results are available as part of the article and no additional source data are required.</p>
            <p>
                <monospace>fimo16</monospace> links to indexed BED files in a public S3 bucket funded by the Bioconductor foundation. The underlying data are sourced from Sonawane 
                <italic toggle="yes">et al</italic>. 2017 
                <monospace>
                    <ext-link ext-link-type="uri" xlink:href="https://dx.doi.org/10.1016/j.celrep.2017.10.001">https://doi.org/10.1016/j.celrep.2017.10.001</ext-link>
                </monospace>
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup>
            </p>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>Source code is available from GitHub: 
                <monospace>
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/vjcitn/TFutils">https://github.com/vjcitn/TFutils</ext-link>
                </monospace>
            </p>
            <p>Archived source code: 
                <monospace>
                    <ext-link ext-link-type="uri" xlink:href="https://dx.doi.org/doi:10.18129/B9.bioc.TFutils">https://doi.org/doi:10.18129/B9.bioc.TFutils</ext-link>
                </monospace>
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>
            </p>
            <p>Licence: 
                <ext-link ext-link-type="uri" xlink:href="https://opensource.org/licenses/Artistic-2.0">Artistic License 2.0</ext-link>
            </p>
        </sec>
    </body>
    <back>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lambert</surname>
                            <given-names>SA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jolma</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Campitelli</surname>
                            <given-names>LF</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The Human Transcription Factors.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2018</year>;<volume>172</volume>(<issue>4</issue>):<fpage>650</fpage>&#x2013;<lpage>665</lpage>.
                    <pub-id pub-id-type="pmid">29425488</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2018.01.029</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Weirauch</surname>
                            <given-names>MT</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yang</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Albu</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Determination and inference of eukaryotic transcription factor sequence specificity.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2014</year>;<volume>158</volume>(<issue>6</issue>):<fpage>1431</fpage>&#x2013;<lpage>1443</lpage>.
                    <pub-id pub-id-type="pmid">25215497</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2014.08.009</pub-id>
                    <pub-id pub-id-type="pmcid">4163041</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ashburner</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ball</surname>
                            <given-names>CA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Blake</surname>
                            <given-names>JA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.</article-title>
                    <source>

                        <italic toggle="yes">Nat Genet.</italic>
</source>
                    <year>2000</year>;<volume>25</volume>(<issue>1</issue>):<fpage>25</fpage>&#x2013;<lpage>29</lpage>.
                    <pub-id pub-id-type="pmid">10802651</pub-id>
                    <pub-id pub-id-type="doi">10.1038/75556</pub-id>
                    <pub-id pub-id-type="pmcid">3037419</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kulakovskiy</surname>
                            <given-names>IV</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vorontsov</surname>
                            <given-names>IE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yevshin</surname>
                            <given-names>IS</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2018</year>;<volume>46</volume>(<issue>D1</issue>):<fpage>D252</fpage>&#x2013;<lpage>D259</lpage>.
                    <pub-id pub-id-type="pmid">29140464</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkx1106</pub-id>
                    <pub-id pub-id-type="pmcid">5753240</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Subramanian</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tamayo</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mootha</surname>
                            <given-names>VK</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.</article-title>
                    <source>

                        <italic toggle="yes">Proc Natl Acad Sci U S A.</italic>
</source>
                    <year>2005</year>;<volume>102</volume>(<issue>43</issue>):<fpage>15545</fpage>&#x2013;<lpage>15550</lpage>.
                    <pub-id pub-id-type="pmid">16199517</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.0506580102</pub-id>
                    <pub-id pub-id-type="pmcid">1239896</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gertz</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Savic</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Varley</surname>
                            <given-names>KE</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Distinct properties of cell-type-specific and shared transcription factor binding sites.</article-title>
                    <source>

                        <italic toggle="yes">Mol Cell.</italic>
</source>
                    <year>2013</year>;<volume>52</volume>(<issue>1</issue>):<fpage>25</fpage>&#x2013;<lpage>36</lpage>.
                    <pub-id pub-id-type="pmid">24076218</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.molcel.2013.08.037</pub-id>
                    <pub-id pub-id-type="pmcid">3811135</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wingender</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schoeps</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Haubrock</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>TFClass: expanding the classification of human transcription factors to their mammalian orthologs.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2018</year>;<volume>46</volume>(<issue>D1</issue>):<fpage>D343</fpage>&#x2013;<lpage>D347</lpage>.
                    <pub-id pub-id-type="pmid">29087517</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkx987</pub-id>
                    <pub-id pub-id-type="pmcid">5753292</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Grant</surname>
                            <given-names>CE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bailey</surname>
                            <given-names>TL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Noble</surname>
                            <given-names>WS</given-names>
                        </name>
</person-group>:
                    <article-title>FIMO: scanning for occurrences of a given motif.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2011</year>;<volume>27</volume>(<issue>7</issue>):<fpage>1017</fpage>&#x2013;<lpage>1018</lpage>.
                    <pub-id pub-id-type="pmid">21330290</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btr064</pub-id>
                    <pub-id pub-id-type="pmcid">3065696</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sonawane</surname>
                            <given-names>AR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Platig</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Fagny</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Understanding Tissue-Specific Gene Regulation.</article-title>
                    <source>

                        <italic toggle="yes">Cell Rep.</italic>
</source>
                    <year>2017</year>;<volume>21</volume>(<issue>4</issue>):<fpage>1077</fpage>&#x2013;<lpage>1088</lpage>.
                    <pub-id pub-id-type="pmid">29069589</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.celrep.2017.10.001</pub-id>
                    <pub-id pub-id-type="pmcid">5828531</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shannon</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Richards</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>
                        <italic toggle="yes">MotifDb</italic>: An Annotated Collection of Protein-DNA Binding Sequence Motifs</article-title>. R package version 1.26.0.
                    <pub-id pub-id-type="doi">10.18129/B9.bioc.MotifDb</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ou</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wolfe</surname>
                            <given-names>SA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brodsky</surname>
                            <given-names>MH</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>motifStack for the analysis of transcription factor binding site evolution.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2018</year>;<volume>15</volume>(<issue>1</issue>):<fpage>8</fpage>&#x2013;<lpage>9</lpage>.
                    <pub-id pub-id-type="pmid">29298290</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.4555</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Carey</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gopaulakrishnan</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>TFutils: TFutils</article-title>. R package version 1.2.0. <year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.18129/B9.bioc.TFutils">http://www.doi.org/10.18129/B9.bioc.TFutils</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report49980">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.21137.r49980</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Weirauch</surname>
                        <given-names>Matthew T.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r49980a2">2</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-7977-9122</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Ernst</surname>
                        <given-names>Kevin</given-names>
                    </name>
                    <xref ref-type="aff" rid="r49980a1">1</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r49980a1">
                    <label>1</label>Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA</aff>
                <aff id="r49980a2">
                    <label>2</label>Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center (CCHMC), Cincinnati, OH, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>18</day>
                <month>7</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Ernst K and Weirauch MT</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport49980" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.17976.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors present TFutils, a Bioconductor package for analyzing TFs and their binding sites. TFutils combines information and analysis capabilities from several popular sources into one interface. The authors present use cases and tutorials showing how TFutils can be used to ask and answer several biological questions.</p>
            <p> Major Comments 
                <list list-type="order">
                    <list-item>
                        <p>From what I can tell, the manuscript is basically a copy-pasted version of the &#x201c;vignette&#x201d; from their TFutils R package on Bioconductor, probably written by the (co)author of the package. If the purpose is to teach people how to use their TFutils package, or even to compare and contrast it with other existing solutions, in my opinion the manuscript falls very short of this goal. It&#x2019;d be different if the vignette (or the paper) made a clear case for what TFutils does right up front. Apart from saying it doesn&#x2019;t do sequence logo visualizations, the introduction lacks any discussion of what good the package is, except that it &#x201c;assembles various resources intended to clarify and unify approaches to working with TF concepts in bioinformatic analysis&#x201d;. A helpful remedy for this would be to have a table of included functions, data structures, and sample datasets, similar to the auto-generated listing from their own package, which is already publicly available on the web. The &#x201c;Intro&#x201d; section (actually the whole paper) lacks any straightforward enumeration of the contents of the package like &#x201c;we include data structures (name them) that wrap X, Y, and Z databases, and functions alpha, beta, and gamma to do (whatever).&#x201d;</p>
                    </list-item>
                    <list-item>
                        <p>The manuscript itself is very hard to follow. It would be much better if there were a common, practical theme running through the examples &#x2013; as written, they are quite disjointed and not very well explained. There is no unifying thread to any of these examples, no focus on telling a &#x201c;story&#x201d; with a real-life research inquiry, or making the case for how TFutils could make that inquiry easier. If people in my lab (a transcription factor bioinformatics lab) have trouble following it, then there is almost no chance that a non-expert could follow it. One thing that would really help would be more human language in the code sample parts (or code comments), in order to aid reader understanding. As it stands, the text surrounding the code is choppy, and does little to illuminate what&#x2019;s going on in the code samples. The examples and the text surrounding them are sometimes painfully disjointed, to the point where I was wondering how the text and code and figures right next to each other were related.</p>
                    </list-item>
                    <list-item>
                        <p>I would strongly urge the authors to use the Lambert 
                            <italic>et al.</italic> collection of 1,639 human TFs as their basis. Bigger is not always better. Although the other databases currently used collectively sum to many more human TFs, there are many false positives. For example, GO includes things like kinases in this category. This is exactly the reason that Lambert and colleagues went to the painstaking effort of collecting all human TF candidates and manually curating the list one by one.</p>
                    </list-item>
                    <list-item>
                        <p>Table 1 is comparing apples to oranges. Cis-BP contains all human TFs, regardless of their motif status (i.e. even if a TF does not have a known DNA binding motif, it is still included in the database). HOCOMOCO, on the other hand, only includes TFs with motifs. So, it does not really make sense to compare their TF members, which seems to be the major point of Table 1. This also applies to the following comment in the Discussion, which also does not really make sense based on these facts: &#x201c;It is noteworthy that the four resources used for Figure 1 agree on names of only 119 TFs. The fact that CIS-BP distinguishes 475 TFs that are not identified in any other source should be better understood. We observe that the ascription of TF status to AHRR is based on its sharing motifs with AHR.&#x201d; The only reason AHRR is not in the other databases is probably because it has not had its motif directly determined through experimentation.</p>
                    </list-item>
                    <list-item>
                        <p>The example functionality shown in Figure 6 does not make sense to me. As I understand it, a &#x201c;TF target&#x201d; is from MSigDB, which is simply a predicted binding site for a TF (here, VDR), in the promoter of a gene (e.g., TBX2 in the top row of the figure). TBX2 is associated with, e.g. Creatine Levels. But, we do not know where the GWAS signal is located relative to TBX2. The GWAS signal could be 20,000 bases away from TBX2, or inside its intron, etc. So, what is the connection here between binding of VDR to the promoter of TBX2 and the GWAS signal? Unless I am misunderstanding how this part of the tool is working, this analysis seems very misleading to me.</p>
                    </list-item>
                </list> Minor comments 
                <list list-type="order">
                    <list-item>
                        <p>Introduction: &#x201c;typically near gene promoter regions&#x201d; &#x2013; I would add &#x201c;and enhancers&#x201d;, since this is actually where the majority of the TF binding sites are located.</p>
                    </list-item>
                    <list-item>
                        <p>There was just a new release of Cis-BP (version 2.0) &#x2013; if it's not too much work, I would urge the authors to update their system, since it is a major update. I realize that this can be a lot of work, so I will leave it to the authors to decide if this is worth immediate action.</p>
                    </list-item>
                    <list-item>
                        <p>I would be very careful about calling the MSigDB collection &#x201c;TF targets&#x201d; &#x2013; these are not experimentally determined binding events (e.g., through ChIP-seq). These are simply the result of scanning motifs in promoters. I think these should therefore only be referred to as &#x201c;predicted targets&#x201d;. This seems nitpicky, but I think there is a very important distinction here.</p>
                    </list-item>
                    <list-item>
                        <p>It looks like the motifs from FIMO are Cis-BP motifs (these are incorporated into FIMO), which is fine.&#x00a0;But according to the example shown, it looks like the IDs might be truncated in your database &#x2013; for example, the top one is called &#x201c;M3433_1&#x201d;, which could lead to ambiguities, since it could correspond to &#x201c;M3433_1.02&#x201d;, &#x201c;M3433_1.01&#x201d;, &#x201c;M3433_1.00&#x201d;, etc.</p>
                    </list-item>
                    <list-item>
                        <p>It is usually not mentioned whether a function or data structure comes from TFutils or some other R package - a simple inline comment or note in the accompanying text would really clear this up.</p>
                    </list-item>
                    <list-item>
                        <p>Dependencies are left up to the reader to figure out, which is a bit of a nuisance for someone who is trying to decide whether or not a given package is worth exploring.</p>
                    </list-item>
                    <list-item>
                        <p>Cis-BP should be spelled &#x201c;Cis-BP&#x201d; (as it is in Determination and inference of eukaryotic transcription factor sequence specificity&#x00a0;- Weirauch 
                            <italic>et al. </italic>(2014)
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-49980-1">1</xref>
                            </sup>).</p>
                    </list-item>
                    <list-item>
                        <p>Some of the sentence structure and word choices (data is &#x201c;lodged&#x201d; in Amazon S3, &#x201c;harvesting&#x201d; insights from the data) are inappropriate.</p>
                    </list-item>
                    <list-item>
                        <p>There are numerous &#x201c;typesetting&#x201d; problems in the paper, most of which, I assume, originate with the original R Markdown vignette.</p>
                    </list-item>
                    <list-item>
                        <p>Monospace font is not used consistently for variable or function names; there are numerous occurrences throughout the text.</p>
                    </list-item>
                    <list-item>
                        <p>A bold font is used where a 4th-level headline (####) should be, which when rendered as HTML, creates a run-on with the first sentence of the intended section; e.g.:</p>
                        <p> Annotated reference to ENCODE ChIP-seq results.encode690</p>
                        <p> Find TFs that are direct GWAS hits for a given trait.directHitsInCISBP</p>
                    </list-item>
                    <list-item>
                        <p>Additional individual nit-picks (there are many) can be found in a marked-up version of v2 of the paper, found 
                            <ext-link ext-link-type="uri" xlink:href="https://www.diigo.com/annotated/original/e8adb7a71a050fece38d9b7c7925be0d">here</ext-link>.&#x00a0;</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>No</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>No</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>No</p>
            <p>Reviewer Expertise:</p>
            <p>Gene regulation, bioinformatics, genomics, functional genomics, disease genetics.</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to state that we do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-49980-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Determination and inference of eukaryotic transcription factor sequence specificity.</article-title>
                        <source>
                            <italic>Cell</italic>
                        </source>.<year>2014</year>;<volume>158</volume>(<issue>6</issue>) :
                        <elocation-id>10.1016/j.cell.2014.08.009</elocation-id>
                        <fpage>1431</fpage>-<lpage>1443</lpage>
                        <pub-id pub-id-type="pmid">25215497</pub-id>
                        <pub-id pub-id-type="doi">10.1016/j.cell.2014.08.009</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment4764-49980">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Carey</surname>
                            <given-names>Vincent</given-names>
                        </name>
                        <aff>Harvard University, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>18</day>
                    <month>7</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We appreciate the energetic response to our paper and are grateful for the contributions that the authors of the review have made to transcription factor bioinformatics.</p>
                <p>As the reviewers remark, a revision to the paper that accommodates their suggestions would be laborious and thus cannot occur immediately. However, it will occur eventually, as we agree with many of the criticisms. We would like to make a few preliminary responses; we will follow the reviewers' enumeration.</p>
                <p>Major 1: "copy-pasted" is a pejorative term but it is not warranted here. We have endeavored to keep the content of the paper "computable" - the R Markdown, when compiled, draws information from TFutils and other packages in real time, while the PDF/HTML in the F1000 renderings are not directly computed/computable. This is an important dimension of reproducibility that is becoming more familiar to scientists and their publishers. Our approach is not perfect but we hope it is a step in the right direction.</p>
                <p>Much of the remaining material in Major 1 seems a matter of concern with organization, making demands of the introduction to present material that occurs later on or through links. We accept these concerns to some extent and will decide how to respond later.</p>
                <p>Major 2: Concern with absence of unifying thread and story is certainly legitimate. Choppy and unilluminating text should be avoided. We are glad that other reviewers did not find these conditions to be severe obstacles to reading and providing constructive criticism.</p>
                <p>Major 3: Yes, we will be happy to introduce Lambert's collection into the package, and to include material in a new version of the package, and later to describe it in a revision of the paper. Perhaps the reviewers would like to collaborate on this task.</p>
                <p>Major 4: Table 1 illustrates divergence in terminology and basic frequency of class representation in two resources that address the class membership problem. It is not clear how "apples to apples" comparison should be conducted in this situation, and we welcome the reviewers' guidance. Certainly the distinction between types of enumeration (based on experimental evidence of binding vs existence of motif) should be made in connection with such enumerations, but how to do it with these specific resources is not completely clear. The Lambert table S1 does not seem to use TFclass categorization - but TFclass seems a useful information structure. Is there a good reason for this?</p>
                <p>Major 5: The text, though terse, explains exactly what is happening. MSigDB asserts that a TF has a gene target. The EBI/EMBL GWAS catalog is inspected for GWAS hits mapped to the asserted targets. The app presents these assertions and targets to the user. Additional details on the GWAS hit could be provided in a revision, but we do not see what is misleading in this simple collation for the convenience of users of these two resources.</p>
                <p>Minor 1: Point taken.</p>
                <p>Minor 2: Will do.</p>
                <p>Minor 3: That's the terminology of MSigDB at the time we wrote. We can add the caveat in revision - but perhaps you should take it up with the MSigDB authors.</p>
                <p>Minor 4: Will check.</p>
                <p>Minor 5: Would appreciate more details, perhaps they are in your revision.</p>
                <p>Minor 6: Dependencies are resolved by BiocManager, which is what should be used for installation. That information is in the introduction.</p>
                <p>Minor 7: Will do.</p>
                <p>Minor 8: Will consider.</p>
                <p>Minor 9: Hope these are noted in your revision.</p>
                <p>Minor 10-12: Thanks.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report48671">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.21137.r48671</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Bucher</surname>
                        <given-names>Philipp</given-names>
                    </name>
                    <xref ref-type="aff" rid="r48671a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Ambrosini</surname>
                        <given-names>Giovanna</given-names>
                    </name>
                    <xref ref-type="aff" rid="r48671a1">1</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r48671a1">
                    <label>1</label>Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>27</day>
                <month>6</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Ambrosini G and Bucher P</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport48671" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.17976.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors have followed up on several of our suggestions and this (in our opinion) has significantly improved the article.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics, Epigenetics, ChIP-seq, regulatory region annotation, motif analysis, database design, web tools.</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report48672">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.21137.r48672</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Zhu</surname>
                        <given-names>Lihua Julie</given-names>
                    </name>
                    <xref ref-type="aff" rid="r48672a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-7416-0590</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Li</surname>
                        <given-names>Rui</given-names>
                    </name>
                    <xref ref-type="aff" rid="r48672a2">2</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r48672a1">
                    <label>1</label>Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School&#x00a0;(UMMS), Worcester, MA, USA</aff>
                <aff id="r48672a2">
                    <label>2</label>University of Massachusetts Medical School, Worcester, MA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>28</day>
                <month>5</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Zhu LJ and Li R</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport48672" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.17976.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>We think that the authors have addressed our major concerns, and we would like to change the status to Approved for the revised version.</p>
            <p> There are a couple of typos in the revised version that the authors might want to correct. 
                <list list-type="order">
                    <list-item>
                        <p>Figure 4 legend states that there are four cell lines while the figure seems to depict data for 5 cell lines.</p>
                    </list-item>
                    <list-item>
                        <p>The authors probably meant &#x201c;four&#x201d; instead of &#x201c;for&#x201d; in the following sentence under the Discussion section.</p>
                        <p>&#x201c;Sources and consequences of variations in DNA transcription are fundamental problems for cell biology, and the projects we have made use of 
                            <bold>for</bold> cataloging transcription factors are at the boundaries of current knowledge.&#x201d;</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report45003">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.19660.r45003</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Bucher</surname>
                        <given-names>Philipp</given-names>
                    </name>
                    <xref ref-type="aff" rid="r45003a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Ambrosini</surname>
                        <given-names>Giovanna</given-names>
                    </name>
                    <xref ref-type="aff" rid="r45003a1">1</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r45003a1">
                    <label>1</label>Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>26</day>
                <month>4</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Ambrosini G and Bucher P</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport45003" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.17976.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>TFutils is a Bioconductor package meant to help users study TF binding in the human genome. The tool integrates several resources such as Gene Ontology (GO), CISBP, HOCOMOCO, and MSigDB (the Molecular Signatures Database). The paper describes how to tackle basic problems users face when working with TFs, in particular TF classification, the identification of gene targets, and, ultimately, the prediction of TF binding affinities.</p>
            <p> This article looks more like a software tutorial than a scientific article. As software has to evolve in order to keep up with user needs, the text will have to be updated on a regular basis in the future, in order to remain up-to-date as well. This is fine if F1000Research accepts updates and supports versioning of articles. Otherwise, another format should be chosen for presenting this tool.</p>
            <p> Just by reading the article, we didn't get a clear impression of what is inside TFutils. Going through the command examples was helpful in this respect. Nevertheless, we have doubts whether we would be able to use this package in a productive manner in the future. The promise that "TFutils lowers the barriers of usage of key findings of TF biology" holds only for expert users of Bioconductor, who are already familiar with all the other packages mentioned in this article and necessary to reproduce the results.</p>
            <p> The current manuscript has several shortcomings. At a general level, it is not very transparent to the na&#x00ef;ve reader what is actually new from this package and what functionalities are provided by the many other Bioconductor packages referred to in the text. Fortunately, we found a well-organized reference manual for TFutils version 1.2.0 on the internet, which clarified this issue for us. A URL to this document should have been included in the article.</p>
            <p> As this is a tutorial-style document, it would be helpful to provide complete R code for reproducing Figures 3 and 4. While Figure 3 is relatively easy to generate, it took us at least half a day to reproduce Figure 4. A major limitation is that the fimo16 object, upon which Figure 3 is based, contains only TF affinity data for 16 out of 689 scanned TF motif matrices.</p>
            <p> Figure 3 shows the predicted binding sites for 16 TFs in a selected genomic region. Already with such a small number of TFs, the Figure is pretty crowded with dots. One wonders what it would look like if all 689 FIMO-scanned motif matrices were considered. In view of the density of motif matches it seems doubtful whether any biological insights can be gained from such a plot. Some guidance for the interpretation is needed.</p>
            <p> Figure 4 shows ENCODE binding peaks for CEBPB in the same genomic region that was used for Figure 3. Naturally, we were curious to know whether the peaks seen in this figure co-localize with corresponding motif matches in Figure 3. Unfortunately, CEBPB is not included in the fimo16 collection. To exemplify the power of the tool, it would have been preferable to choose an example where the reader can crosscheck the consistency between predictions and experiments via comparison of Figure 3 with Figure 4.</p>
            <p> Overall, our impression is that TFutils is a useful package albeit for a restricted community of users already familiar with the other Bioconductor packages mentioned in the article. However, the manuscript could benefit from major revisions as pointed out above.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics, Epigenetics, ChIP-seq, regulatory region annotation, motif analysis, database design, web tools.</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment4635-45003">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Carey</surname>
                            <given-names>Vincent</given-names>
                        </name>
                        <aff>Harvard University, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>None.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>8</day>
                    <month>5</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We are very appreciative of the effort underlying this review. &#x00a0;Important questions were raised and we endeavor to answer them fully below.&#x00a0; Reviewer comments are prefaced by "
                    <bold>QUERY</bold>" and our replies are prefaced by "
                    <bold>RESPONSE</bold>".</p>
                <p>
                    <bold>QUERY</bold>: This article looks more like a software tutorial than a scientific article. As software has to evolve in order to keep up with user needs, the text will have to be updated on a regular basis in the future, in order to remain up-to-date as well. This is fine if F1000Research accepts updates and supports versioning of articles. Otherwise, another format should be chosen for presenting this tool.</p>
                <p>
                    <bold>RESPONSE</bold>: &#x00a0;Because F1000Research accepts updates and supports versioning of articles, we believe that this format is an acceptable one for the work we have described.</p>
                <p>
                    <bold>QUERY</bold>: Just by reading the article, we didn't get a clear impression of what is inside TFutils. Going through the command examples was helpful in this respect. Nevertheless, we have doubts whether we would be able to use this package in a productive manner in the future. The promise that "TFutils lowers the barriers of usage of key findings of TF biology" holds only for expert users of Bioconductor, who are already familiar with all the other packages mentioned in this article and necessary to reproduce the results.</p>
                <p>
                    <bold>RESPONSE</bold>: We are glad that the examples in the paper were useful to the reviewer. It is true that Bioconductor software generally requires acquaintance with and use of multiple interrelated packages. In this sense, it is different from a relatively common approach of command-line utility implementation of bioinformatic analysis tools, where a given tool may be understood and used in isolation. We do not agree that "expert" level of understanding of Bioconductor is necessary to make use of the material described, although some facility with and enthusiasm for the R language would be necessary to make headway. The paper is published in the Bioconductor channel of F1000Research, and it is expected that the readership will have an acquaintance with the resources and limitations of the Bioconductor software and data ecosystem.</p>
                <p>
                    <bold>QUERY</bold>: The current manuscript has several shortcomings. At a general level, it is not very transparent to the na&#x00ef;ve reader what is actually new from this package and what functionalities are provided by the many other Bioconductor packages referred to in the text. Fortunately, we found a well-organized reference manual for TFutils version 1.2.0 on the internet, which clarified this issue for us. A URL to this document should have been included in the article.</p>
                <p>
                    <bold>RESPONSE</bold>: This is a useful observation. We have now included a reference to 
                    <ext-link ext-link-type="uri" xlink:href="http://bioconductor.org/packages/release/bioc/manuals/TFutils/man/TFutils.pdf">http://bioconductor.org/packages/release/bioc/manuals/TFutils/man/TFutils.pdf</ext-link> in the implementation section of the paper.</p>
                <p>
                    <bold>QUERY</bold>: As this is a tutorial-style document, it would be helpful to provide complete R code for reproducing Figures 3 and 4. While Figure 3 is relatively easy to generate, it took us at least half a day to reproduce Figure 4. A major limitation is that the fimo16 object, upon which Figure 3 is based, contains only TF affinity data for 16 out of 689 scanned TF motif matrices.</p>
                <p>
                    <bold>RESPONSE</bold>: We appreciate the effort taken here. The production of Figure 4 was complicated, and we have created an app and an associated GitHub repository to clarify the basic issues. The repository is 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/vjcitn/encdemo">https://github.com/vjcitn/encdemo</ext-link>; its front page has a screenshot of the app, which runs at 
                    <ext-link ext-link-type="uri" xlink:href="https://vjcitn.shinyapps.io/encdemo/">https://vjcitn.shinyapps.io/encdemo/</ext-link>. Our point in Figure 4 is not the visualization per se, which can be accomplished with standard genome browsers given suitable commands. Rather, we use the visualization to give a concrete demonstration of the immediate programmatic availability (to users of this package) of the relevant experimental results and annotations. We concur that the limitation of fimo16 to a small number of TFs is disappointing. Comprehensive presentation of the scan scores to our user base and readership would require computational resources that we have not yet been able to muster: a local deployment requires close to a terabyte of indexed storage.</p>
                <p>
                    <bold>QUERY</bold>: Figure 3 shows the predicted binding sites for 16 TFs in a selected genomic region. Already with such a small number of TFs, the Figure is pretty crowded with dots. One wonders what it would look like if all 689 FIMO-scanned motif matrices were considered. In view of the density of motif matches it seems doubtful whether any biological insights can be gained from such a plot. Some guidance for the interpretation is needed.</p>
                <p>
                    <bold>RESPONSE</bold>: As noted just above, we do not have a mechanism for providing all 689 scans. We have added text after Figure 3 acknowledging the challenge of interpretation, specifically with respect to the combinatorics of TF binding.</p>
                <p>
                    <bold>QUERY</bold>: Figure 4 shows ENCODE binding peaks for CEBPB in the same genomic region that was used for Figure 3. Naturally, we were curious to know whether the peaks seen in this figure co-localize with corresponding motif matches in Figure 3. Unfortunately, CEBPB is not included in the fimo16 collection. To exemplify the power of the tool, it would have been preferable to choose an example where the reader can cross-check the consistency between predictions and experiments by comparing Figure 3 with Figure 4.</p>
                <p>
                    <bold>RESPONSE</bold>: We agree that unification of the concepts underlying Figures 3 and 4 would be quite desirable. Figure 3 is based on the analysis of motifs in reference sequence, while Figure 4 is a severe reduction of cell-type-specific information from in vitro experiments. The data underlying the encdemo app noted above should be useful for beginning surveys across TFs and across cell types, which are essential for a full understanding of the cell-type-specific combinatorics of TF binding.</p>
                <p>
                    <bold>QUERY</bold>: Overall, our impression is that TFutils is a useful package albeit for a restricted community of users already familiar with the other Bioconductor packages mentioned in the article. However, the manuscript could benefit from major revisions as pointed out above.</p>
                <p>
                    <bold>RESPONSE</bold>: We appreciate the effort taken in this review and we have endeavored to answer the questions raised.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report44033">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.19660.r44033</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Zhu</surname>
                        <given-names>Lihua Julie</given-names>
                    </name>
                    <xref ref-type="aff" rid="r44033a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-7416-0590</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Liu</surname>
                        <given-names>Haibo</given-names>
                    </name>
                    <xref ref-type="aff" rid="r44033a2">2</xref>
                    <role>Co-referee</role>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Li</surname>
                        <given-names>Rui</given-names>
                    </name>
                    <xref ref-type="aff" rid="r44033a3">3</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r44033a1">
                    <label>1</label>Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School&#x00a0;(UMMS), Worcester, MA, USA</aff>
                <aff id="r44033a2">
                    <label>2</label>Iowa State University, Ames, IA, USA</aff>
                <aff id="r44033a3">
                    <label>3</label>University of Massachusetts Medical School, Worcester, MA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>20</day>
                <month>2</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Zhu LJ et al.</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport44033" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.17976.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The TFutils package provides useful, convenient, integrated data structures for TF-related bioinformatics analyses, by&#x00a0;incorporating the&#x00a0;basic information of human transcription factors (TFs), such as TF classification, known TF targets, genome-wide TF binding sites and binding affinity scores, which might be used to prioritize candidate genetic variants and help understand gene transcriptional regulatory mechanisms. Importantly, it also provides an interactive interface to query TFs and TF targets implicated in human traits as discovered by many GWASs.</p>
            <p>In a quick test, all demo code in this paper worked. However, to make sure TFutils is more useful to the bioinformatics community, a few questions may need to be addressed. Here are our detailed comments and questions.</p>
            <p>
                <list list-type="order">
                    <list-item>
                        <p>TFutils includes resources from CISBP, HOCOMOCO, GO and MSigDb. There are additional human TF resources. Is there any reason not to include resources such as JASPAR
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-44033-1">1</xref>
                            </sup>, TRANSFAC
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-44033-2">2</xref>
                            </sup>, hPDI
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-44033-3">3</xref>
                            </sup> and UniPROBE
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-44033-4">4</xref>
                            </sup>?</p>
                    </list-item>
                    <list-item>
                        <p>There are potential packages that will likely import TFutils such as 
                            <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/release/bioc/html/TFBSTools.html">TFBSTools</ext-link> for the analysis and manipulation of transcription factor binding sites, motifStack for graphic representation of multiple motifs
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-44033-5">5</xref>
                            </sup> and 
                            <ext-link ext-link-type="uri" xlink:href="http://bioconductor.org/packages/release/bioc/html/MotIV.html">MotIV</ext-link>. It would be helpful to present a few lines of code showing how to integrate data from TFutils into the aforementioned pipelines.</p>
                    </list-item>
                    <list-item>
                        <p>The section &#x201c;Basic concepts of transcription factor bioinformatics&#x201d; includes lots of background information, such as existing TF-related data sources/bases, TF classification, and how TFutils incorporates and accesses those resources. To make it easier to follow, we suggest splitting this part between the Introduction section and the Methods section. The authors may move the background information and TF classification to the Introduction section, and include an Implementation section in the Methods section to describe how TFutils incorporates all these data sources, how to retrieve the relevant information in TFutils, and how to integrate with other packages as mentioned in item 2, where the R script snippets can be displayed.</p>
                    </list-item>
                    <list-item>
                        <p>To maintain/increase the user base, it is important to keep the data up to date. Currently, the data are snapshots of the CISBP and HOCOMOCO catalogs. If the resources are not updated regularly, it&#x2019;s unlikely that users will use TFutils after 2-3 years. Is there a plan in place to have the resources assembled by TFutils updated regularly? How frequent will the updates be? Will they be automatic or manual?</p>
                    </list-item>
                    <list-item>
                        <p>Flexibility of the data structure is also important, as users may want to expand the utility of TFutils. We suggest that the authors describe in the manuscript how to add features to the current data structures in TFutils.</p>
                    </list-item>
                    <list-item>
                        <p>It will be useful to add information on the numbers of TFs and targets included in the assembled resources, as well as in the original databases.</p>
                    </list-item>
                    <list-item>
                        <p>There is a very popular Python package with the same name, &#x201c;tfutils&#x201d;. If it is not too hard to do, we suggest that the authors change the package name to avoid confusion.</p>
                    </list-item>
                    <list-item>
                        <p>The installation and running environments of TFutils were described twice: once in the Introduction section and again in the Methods section (Operation: Installation). It is better to describe this only once, in the Methods section.</p>
                    </list-item>
                    <list-item>
                        <p>There are many short paragraphs consisting of one or two sentences, and related information is scattered across different sections. For instance, the last paragraph of the Introduction section, about the limitations of TFutils, might be moved to the Discussion section, whereas the third paragraph of the Discussion section might be moved to somewhere near the beginning of the Introduction section or wherever it is appropriate.</p>
                    </list-item>
                    <list-item>
                        <p>Page 6: the Summary section might be better placed between the data availability section and the discussion section, to summarize the implemented functionality of TFutils.</p>
                    </list-item>
                </list> Besides those major issues, we also have a few minor questions: 
                <list list-type="order">
                    <list-item>
                        <p>Currently the abstract only mentions TF targets derived from MSigDb. Considering that the ENCODE TF ChIP-seq data are one of the most significant resources for TF target information, as mentioned in the main text, we suggest that the authors describe in the abstract how the ENCODE ChIP-seq data were incorporated into TFutils.</p>
                    </list-item>
                    <list-item>
                        <p>Page 5, please clarify the type of details in the sentence &#x201c;Full details are provided in Sonawane et al&#x201d;.</p>
                    </list-item>
                    <list-item>
                        <p>Gene structure can be better depicted in Fig. 3 and Fig. 4, perhaps adopting the gene structure visualization in most genome viewers, showing exon/intron structure and gene transcription direction.</p>
                    </list-item>
                    <list-item>
                        <p>Please cite the R packages used.</p>
                    </list-item>
                    <list-item>
                        <p>&#x201c;TFtargs()&#x201d; in Figure 5 legend needs to be edited.</p>
                    </list-item>
                    <list-item>
                        <p>For the subtitles under the Use cases section, we suggest adding &#x201c;find&#x201d; before &#x201c;TFs that are direct GWAS &#x2026;&#x201d; and &#x201c;retrieve&#x201d; before &#x201c;Traits mapped to genes that &#x2026;&#x201d;.</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics, ChIP-seq, CRISPR technology, RNA-seq, annotation, ATAC-seq, motif analysis, shRNA/CRISPR screening, visualization, machine learning and database application</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-44033-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>
                        <article-title>JASPAR: an open-access database for eukaryotic transcription factor binding profiles.</article-title>
                        <source>
                            <italic>Nucleic Acids Res</italic>
                        </source>.<year>2004</year>;<volume>32</volume>(<issue>Database issue</issue>) :
                        <elocation-id>10.1093/nar/gkh012</elocation-id>
                        <fpage>D91</fpage>-<lpage>4</lpage>
                        <pub-id pub-id-type="pmid">14681366</pub-id>
                        <pub-id pub-id-type="doi">10.1093/nar/gkh012</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-44033-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>
                        <article-title>TRANSFAC: an integrated system for gene expression regulation.</article-title>
                        <source>
                            <italic>Nucleic Acids Res</italic>
                        </source>.<year>2000</year>;<volume>28</volume>(<issue>1</issue>) :<fpage>316</fpage>-<lpage>9</lpage>
                        <pub-id pub-id-type="pmid">10592259</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-44033-3">
                    <label>3</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>
                        <article-title>hPDI: a database of experimental human protein-DNA interactions.</article-title>
                        <source>
                            <italic>Bioinformatics</italic>
                        </source>.<year>2010</year>;<volume>26</volume>(<issue>2</issue>) :
                        <elocation-id>10.1093/bioinformatics/btp631</elocation-id>
                        <fpage>287</fpage>-<lpage>9</lpage>
                        <pub-id pub-id-type="pmid">19900953</pub-id>
                        <pub-id pub-id-type="doi">10.1093/bioinformatics/btp631</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-44033-4">
                    <label>4</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>
                        <article-title>UniPROBE: an online database of protein binding microarray data on protein-DNA interactions.</article-title>
                        <source>
                            <italic>Nucleic Acids Res</italic>
                        </source>.<year>2009</year>;<volume>37</volume>(<issue>Database issue</issue>) :
                        <elocation-id>10.1093/nar/gkn660</elocation-id>
                        <fpage>D77</fpage>-<lpage>82</lpage>
                        <pub-id pub-id-type="pmid">18842628</pub-id>
                        <pub-id pub-id-type="doi">10.1093/nar/gkn660</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-44033-5">
                    <label>5</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>
                        <article-title>motifStack for the analysis of transcription factor binding site evolution.</article-title>
                        <source>
                            <italic>Nat Methods</italic>
                        </source>.<year>2018</year>;<volume>15</volume>(<issue>1</issue>) :
                        <elocation-id>10.1038/nmeth.4555</elocation-id>
                        <fpage>8</fpage>-<lpage>9</lpage>
                        <pub-id pub-id-type="pmid">29298290</pub-id>
                        <pub-id pub-id-type="doi">10.1038/nmeth.4555</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment4634-44033">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Carey</surname>
                            <given-names>Vincent</given-names>
                        </name>
                        <aff>Harvard University, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>8</day>
                    <month>5</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We appreciate the careful reading of this paper, which has inspired additional functionality that will be added to the package. Reviewer comments (marked with the 
                    <bold>QUERY</bold>: tag) are addressed below (marked with the 
                    <bold>RESPONSE</bold>: tag).</p>
                <p>
                    <bold>QUERY</bold>: TFutils includes resources from CISBP, HOCOMOCO, GO and MSigDb. There are additional human TF resources. Is there any reason not to include resources such as JASPAR<sup>1</sup>, TRANSFAC<sup>2</sup>, hPDI<sup>3</sup> and UniPROBE<sup>4</sup>?</p>
                <p>
                    <bold>RESPONSE</bold>: According to 
                    <ext-link ext-link-type="uri" xlink:href="http://cisbp.ccbr.utoronto.ca/summary.php?by=4&amp;orderby=MSource_Identifier">http://cisbp.ccbr.utoronto.ca/summary.php?by=4&amp;orderby=MSource_Identifier</ext-link>, JASPAR and TRANSFAC entries are included in CIS-BP. UniPROBE is an interesting resource but does not provide a mapping from TF or motif to target genes, so interfacing to it is beyond the scope of the tasks intended for this paper. The same issue appears to us to apply to hPDI. Resources that do not enumerate "gene level" TF targets are beyond the scope of the current work.</p>
                <p>
                    <bold>QUERY</bold>: There are potential packages that will likely import TFutils such as TFBSTools for the analysis and manipulation of transcription factor binding sites, motifStack for graphic representation of multiple motifs<sup>5</sup> and MotIV. It would be helpful to present a few lines of code showing how to integrate data from TFutils into the aforementioned pipelines.</p>
                <p>
                    <bold>RESPONSE</bold>: We will add an example of how diversity in TF reference motif data leads to a structure that may be visualized with MotifDb/motifStack. Specifically, versions 1.5.x and later of TFutils will include a function that uses motifStack and MotifDb to generate displays like Figure 5 of the revised paper. That figure is the result of running example(tffamCirc.plot) in the current devel branch of TFutils.</p>
                <p>
                    <bold>QUERY</bold>: The section &#x201c;Basic concepts of transcription factor bioinformatics&#x201d; includes lots of background information, such as existing TF-related data sources/bases, TF classification, and how TFutils incorporates and accesses those resources. To make it easier to follow, we suggest splitting this part between the Introduction section and the Methods section. The authors may move the background information and TF classification to the Introduction section, and include an Implementation section in the Methods section to describe how TFutils incorporates all these data sources, how to retrieve the relevant information in TFutils, and how to integrate with other packages as mentioned in item 2, where the R script snippets can be displayed.</p>
                <p>
                    <bold>RESPONSE</bold>: Some of these changes have been made. Our approach to the narrative attempts to follow the F1000Research schema.</p>
                <p>
                    <bold>QUERY</bold>: To maintain/increase the user base, it is important to keep the data up to date. Currently, the data are snapshots of the CISBP and HOCOMOCO catalogs. If the resources are not updated regularly, it&#x2019;s unlikely that users will use TFutils after 2-3 years. Is there a plan in place to have the resources assembled by TFutils updated regularly? How frequent will the updates be? Will they be automatic or manual?</p>
                <p>
                    <bold>RESPONSE</bold>: The package will be updated according to the Bioconductor release protocol. The man pages give explicit information on the provenance of the information underlying the serialized data structures. Community input on the utility of the various sources will be important in determining the frequency of content updates.</p>
                <p>
                    <bold>QUERY</bold>: Flexibility of the data structure is also important, as users may want to expand the utility of TFutils. We suggest that the authors describe in the manuscript how to add features to the current data structures in TFutils.</p>
                <p>
                    <bold>RESPONSE</bold>: The code is open source. Pull requests are welcome. If there are specific features of interest to the reviewers, we will consider how to incorporate them in future versions of the package/manuscript.</p>
                <p>
                    <bold>QUERY</bold>: It will be useful to add information on the numbers of TFs and targets included in the assembled resources, as well as in the original databases.</p>
                <p>
                    <bold>RESPONSE</bold>: Totals are added in the caption of Figure 1.</p>
                <p>
                    <bold>QUERY</bold>: There is a very popular Python package with the same name, &#x201c;tfutils&#x201d;. If it is not too hard to do, we suggest that the authors change the package name to avoid confusion.</p>
                <p>
                    <bold>RESPONSE</bold>: The Python package addresses "TensorFlow", which is not related to TFs in bioinformatics. We do not believe that the risk of confusion is high, but will consider renaming if instances of confusion are observed.</p>
                <p>
                    <bold>QUERY</bold>: The installation and running environments of TFutils were described twice: once in the Introduction section and again in the Methods section (Operation: Installation). It is better to describe this only once, in the Methods section.</p>
                <p>
                    <bold>RESPONSE</bold>: The presentation follows the suggested F1000Research format.</p>
                <p>
                    <bold>QUERY</bold>: There are many short paragraphs consisting of one or two sentences, and related information is scattered across different sections. For instance, the last paragraph of the Introduction section, about the limitations of TFutils, might be moved to the Discussion section, whereas the third paragraph of the Discussion section might be moved to somewhere near the beginning of the Introduction section or wherever it is appropriate.</p>
                <p>Page 6: the Summary section might be better placed between the data availability section and the discussion section, to summarize the implemented functionality of TFutils.</p>
                <p>
                    <bold>RESPONSE</bold>: The last paragraph of the introduction is used to pre-empt potential reader disappointment early in the presentation, so we prefer to leave it where it is. The third paragraph in the discussion does include content of general and possibly introductory significance, but that content is embedded in concrete illustrations that depend upon actual details of package use. The "summary" element is provided to give the reader a break before plunging into the obligatory Methods/Implementation material. We concur with your basic aesthetic preferences but have organized our text in what we consider to be a rational way.</p>
                <p>
                    <bold>QUERY</bold>: Besides those major issues, we also have a few minor questions:</p>
                <p>
                    <bold>QUERY</bold>: Currently the abstract only mentions TF targets derived from MSigDb. Considering that the ENCODE TF ChIP-seq data are one of the most significant resources for TF target information, as mentioned in the main text, we suggest that the authors describe in the abstract how the ENCODE ChIP-seq data were incorporated into TFutils.</p>
                <p>
                    <bold>RESPONSE</bold>: metadata(encode690) provides details. The ENCODE data are derived from Bioconductor's AnnotationHub package. These facts are noted in the vicinity of Figure 4. We have added a phrase to the abstract that mentions the ENCODE interface.</p>
                <p>
                    <bold>QUERY</bold>: Page 5, please clarify the type of details in the sentence &#x201c;Full details are provided in Sonawane et al&#x201d;.</p>
                <p>
                    <bold>RESPONSE</bold>: These authors describe how FIMO was used to obtain sequence-based binding affinity scores; the main text is slightly modified to clarify the role of this reference.</p>
                <p>
                    <bold>QUERY</bold>: Gene structure can be better depicted in Fig. 3 and Fig. 4, perhaps adopting the gene structure visualization in most genome viewers, showing exon/intron structure and gene transcription direction.</p>
                <p>
                    <bold>RESPONSE</bold>: We agree that the gene model displays are sub-optimal. However, the visualizations are not central to the package. Ideally, Gviz, ggbio, or karyoploteR infrastructure would be used, and we will pursue these improvements in updates to the package. Further discussion of the visualization appears in our response to the other referee report. We note that the "app" at 
                    <ext-link ext-link-type="uri" xlink:href="https://vjcitn.shinyapps.io/encdemo/">https://vjcitn.shinyapps.io/encdemo/</ext-link> can be used to view binding sites interactively for a small number of TFs in a small number of cell types. Such visualizations can be better accomplished with standard genome browsers. The visualizations in the paper are provided to make concrete the immediate programmatic availability of these resources and concepts to package users. In particular, Figure 3 just scratches the surface of the concept that the combinatorics of TF binding are complex. As noted by the other reviewer, the display is impossible to parse meaningfully in detail, but the overall interpretation is that binding events form a complex ensemble and that data structures and programming patterns are needed to develop compelling interpretations.</p>
                <p>
                    <bold>QUERY</bold>: Please cite the R packages used.</p>
                <p>
                    <bold>RESPONSE</bold>: After running the Rmarkdown document in TFutils/vignette/TFutils_f1000 that conducts all the computations presented in the paper, sessionInfo() can be run to enumerate all packages in use. There are 37 packages attached, and 60 loaded but not attached. It does not seem reasonable to burden the paper with such an accounting. Users can run sessionInfo() and then query the DESCRIPTION files of packages of interest for provenance information.</p>
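                <p>The enumeration described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions: it presumes that the vignette has already been run in the current session, and the package name "stats" passed to packageDescription() is only a placeholder for whichever package is of interest.</p>
                <code language="R" xml:space="preserve"># Names of packages attached in the current session
# (NULL if nothing beyond the base packages is attached)
names(sessionInfo()$otherPkgs)

# Names of packages loaded via a namespace but not attached
names(sessionInfo()$loadedOnly)

# Provenance fields from the DESCRIPTION file of one package;
# "stats" is a placeholder, substitute any installed package name
packageDescription("stats", fields = c("Package", "Version", "License"))
</code>
                <p>Counts of attached and loaded-only packages will vary with the R and Bioconductor versions in use, so the totals reported above should be read as specific to the session in which they were obtained.</p>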
                <p>
                    <bold>QUERY</bold>: &#x201c;TFtargs()&#x201d; in Figure 5 legend needs to be edited.</p>
                <p>
                    <bold>RESPONSE</bold>: Done.</p>
                <p>
                    <bold>QUERY</bold>: For the subtitles under the Use cases section, we suggest adding &#x201c;find&#x201d; before &#x201c;TFs that are direct GWAS &#x2026;&#x201d; and &#x201c;retrieve&#x201d; before &#x201c;Traits mapped to genes that &#x2026;&#x201d;.</p>
                <p>
                    <bold>RESPONSE</bold>: Done.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
