<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.13535.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Method Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>
                    <italic>StateHub-StatePaintR:</italic> rapid and reproducible chromatin state evaluation for custom genome annotation</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 3 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Coetzee</surname>
                        <given-names>Simon G.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-4267-5930</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ramjan</surname>
                        <given-names>Zachary</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Dinh</surname>
                        <given-names>Huy Q.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Berman</surname>
                        <given-names>Benjamin P.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Hazelett</surname>
                        <given-names>Dennis J.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0749-9935</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>The Center for Bioinformatics and Functional Genomics, Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, 90048, USA</aff>
                <aff id="a2">
                    <label>2</label>Zaxxis LLC, Grandville, MI, 49418, USA</aff>
                <aff id="a3">
                    <label>3</label>Van Andel Research Institute, Grand Rapids, MI, 49503, USA</aff>
                <aff id="a4">
                    <label>4</label>Samuel Oschin Comprehensive Cancer Institute, Los Angeles, CA, 90048, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:dennis.hazelett@csmc.edu">dennis.hazelett@csmc.edu</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>7</day>
                <month>5</month>
                <year>2020</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2018</year>
            </pub-date>
            <volume>7</volume>
            <elocation-id>214</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>28</day>
                    <month>4</month>
                    <year>2020</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Coetzee SG et al.</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/7-214/pdf"/>
            <abstract>
                <p>Genome annotation is critical to understanding the function of disease variants, especially for clinical applications. To meet this need, public consortia provide segmentations that reflect varying unsupervised approaches to functional annotation based on epigenetic data, but there remains a need for transparent, reproducible, and easily interpreted genomic maps of the functional biology of chromatin. We introduce a new framework for defining a combinatorial epigenomic model of chromatin state on a web database, 
                    <italic toggle="yes">StateHub</italic>. In addition, we created an annotation tool for Bioconductor, 
                    <italic toggle="yes">StatePaintR</italic>, which accesses these models and uses them to rapidly (on the order of seconds) produce chromatin state segmentations in standard genome browser formats. Annotations are fully documented with change history and versioning, authorship information, and original source files. 
                    <italic toggle="yes">StatePaintR</italic> calculates ranks for each state from next-gen sequencing peak statistics, facilitating variant prioritization, enrichment, and other types of quantitative analysis. 
                    <italic toggle="yes">StateHub</italic> hosts annotation tracks for major public consortia as a resource and allows users to submit their own alternative models.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>epigenomics</kwd>
                <kwd>chromatin</kwd>
                <kwd>visualization</kwd>
                <kwd>methylation</kwd>
                <kwd>variant annotation</kwd>
                <kwd>ChIP-seq</kwd>
                <kwd>bioconductor</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>National Institutes of Health</funding-source>
                    <award-id>RO1CA190182</award-id>
                    <award-id>UO1CA184826</award-id>
                </award-group>
                <funding-statement>The study was funded by National Institutes of Health (UO1CA184826 (BPB) and RO1CA190182 (DJH)).</funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>The primary differences in the revised article are an expansion and clarification of the method, both in its implementation and in the nature of the output it creates. Figures and figure legends have been updated to clarify the text. A new section has been written describing the scoring of annotations and their relationship to enhancer prediction.</p>
            </sec>
        </notes>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Chromatin segmentations are increasingly important for a broad area of research that includes regulatory genomics, genetic epidemiology, precision health, and molecular genetics. There is a need for consistent, unbiased resolution of chromatin states to interpret the epigenome and predict function across different tissues and cell types.</p>
            <p>Complex, overlapping patterns of post-translational modifications (PTM) to histone subunits
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>,
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup> signify distinct states of chromatin activity. These modifications consist of mono-, di-, or tri-methylation and acetylation of histone H3 lysines 4, 9, 27, and 36
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>. Direct assays for histone PTMs with next-generation sequencing (NGS) using chromatin immunoprecipitation (ChIP-seq) yield a set of genomic intervals whose signal intensity shows evidence of enrichment over background (input chromatin).</p>
            <p>In addition to ChIP-seq of histone PTMs, there are also NGS methods for histone displacement, including DNase I hypersensitivity
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup> (DNase-seq or DHS), Formaldehyde Assisted Isolation of Regulatory Elements
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup> (FAIRE-seq), Assay for Transposase Accessible Chromatin
                <sup>
                    <xref ref-type="bibr" rid="ref-6">6</xref>
                </sup> (ATAC-seq) and Nucleosome Occupancy and Methylome sequencing (NOMe-seq)
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup>. Histone displacement, nucleosome positioning and DNA methylation are also detected in genome-wide assays (
                <italic toggle="yes">e.g.</italic> whole genome bisulfite sequencing
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup>). Histone displacement is associated with transcription factor binding and transcriptional activity
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup>. In addition, direct binding of transcription factors is measured in ChIP-seq experiments with an antibody directed against a transcription factor or an epitope-tagged version.</p>
            <p>All these data are compatible with other data represented as genomic intervals (in BED format), including CpG islands, annotated transcription start sites, repeat elements, and 3&#x2032; UTRs. Both the input and the final (output) processed data are represented as browser extensible data (.bed), a flexible standard that accommodates different peak calling methods (
                <italic toggle="yes">e.g.</italic> &#x201c;narrowPeak&#x201d; and &#x201c;broadPeak&#x201d; are types of .bed files).</p>
            <p>Several machine-learning approaches integrate NGS experiments into annotation tracks
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>. The goal is to discover epigenomic states and aid in understanding &#x201c;non-coding&#x201d; genomic elements in an unbiased and biologically meaningful way. Newly discovered states are an amalgam of true functional categories of chromatin biology. The most popular and widely used of these machine learning methods is ChromHMM
                <sup>
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup>. Other machine-learning approaches include spectral-based learning
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>, inference based on read counts
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>
                </sup>, dynamic Bayesian networks
                <sup>
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup>, probabilistic approaches
                <sup>
                    <xref ref-type="bibr" rid="ref-15">15</xref>
                </sup>, supervised enhancer detection
                <sup>
                    <xref ref-type="bibr" rid="ref-16">16</xref>
                </sup>, and other hidden Markov methods
                <sup>
                    <xref ref-type="bibr" rid="ref-17">17</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-19">19</xref>
                </sup>.</p>
            <p>The interpretability and general usefulness of the state predictions produced by these algorithms varies. A multitude of states often must be consolidated into simpler, biologically meaningful categories. Hoffman 
                <italic toggle="yes">et al.</italic> recognized this problem when they proposed a combined meta-analysis of ChromHMM and Segway annotations
                <sup>
                    <xref ref-type="bibr" rid="ref-20">20</xref>
                </sup>. However, a software framework for expert or rule-based segmentations is still lacking. Comparisons across heterogeneous data sets that involve different learned models or slightly different sets of epigenetic marks must be performed carefully, tracking how annotations were created and which can be considered compatible. In addition, it is necessary to update information about which annotations are appropriate as new evidence about the combinatorial patterns of the epigenome comes to light. Such methodology is needed for integrating different experimental data (including non-NGS data) in a reproducible way, reflecting both the novel insights gained from the machine learning methods and our current understanding of genome biology.</p>
            <p>Here we introduce 
                <italic toggle="yes">StateHub</italic> and 
                <italic toggle="yes">StatePaintR</italic> for generating and documenting chromatin state and other genome segmentation models in a transparent and reproducible fashion. 
                <italic toggle="yes">StateHub</italic> is a community resource for storing annotation models, state definitions and associated data in a shareable, referenceable form. The 
                <italic toggle="yes">StatePaintR</italic> package implements these models and state definitions to produce annotation tracks based on histone and other epigenomics marks, sequence features, and gene annotations. We show that 
                <italic toggle="yes">StatePaintR</italic> can be used to rapidly annotate large collections of public data for summarizing epigenomics data or annotation of variants. We show how annotations gracefully degrade, in that cell types or tissues with missing data types are annotated appropriately based upon available information. We show some use cases and describe how 
                <italic toggle="yes">StatePaintR</italic> uses ChIP-seq data peak statistics to rank the state prediction for each segment. The priority of the method is to provide a framework for expressing existing knowledge about how genomic annotations relate to one another and combine to reveal underlying chromatin states. This bypasses 
                <italic toggle="yes">de novo</italic> learning of states within each sample; annotation relies solely on simple rules and the available data.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>Implementation</title>
                <p>
                    <italic toggle="yes">StatePaintR</italic> is implemented as a software package in the R language freely available from the Bioconductor repository: 
                    <ext-link ext-link-type="uri" xlink:href="http://www.bioconductor.org/packages/release/bioc/html/StatePaintR.html">www.bioconductor.org/packages/release/bioc/html/StatePaintR.html</ext-link>. The package contains functions that generate annotation tracks from called peaks, supplied as genomic intervals, according to the rules in a decision matrix and an abstraction layer that describes the relationships between specific assays and functional categories. An abstraction layer may define a single functional category for a collection of assays that represent similar biology, 
                    <italic toggle="yes">e.g.</italic> assays for H3K27ac and H3K9ac may both represent an &#x201c;Active&#x201d; functional category. These data are supplied to 
                    <italic toggle="yes">StatePaintR</italic> in the form of BED files, or one of their extensions (
                    <italic toggle="yes">e.g.</italic> narrowPeaks, gappedPeaks), leaving it to the user either to call regions of enrichment (peaks) in the manner they think best or to acquire pre-called peaks from a trusted source. The decision matrix encodes the relationship between these functional categories and specific chromatin states; each cell of this matrix takes one of four values (
                    <xref ref-type="table" rid="T1">Table 1</xref>) indicating the nature of the relationship. Together the abstraction layer and the decision matrix describe a 
                    <italic toggle="yes">StatePaintR</italic> model.</p>
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>
                            <italic toggle="yes">StatePaintR</italic> matrix values.</title>
                        <p>
                            <italic toggle="yes">StatePaintR</italic> assigns annotations according to custom rules specified in a matrix. The rules are represented as an integer code that takes any of 4 values [0&#x2013;3]. The meaning of each value is summarized in the table.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="center" colspan="1" rowspan="1">required for
                                    <break/>state?</th>
                                <th align="center" colspan="1" rowspan="1">consistent
                                    <break/>with state?</th>
                                <th align="center" colspan="1" rowspan="1">binary
                                    <break/>value</th>
                                <th align="center" colspan="1" rowspan="1">decimal
                                    <break/>value</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">No</td>
                                <td align="center" colspan="1" rowspan="1">No</td>
                                <td align="center" colspan="1" rowspan="1">00
                                    <sub>2</sub>
                                </td>
                                <td align="center" colspan="1" rowspan="1">0</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">No</td>
                                <td align="center" colspan="1" rowspan="1">Yes</td>
                                <td align="center" colspan="1" rowspan="1">01
                                    <sub>2</sub>
                                </td>
                                <td align="center" colspan="1" rowspan="1">1</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">Yes</td>
                                <td align="center" colspan="1" rowspan="1">No</td>
                                <td align="center" colspan="1" rowspan="1">10
                                    <sub>2</sub>
                                </td>
                                <td align="center" colspan="1" rowspan="1">2</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">Yes</td>
                                <td align="center" colspan="1" rowspan="1">Yes</td>
                                <td align="center" colspan="1" rowspan="1">11
                                    <sub>2</sub>
                                </td>
                                <td align="center" colspan="1" rowspan="1">3</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>Each cell of the decision matrix relates a functional category to a chromatin state through a 2-bit code representing the answers to two TRUE/FALSE questions (see 
                    <xref ref-type="table" rid="T1">Table 1</xref>): is the functional category required in order to call the state, and is overlap with it consistent with the state? For the purposes of explanation, the examples below use the nomenclature of our &#x201c;focused poised promoter model&#x201d;, but users may create their own model or modify the decision matrix or abstraction layer of an existing model. The cell of the decision matrix defining the relationship between the state &#x201c;Poised Promoter Region&#x201d; (PPR) and &#x201c;PolycombNarrow&#x201d;, the functional category representing narrow peak calls of H3K27me3, is 3, representing the binary value 11
                    <sub>2</sub>. This encoding indicates that, in order to call the PPR state on an interval, data representing the &#x201c;PolycombNarrow&#x201d; functional category must be present and the interval in question must overlap a peak of that functional category. A value of 2, representing the binary value 10
                    <sub>2</sub>, as in the cell describing the relationship between PPR and the functional category &#x201c;Active&#x201d;, indicates that in order for the interval to be annotated as PPR, data relating to &#x201c;Active&#x201d; must be present in the data set but must 
                    <italic toggle="yes">not</italic> overlap the queried interval. A value of 0, representing the binary value 00
                    <sub>2</sub>, as in the cell for PPR and the functional category &#x201c;Core&#x201d; (which incorporates DHS, ATAC-seq, and FAIRE peak calls), indicates that data represented by &#x201c;Core&#x201d; need not be present; however, if &#x201c;Core&#x201d; data are present and overlap the queried interval, the PPR state cannot be called. The category &#x201c;Translation marks&#x201d; does not affect PPR in this model, even if it overlaps. Marks such as this one, which are essentially irrelevant to PPR, are assigned 1, representing the binary value 01
                    <sub>2</sub>.</p>
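                <p>The two questions encoded in each cell can be sketched in a few lines of Python. This is an illustrative sketch only, not the <italic toggle="yes">StatePaintR</italic> source code: the names <monospace>Rule</monospace>, <monospace>decode_cell</monospace> and <monospace>cell_matches</monospace> are hypothetical, and the row shown contains only the four cells of the PPR example discussed above.</p>
                <code language="python"><![CDATA[
```python
from typing import NamedTuple


class Rule(NamedTuple):
    required: bool    # high bit: data for the category must be present
    consistent: bool  # low bit: overlap with the category is compatible with the state


def decode_cell(value: int) -> Rule:
    """Split a decision-matrix value (0-3) into its two TRUE/FALSE answers."""
    if value not in (0, 1, 2, 3):
        raise ValueError(f"matrix values must be 0-3, got {value!r}")
    return Rule(required=bool(value & 0b10), consistent=bool(value & 0b01))


def cell_matches(value: int, available: bool, overlaps: bool) -> bool:
    """Return True if one cell is compatible with one queried interval.

    available: the data set contains peaks for this functional category.
    overlaps:  the queried interval overlaps a peak of this category.
    """
    rule = decode_cell(value)
    if rule.required and not available:
        return False  # values 2 and 3: the state cannot be called without the data
    if rule.consistent:
        return overlaps or not rule.required  # 3 demands overlap; 1 never objects
    return not overlaps  # values 0 and 2: an overlapping peak negates the state


# Hypothetical excerpt of the PPR row: only the four cells described in the text.
ppr_row = {"PolycombNarrow": 3, "Active": 2, "Core": 0, "Translation marks": 1}

# A toy interval: H3K27me3 narrow peak present, no active mark, no "Core" data at all.
available = {"PolycombNarrow": True, "Active": True, "Core": False, "Translation marks": True}
overlaps = {"PolycombNarrow": True, "Active": False, "Core": False, "Translation marks": True}

is_ppr = all(cell_matches(v, available[c], overlaps[c]) for c, v in ppr_row.items())
print(is_ppr)  # True
```
]]></code>
                <p>In this toy interval every cell is satisfied, so the row matches. Setting the &#x201c;Active&#x201d; overlap to true, or removing the &#x201c;Active&#x201d; data from the data set altogether, would make the same row fail, because a value of 2 requires the data to be present but not overlapping.</p>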
                <p>Thus established, each row (as &#x201c;state&#x201d;) in the decision matrix is a unique combination of values describing the relationship of the functional categories to the state, where the rows are organized by the software in order of state complexity. 
                    <italic toggle="yes">StatePaintR</italic> first generates a GRanges list (an R object containing a list of chromosomes and interval coordinates with arbitrary metadata columns attached) of all uniquely mapping segment boundaries from the start and end coordinates of every peak in all files. 
                    <italic toggle="yes">StatePaintR</italic> then evaluates the presence or absence of each functional category in the data set and eliminates states that cannot be called (for example, those requiring a functional category for which no data are available). Next, the program assesses the overlaps of each segment to determine whether the condition specified in each cell of the decision matrix is satisfied for that segment, producing a Boolean value. Rows in which every cell matches are candidate state calls. Since 
                    <italic toggle="yes">StatePaintR</italic> evaluates in order of increasing state complexity, lower complexity states can be overwritten if higher complexity states match. This is very useful for building degeneracy in a model. An example of this in 
                    <xref ref-type="fig" rid="f1">Figure 1</xref> is illustrated by the states, ER and EAR. If active marks (
                    <italic toggle="yes">e.g.</italic> H3K27ac) are not available for a given cell type, 
                    <italic toggle="yes">StatePaintR</italic> will annotate H3K4me1 marks as ER under our default model. In a different cell type for which H3K27ac data are available, 
                    <italic toggle="yes">StatePaintR</italic> will distinguish H3K4me1-enriched regions as either active or poised based on overlap with this second mark. Thus, a model can specify different state calls as appropriate based on the availability of data for each cell type. 
                    <italic toggle="yes">StatePaintR</italic> includes a peak score for each state drawn from all experiment categories (columns) that have a matrix value of 3, 
                    <italic toggle="yes">i.e.</italic> those that are both required for and consistent with that state. The peak scores are rank normalized on a scale of 1 to 1,000, with 1 being the minimum peak size and 1,000 being the maximum. If multiple categories are required, 
                    <italic toggle="yes">StatePaintR</italic> selects the median peak score for the annotation. This behavior can be overridden (see documentation for details).</p>
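                <p>The segmentation and matching steps described above can be sketched as follows. This is a simplified Python illustration for a single chromosome, not the package implementation (which operates on GRanges objects in R). The function names are hypothetical, the model rows are assumed to be supplied already ordered by increasing complexity, and the two-row model uses toy values rather than the actual rows of the Default model.</p>
                <code language="python"><![CDATA[
```python
def cell_matches(value, available, overlaps):
    """Evaluate one decision-matrix cell (0-3) for one segment."""
    required, consistent = bool(value & 2), bool(value & 1)
    if required and not available:
        return False
    if consistent:
        return overlaps or not required
    return not overlaps


def segment_boundaries(peaks):
    """Cut the chromosome at every peak start and end, giving disjoint segments."""
    cuts = sorted({pos for ivs in peaks.values() for iv in ivs for pos in iv})
    return list(zip(cuts, cuts[1:]))


def call_states(peaks, model):
    """Assign each segment the last (most complex) row whose cells all match.

    peaks: {category: [(start, end), ...]}, half-open intervals on one chromosome.
    model: list of (state_name, {category: value}), ordered by increasing complexity.
    """
    available = {cat for cat, ivs in peaks.items() if ivs}
    calls = []
    for start, end in segment_boundaries(peaks):
        # Categories with at least one peak overlapping this segment.
        hit = {cat for cat, ivs in peaks.items()
               if any(s < end and start < e for s, e in ivs)}
        state = None
        for name, row in model:
            if all(cell_matches(v, cat in available, cat in hit)
                   for cat, v in row.items()):
                state = name  # later, more complex rows overwrite earlier ones
        if state is not None:
            calls.append((start, end, state))
    return calls


# Toy two-row model (assumed values, for illustration only).
toy_model = [
    ("ER", {"Regulatory": 3, "Active": 1}),
    ("EAR", {"Regulatory": 3, "Active": 3}),
]

with_active = call_states(
    {"Regulatory": [(100, 500)], "Active": [(300, 700)]}, toy_model)
without_active = call_states(
    {"Regulatory": [(100, 500)], "Active": []}, toy_model)

print(with_active)     # [(100, 300, 'ER'), (300, 500, 'EAR')]
print(without_active)  # [(100, 500, 'ER')]
```
]]></code>
                <p>With active-mark data, the part of the H3K4me1 peak that overlaps the active peak is upgraded from ER to EAR; without it, the whole peak is still annotated as ER. Segments that match no row (here, the active-only segment) are left unannotated in this sketch.</p>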
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Mapping datasets to functional significance annotations.</title>
                        <p>Experimental data and external database annotations are combined into abstraction layers (columns), integrated to produce chromatin states (rows) from the decision matrix. 
                            <italic toggle="yes">StatePaintR</italic> produces state assignments by iteratively comparing the marks that are present in each segment with each row of data in the table. The values of the color-coded squares signify the relationship between data and state: 0 (light red), the feature/data type negates the state but is not required to be present; 1 (light green), the feature is consistent with the state but not required; 2 (red), the feature is required to be available and negates the state; and 3 (green), the feature is both required and consistent with the state. Complexity of states increases from top to bottom. In the example, red dotted arrows, proceeding downward, point to non-matching rows, and green arrows point to matching rows. The state call corresponds to the last matched row. In this example, with the presence of H3K4me1 (&#x201c;Regulatory&#x201d;), H3K27ac (&#x201c;Active&#x201d;) and DNase I hypersensitivity (&#x201c;Core&#x201d;), the first state consistent with the presence of these functional categories is &#x201c;Enhancer&#x201d;, followed by the increasingly more complex &#x201c;Regulatory Site&#x201d;, &#x201c;Active Chromatin&#x201d;, &#x201c;Active Enhancer&#x201d;, &#x201c;Enhancer Core&#x201d;, &#x201c;Active Chromatin Core&#x201d;, and finally &#x201c;Active Enhancer Core&#x201d;.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/20361/63d78e54-5d21-4948-857f-3cafb9bd7554_figure1.gif"/>
                </fig>
                <p>Finally, once all segments are annotated and scored, 
                    <italic toggle="yes">StatePaintR</italic> can export these annotations as BED files that may be viewed in any genome browser. The package includes an R-markdown vignette. The current release version of this vignette is always available from the Bioconductor website.</p>
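                <p>The scoring and export steps can be illustrated with a short Python sketch. It is one possible reading of the rank normalization to a 1 to 1,000 scale and of the median rule for categories with a matrix value of 3, not the package code. The function names, the dense-ranking treatment of ties, the top score given to a lone value, and the fallback score of 0 for rows with no value-3 category are all assumptions made for illustration.</p>
                <code language="python"><![CDATA[
```python
from statistics import median


def rank_normalize(values, lo=1, hi=1000):
    """Map raw peak statistics to integer ranks on [lo, hi].

    Dense ranking is assumed: tied values share a rank.
    """
    distinct = sorted(set(values))
    if len(distinct) == 1:
        return [hi for _ in values]  # assumption: a lone value gets the top score
    step = (hi - lo) / (len(distinct) - 1)
    rank = {v: round(lo + i * step) for i, v in enumerate(distinct)}
    return [rank[v] for v in values]


def segment_score(row, category_scores):
    """Median of the normalized scores from categories whose matrix value is 3."""
    required = [category_scores[cat] for cat, v in row.items()
                if v == 3 and cat in category_scores]
    return int(median(required)) if required else 0  # assumption: 0 when none apply


def bed_line(chrom, start, end, state, score):
    """Format one annotated segment as a five-column, tab-separated BED record."""
    return f"{chrom}\t{start}\t{end}\t{state}\t{score}"


# Toy peak statistics for four peaks of one functional category.
scores = rank_normalize([3.1, 12.7, 5.0, 40.2])
print(scores)  # [1, 667, 334, 1000]

# A segment whose state requires two categories (value 3); "Core" is ignored.
row = {"Regulatory": 3, "Active": 3, "Core": 1}
score = segment_score(row, {"Regulatory": 667, "Active": 1000, "Core": 334})
print(bed_line("chr1", 300, 500, "EAR", score))  # chr1  300  500  EAR  833 (tab-separated)
```
]]></code>
                <p>Only the columns marked 3 contribute to the score, so the &#x201c;Core&#x201d; value in the toy example does not affect the result even though a score is available for it.</p>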
                <p>
                    <italic toggle="yes">StateHub</italic> is implemented as an interactive website (
                    <ext-link ext-link-type="uri" xlink:href="http://www.statehub.org/">www.statehub.org</ext-link>). 
                    <italic toggle="yes">StateHub</italic> contains a database implemented in 
                    <ext-link ext-link-type="uri" xlink:href="https://www.mongodb.com/">MongoDB</ext-link> and a search engine written with 
                    <ext-link ext-link-type="uri" xlink:href="http://www.gwtproject.org/">Google Web Toolkit (GWT)</ext-link>, which updates dynamically with user input. This database includes all models, model metadata and pre-computed 
                    <italic toggle="yes">StatePaintR</italic> browser tracks. Models are composite 
                    <ext-link ext-link-type="uri" xlink:href="http://www.json.org/">JSON</ext-link> objects that include a unique identifier, name, revision number, a searchable text description, and a model matrix (as defined in 
                    <xref ref-type="table" rid="T1">Table 1</xref>). The website also includes links to this manuscript, R Markdown containing code for the figures, the latest version of the vignette, a Twitter feed and additional instructional materials.</p>
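                <p>To make the structure of a model object concrete, the following Python sketch builds and validates a composite JSON object containing the five components listed above. The field names, the placeholder identifier and the layout of the matrix are assumptions made for illustration and are not the schema used by the StateHub database.</p>
                <code language="python"><![CDATA[
```python
import json

# Hypothetical model object; every field name here is illustrative.
model = {
    "id": "example-model-id",  # placeholder, not a real StateHub identifier
    "name": "Toy model",
    "revision": 1,
    "description": "Illustrative two-state model for documentation purposes.",
    "matrix": {
        "categories": ["Regulatory", "Active"],
        "states": [
            {"name": "Enhancer", "values": [3, None]},  # None: category ignored
            {"name": "Active Enhancer", "values": [3, 3]},
        ],
    },
}

REQUIRED = ("id", "name", "revision", "description", "matrix")


def validate(obj: dict) -> dict:
    """Check that all components exist and every row spans all categories."""
    missing = [key for key in REQUIRED if key not in obj]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    width = len(obj["matrix"]["categories"])
    for state in obj["matrix"]["states"]:
        if len(state["values"]) != width:
            raise ValueError(f"row {state['name']!r} has the wrong length")
    return obj


# Serialize to JSON text and parse it back, as a client might.
round_tripped = validate(json.loads(json.dumps(model)))
print(round_tripped["name"], round_tripped["revision"])
```
]]></code>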
            </sec>
            <sec>
                <title>StateHub models</title>
                <p>The main text makes reference to two models in 
                    <italic toggle="yes">StateHub</italic> (
                    <ext-link ext-link-type="uri" xlink:href="http://www.statehub.org/">statehub.org</ext-link>). The unique identifiers of these models are as follows: &#x201c;Default&#x201d; (model ID: 581ff9f246e0fb06b4b6b178) and &#x201c;Focused Poised promoter&#x201d; (model ID: 5813b67f46e0fb06b493ceb0). In each of the two models presented and discussed in this paper we chose a naming convention for our states reflecting biological function.</p>
            </sec>
            <sec>
                <title>Annotation of public datasets</title>
                <p>Preprocessed peak calls were obtained from the IHEC and ENCODE websites (see 
                    <xref ref-type="table" rid="T2">Table 2</xref>) for hg19 and, where possible, hg38. Where available, we used IDR (Irreproducible Discovery Rate) processed narrowPeak calls for DHS and broadPeak calls for broad marks (H3K27ac, H3K4me1, H3K27me3, H3K36me3) unless otherwise specified in the model. A complete manifest with filenames and all annotation tracks are available on the 
                    <italic toggle="yes">StateHub</italic> website.</p>
                <table-wrap id="T2" orientation="portrait" position="anchor">
                    <label>Table 2. </label>
                    <caption>
                        <title>Annotation of public datasets.</title>
                        <p>Data from the indicated public consortia were downloaded and processed in 
                            <italic toggle="yes">StatePaintR</italic>. The resulting annotation files and browser sessions are available from the StateHub web page under each model page.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="center" colspan="1" rowspan="1"/>
                                <th align="center" colspan="1" rowspan="1">hg19</th>
                                <th align="center" colspan="1" rowspan="1">hg38</th>
                                <th align="center" colspan="1" rowspan="1">mm10</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Blueprint (IHEC)</td>
                                <td align="center" colspan="1" rowspan="1">&#x00a0;630</td>
                                <td align="center" colspan="1" rowspan="1">&#x00a0;548</td>
                                <td align="right" colspan="1" rowspan="1">0</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">CEEHRC (IHEC)</td>
                                <td align="center" colspan="1" rowspan="1">&#x00a0;158</td>
                                <td align="center" colspan="1" rowspan="1">&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;0</td>
                                <td align="right" colspan="1" rowspan="1">2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">DEEP (IHEC)</td>
                                <td align="center" colspan="1" rowspan="1">&#x00a0;&#x00a0;&#x00a0;22</td>
                                <td align="center" colspan="1" rowspan="1">&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;0</td>
                                <td align="right" colspan="1" rowspan="1">6</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">ENCODE</td>
                                <td align="center" colspan="1" rowspan="1">&#x00a0;&#x00a0;&#x00a0;84</td>
                                <td align="center" colspan="1" rowspan="1">&#x00a0;109</td>
                                <td align="right" colspan="1" rowspan="1">98</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Roadmap</td>
                                <td align="center" colspan="1" rowspan="1">&#x00a0;127</td>
                                <td align="center" colspan="1" rowspan="1">&#x00a0;&#x00a0;&#x00a0;&#x00a0;&#x00a0;0</td>
                                <td align="right" colspan="1" rowspan="1">0</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
            </sec>
            <sec>
                <title>Enrichment calculations</title>
                <p>
                    <italic toggle="yes">Parkinson&#x2019;s GWAS variants.</italic> To illustrate the use of <italic toggle="yes">StatePaintR</italic> chromatin state segmentations in GWAS functional annotation, we revisited an earlier study of Parkinson&#x2019;s disease in which we tested for tissue-specific enrichment of genetic associations. Parkinson&#x2019;s GWAS variants were obtained from a previously published large-scale meta-analysis
                    <sup>
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup>. We used a beta-binomial conjugate distribution to estimate the credible range of the difference in overlap between observed variants (GWAS hits) and random variants. To calculate enrichment, we selected all variants within 1 Mb of the index SNP in each region with a minor allele frequency (MAF) &gt; 0.01, defining the foreground as SNPs in linkage disequilibrium with the index SNP at a cutoff of 
                    <italic toggle="yes">r</italic>
                    <sup>2</sup> &gt; 0.8 and the background as all SNPs inclusive (MAF &gt; 0.01).</p>
                <p>
                    <italic toggle="yes">Enrichment in genomic annotations.</italic> Analyses and graphics were produced using the SegTools package
                    <sup>
                        <xref ref-type="bibr" rid="ref-22">22</xref>
                    </sup>.</p>
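                <p>The beta-binomial comparison described above for the GWAS variants can be sketched as follows. With a Beta(a, b) prior and overlap counts (a1, b1) for the foreground and (a2, b2) for the background, the posterior overlap proportions are Beta(a + a1, b + b1) and Beta(a + a2, b + b2). The Python sketch below estimates a 95% credible range for their difference by Monte Carlo sampling. The sampling approach, the uniform prior, the function name and the counts are assumptions for illustration and do not reproduce the code in Supplementary File 1.</p>
                <code language="python"><![CDATA[
```python
import random
from typing import Tuple


def enrichment_interval(a1: int, b1: int, a2: int, b2: int,
                        a: float = 1.0, b: float = 1.0,
                        draws: int = 20000, seed: int = 0
                        ) -> Tuple[float, float]:
    """95% credible range of theta1 - theta2 under Beta(a, b) priors.

    a1, b1: foreground SNPs overlapping / not overlapping a state.
    a2, b2: background SNPs overlapping / not overlapping a state.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(draws):
        theta1 = rng.betavariate(a + a1, b + b1)  # posterior, foreground
        theta2 = rng.betavariate(a + a2, b + b2)  # posterior, background
        diffs.append(theta1 - theta2)
    diffs.sort()
    lower = diffs[int(0.025 * draws)]
    upper = diffs[int(0.975 * draws) - 1]
    return lower, upper


# Hypothetical counts: 40 of 50 foreground vs 100 of 1000 background SNPs.
lower, upper = enrichment_interval(40, 10, 100, 900)
print(f"95% credible range of the difference: {lower:.3f} to {upper:.3f}")
```
]]></code>
                <p>Under this sketch, a locus would be treated as enriched when the credible range of the difference excludes zero.</p>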
            </sec>
            <sec>
                <title>Analysis of methylation data</title>
                <p>To select methylation variants, we analyzed the Infinium HM450 data of 114 ovarian tumor samples
                    <sup>
                        <xref ref-type="bibr" rid="ref-23">23</xref>
                    </sup> and 216 control normal Fallopian tube samples
                    <sup>
                        <xref ref-type="bibr" rid="ref-24">24</xref>
                    </sup>. We defined differentially methylated regions as those having a difference in beta values of at least 0.3 (cancer 
                    <italic toggle="yes">vs.</italic> normal) and significance in a Mann-Whitney U test (FDR-corrected p-value &lt; 0.01). We then performed enrichment calculations using overlaps between probes that were hypermethylated in cancer 
                    <italic toggle="yes">vs.</italic> normal and the state calls from the two models described above and in the text. The enrichment calculations were done with Fisher&#x2019;s exact test using the complete HM450 probe set as background.</p>
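                <p>The enrichment test on probe overlaps can be illustrated with a 2&#x00d7;2 table of hypermethylated versus remaining probes, split by whether they fall in a given state. The Python sketch below computes the sample odds ratio and a one-sided Fisher&#x2019;s exact p-value from the hypergeometric distribution. The function name and the counts are hypothetical, and the choice of a one-sided test is an assumption, since the text does not state the alternative used.</p>
                <code language="python"><![CDATA[
```python
from math import comb, inf
from typing import Tuple


def fisher_enrichment(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Sample odds ratio and one-sided (greater) Fisher's exact p-value.

    a: hypermethylated probes inside the state
    b: hypermethylated probes outside the state
    c: other background probes inside the state
    d: other background probes outside the state
    """
    row1 = a + b          # all hypermethylated probes
    col1 = a + c          # all probes inside the state
    total = a + b + c + d
    # Probability of observing a or more hypermethylated probes in the state.
    p_value = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    ) / comb(total, col1)
    odds_ratio = (a * d) / (b * c) if b * c else inf
    return odds_ratio, p_value


# Hypothetical counts, for illustration only.
odds_ratio, p_value = fisher_enrichment(8, 2, 1, 5)
print(f"odds ratio = {odds_ratio:.1f}, p = {p_value:.4f}")
```
]]></code>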
            </sec>
            <sec>
                <title>Operation</title>
                <p>All code used to generate figures, tables, and this manuscript is included as an R Markdown document (
                    <xref ref-type="other" rid="SF1">Supplementary File 1</xref>)
                    <sup>
                        <xref ref-type="bibr" rid="ref-25">25</xref>
                    </sup>. A copy of this document may also be obtained from the 
                    <italic toggle="yes">StateHub</italic> website. In addition, a workflow vignette is available from the Bioconductor package and mirrored on the GitHub repository at 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/Simon-Coetzee/StatePaintR">github.com/Simon-Coetzee/StatePaintR</ext-link>.</p>
            </sec>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <sec>
                <title>A framework for rule-based annotation</title>
                <p>In order to assign chromatin states, it is necessary to account for the complex interplay of input from genomic annotations and cell-type-specific experimental data sources that define and demarcate functional regions of the genome
                    <sup>
                        <xref ref-type="bibr" rid="ref-1">1</xref>
                    </sup>. Computationally, these inputs must be evaluated in the right order to avoid erroneously overwriting information-rich categories with information-poor ones.</p>
                <p>We initially wrote a model as a decision tree encompassing a set of basic rules for annotation, but this approach was limited in that any small change to the model necessitated a near-complete rewrite of our software. We also wanted a solution that would let us specify any change in the model and have it applied in the same way as all previous models while minimizing software updates. Finally, we felt that any such model should be reproducible, documented, citable and extensible to any combination of experiments. Moreover, from a bioinformatics perspective, we felt that any two colleagues working separately should be able to produce precisely the same annotations from the same datasets and models. To satisfy these requirements, we separated the model specification from the annotation tool and implemented the specification as a decision matrix, which enables complete, explicit control of the annotation software without computer programming expertise.</p>
                <p>We created a searchable website, 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="http://www.statehub.org/">StateHub</ext-link>
                    </italic>, to host a permanent repository of models, document model objects and make them available as a resource to the community. The 
                    <ext-link ext-link-type="uri" xlink:href="http://www.bioconductor.org/packages/release/bioc/html/StatePaintR.html">
                        <italic toggle="yes">StatePaintR</italic> package</ext-link> retrieves models from 
                    <italic toggle="yes">StateHub</italic> and performs annotations on local data. Thus, 
                    <italic toggle="yes">StateHub</italic>-<italic toggle="yes">StatePaintR</italic> is a framework to document models and apply them to annotate genomic data. The models in 
                    <italic toggle="yes">StateHub</italic> consist of an abstraction layer, defining the relationships between data sources and functional categories. These categories are integrated to produce annotations (left-hand column, &#x201c;Chromatin States&#x201d;) via a decision matrix (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>). Within the model, each state has an associated description of arbitrary length, which may contain key words or other relevant details (bottom right).</p>
            </sec>
            <sec>
                <title>Annotation scoring</title>
                <p>
                    <italic toggle="yes">StatePaintR</italic> enables rank scoring of all states, allowing prioritization for non-coding variant annotation. No other existing tool does both chromatin state annotation and rank evaluation simultaneously. Thus, while machine-learning chromatin segmentation methods are focused on label assignment alone, our paradigm preserves critical quality information from the underlying ChIP-seq data to arrive at overall rank scores. We used these rank scores to generate precision-recall statistics for predicting experimentally validated enhancer regions from the VISTA database
                    <sup>
                        <xref ref-type="bibr" rid="ref-26">26</xref>
                    </sup>. Our method outperformed most other methods aimed at predicting enhancers (
                    <xref ref-type="table" rid="T3">Table 3</xref>). Unlike other methods, our tool did not rely on training data and was able to predict and score not only enhancer states but also any other arbitrary state that can be described using the 
                    <italic toggle="yes">StateHub</italic> definition language. No other existing tools provide this functionality with chromatin segmentation.</p>
                <table-wrap id="T3" orientation="portrait" position="anchor">
                    <label>Table 3. </label>
                    <caption>
                        <title>Relative performance of 
                            <italic toggle="yes">StatePaintR</italic> enhancer ranking 
                            <italic toggle="yes">vs</italic>. VISTA enhancers
                            <sup>
                                <xref ref-type="bibr" rid="ref-27">27</xref>
                            </sup>.</title>
                        <p>Columns 2&#x2013;6 report the area under the precision-recall-gain curve (auprg). The highest-scoring algorithm in each column is marked with *.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1">source</th>
                                <th align="center" colspan="1" rowspan="1">neural tube</th>
                                <th align="center" colspan="1" rowspan="1">mid-brain</th>
                                <th align="center" colspan="1" rowspan="1">hind-brain</th>
                                <th align="center" colspan="1" rowspan="1">limb</th>
                                <th align="center" colspan="1" rowspan="1">heart</th>
                                <th align="center" colspan="1" rowspan="1">average
                                    <break/>auprg</th>
                                <th align="center" colspan="1" rowspan="1">average
                                    <break/>rank</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">REPTILE</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>
                                        <italic toggle="yes">0.86*</italic>
                                    </bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>
                                        <italic toggle="yes">0.87*</italic>
                                    </bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">0.76</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>
                                        <italic toggle="yes">0.89*</italic>
                                    </bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>0.92</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">0.86</td>
                                <td align="center" colspan="1" rowspan="1">2.0</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">StatePaintR
                                    <xref ref-type="other" rid="TFN3">&#x2020;</xref>
                                </td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>0.84</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">0.84</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>0.79</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>0.85</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">0.88</td>
                                <td align="center" colspan="1" rowspan="1">0.84</td>
                                <td align="center" colspan="1" rowspan="1">3.0</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">RFECS</td>
                                <td align="center" colspan="1" rowspan="1">0.79</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>0.85</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">0.78</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>0.85</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>0.92</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">0.84</td>
                                <td align="center" colspan="1" rowspan="1">3.0</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">ENCODE</td>
                                <td align="center" colspan="1" rowspan="1">0.82</td>
                                <td align="center" colspan="1" rowspan="1">0.82</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>
                                        <italic toggle="yes">0.80*</italic>
                                    </bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>0.85</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">0.88</td>
                                <td align="center" colspan="1" rowspan="1">0.83</td>
                                <td align="center" colspan="1" rowspan="1">3.4</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">DELTA</td>
                                <td align="center" colspan="1" rowspan="1">0.81</td>
                                <td align="center" colspan="1" rowspan="1">0.84</td>
                                <td align="center" colspan="1" rowspan="1">0.76</td>
                                <td align="center" colspan="1" rowspan="1">0.84</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>
                                        <italic toggle="yes">0.93*</italic>
                                    </bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">0.84</td>
                                <td align="center" colspan="1" rowspan="1">3.6</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">CSIANN</td>
                                <td align="center" colspan="1" rowspan="1">0.72</td>
                                <td align="center" colspan="1" rowspan="1">0.68</td>
                                <td align="center" colspan="1" rowspan="1">0.62</td>
                                <td align="center" colspan="1" rowspan="1">0.69</td>
                                <td align="center" colspan="1" rowspan="1">0.84</td>
                                <td align="center" colspan="1" rowspan="1">0.71</td>
                                <td align="center" colspan="1" rowspan="1">6.2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">EnhancerFinder</td>
                                <td align="center" colspan="1" rowspan="1">NA</td>
                                <td align="center" colspan="1" rowspan="1">0.59</td>
                                <td align="center" colspan="1" rowspan="1">0.63</td>
                                <td align="center" colspan="1" rowspan="1">0.67</td>
                                <td align="center" colspan="1" rowspan="1">0.82</td>
                                <td align="center" colspan="1" rowspan="1">0.68</td>
                                <td align="center" colspan="1" rowspan="1">6.8</td>
                            </tr>
                        </tbody>
                    </table>
                    <table-wrap-foot>
                        <fn>
                            <p id="TFN3">
                                <sup>&#x2020;</sup>Annotations using the &#x201c;poised promoter&#x201d; model, as described in the text.</p>
                        </fn>
                    </table-wrap-foot>
                </table-wrap>
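                <p>For readers unfamiliar with the statistic reported in Table 3, the Python sketch below computes precision gain and recall gain at a single score threshold, using the standard rescaling of precision and recall against the proportion of positives in the test set. The function name, scores and labels are hypothetical. This shows one point on the curve only and is not the benchmarking code behind Table 3.</p>
                <code language="python"><![CDATA[
```python
from typing import Sequence, Tuple


def precision_recall_gain(scores: Sequence[float], labels: Sequence[int],
                          threshold: float) -> Tuple[float, float]:
    """Precision gain and recall gain for calls with score >= threshold.

    labels are 1 for validated enhancers and 0 otherwise. The input must
    contain both classes, otherwise the gains are undefined.
    """
    positives = sum(labels)
    pi = positives / len(labels)  # proportion of positives (baseline)
    called = [lab for s, lab in zip(scores, labels) if s >= threshold]
    tp = sum(called)
    if tp == 0:
        raise ValueError("no true positives at this threshold")
    precision = tp / len(called)
    recall = tp / positives
    prec_gain = (precision - pi) / ((1 - pi) * precision)
    rec_gain = (recall - pi) / ((1 - pi) * recall)
    return prec_gain, rec_gain


# Hypothetical ranked predictions, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
prec_gain, rec_gain = precision_recall_gain(scores, labels, threshold=0.75)
print(f"precision gain = {prec_gain:.2f}, recall gain = {rec_gain:.2f}")
```
]]></code>
                <p>Sweeping the threshold over all observed scores and integrating precision gain against recall gain would give the area under the curve.</p>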
            </sec>
        </sec>
        <sec>
            <title>Use cases</title>
            <sec>
                <title>Segmentation of public datasets</title>
                <p>We generated annotations of 119 ENCODE cell lines
                    <sup>
                        <xref ref-type="bibr" rid="ref-26">26</xref>
                    </sup>, 128 Roadmap tissues
                    <sup>
                        <xref ref-type="bibr" rid="ref-28">28</xref>
                    </sup>, 26 cell lines and tissues from CEEHRC (peak calls obtained from the 
                    <ext-link ext-link-type="uri" xlink:href="http://epigenomesportal.ca/ihec/index.html">IHEC website</ext-link>), and 23 blood cell types from Blueprint (available for download at 
                    <ext-link ext-link-type="uri" xlink:href="http://www.statehub.org/">statehub.org</ext-link>). On a desktop PC, it takes approximately 12&#x2013;15 seconds to produce an annotation for a typical cell line, depending on the number of datasets and intervals (see 
                    <xref ref-type="other" rid="SF2">Figure S1</xref>). 
                    <italic toggle="yes">StatePaintR</italic> produces genome-browser-compatible BED files with color-coded state annotations (colors are specified in the 
                    <italic toggle="yes">StateHub</italic> model). 
                    <xref ref-type="fig" rid="f2">Figure 2</xref> shows a representative region around the 
                    <italic toggle="yes">POLR2A</italic> gene from a subset of 77 high-quality (minimum 15 million reads) tissue samples and cell lines with H3K27ac data from Roadmap. A complete manifest for processing these data is included in additional file 1.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Annotation of public epigenomics data sets.</title>
                        <p>Default-model annotations of 77 cell types from the Roadmap Epigenomics consortium, including some Roadmap-processed ENCODE data, selected for their high quality. Roadmap tissues are clustered and color coded at left according to the same color scheme used in Roadmap publications
                            <sup>
                                <xref ref-type="bibr" rid="ref-28">28</xref>
                            </sup>.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/20361/63d78e54-5d21-4948-857f-3cafb9bd7554_figure2.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Annotation of genome-wide association studies</title>
                <p>A common use of genome annotation is to assign putative function to genetic loci identified by genome-wide association studies (GWAS), particularly for non-coding regions. We previously used a custom annotation of Roadmap tissues based on the approach described in this manuscript to identify locus-specific tissue enrichment in variants associated with Parkinson&#x2019;s disease
                    <sup>
                        <xref ref-type="bibr" rid="ref-29">29</xref>
                    </sup>. In that study, we displayed locus-by-tissue enrichment as a heat map. Here we present a similar analysis using our new 
                    <italic toggle="yes">StateHub</italic> model as the basis for an alternative visualization. Since we previously showed that Parkinson&#x2019;s disease variants are primarily associated with enhancers and promoters
                    <sup>
                        <xref ref-type="bibr" rid="ref-29">29</xref>
                    </sup>, we plotted the 95% range of credible values for enrichment in enhancers and promoters vs. background SNPs (matched for GC content and minor allele frequency). Each locus (row) is plotted against a selection of tissues in Roadmap (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>).</p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Locus- and tissue-specific enrichment of Parkinson&#x2019;s GWAS variants.</title>
                        <p>Bars: 95% credible range for enrichment of Parkinson&#x2019;s GWAS variants and LD proxies with R
                            <sup>2</sup> &#x2265; 0.8 in the union of active enhancers and promoters vs SNPs in the region with similar minor allele frequency and R
                            <sup>2</sup> &lt; 0.8, for each of 4 independent genetic loci. 
                            <italic toggle="yes">&#x03b8;</italic>
                            <sub>1</sub>, 
                            <italic toggle="yes">&#x03b8;</italic>
                            <sub>2</sub> relative enrichment in foreground and background sets, respectively. 
                            <italic toggle="yes">a</italic>
                            <sub>1</sub>, 
                            <italic toggle="yes">b</italic>
                            <sub>1</sub> number of foreground SNPs overlapping biofeatures or not-overlapping, respectively. 
                            <italic toggle="yes">a</italic>
                            <sub>2</sub>, 
                            <italic toggle="yes">b</italic>
                            <sub>2</sub> number of background SNPs overlapping biofeatures or not-overlapping, respectively. 
                            <italic toggle="yes">a</italic> and 
                            <italic toggle="yes">b</italic> are shape parameters of a beta-distributed prior. Significant enrichment profiles for Roadmap tissues are displayed in color (REMC lineage-specific colors); non-significant profiles are gray.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/20361/63d78e54-5d21-4948-857f-3cafb9bd7554_figure3.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Evaluation of two models with respect to cancer methylation</title>
                <p>Our &#x201c;default&#x201d; model proposes a class of enhancers and promoters in a poised state: an &#x201c;Enhancer Poised Region&#x201d; (EPR) and a &#x201c;Promoter Poised Region&#x201d; (PPR). These features have H3K4me1 or H3K4me3 and lack H3K27ac. This model also classifies H3K27me3 as silenced/polycomb repressed (SCR). To investigate functional enrichment of methylation variants, we examined how differentially methylated regions (DMRs) in ovarian tumors partition between chromatin states as defined in this model (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>).</p>
                <p>Previous work has shown that CpG islands of genes temporarily silenced (poised) by the polycomb repressive complex in normal tissues may acquire DNA methylation during cancer formation, resulting in permanent silencing
                    <sup>
                        <xref ref-type="bibr" rid="ref-30">30</xref>,
                        <xref ref-type="bibr" rid="ref-24">24</xref>
                    </sup>. While the segments called EPR and PPR were associated with hypermethylated probes in ovarian cancer across tissues, the magnitude of enrichment was modest (see 
                    <xref ref-type="fig" rid="f4">Figure 4</xref>, &#x201c;Model 1&#x201d;), and it remained possible that our state definitions were too broad.</p>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Example of model comparisons.</title>
                        <p>Enrichment as in 
                            <xref ref-type="fig" rid="f3">Figure 3</xref> using either of two different state models (model 1 and model 2) from StateHub, &#x201c;Default&#x201d; and &#x201c;Focused Poised Promoter&#x201d;, which differ in the treatment of poised promoters. The association of hypermethylated regions in ovarian cancer with poised enhancers (&#x201c;Enhancer Poised Regions&#x201d; &#x2013; EPR) and promoters (&#x201c;Promoter Poised Regions&#x201d; &#x2013; PPR) across Roadmap tissues is indicated by the odds ratio on the y-axis. The y-axis range is the same for both plots. Both models distinguish hypermethylated probes in the poised state, but model 2 is more selective than model 1. In model 2, enhancers with H3K4me1 and promoters with H3K4me3 that overlap narrow regions of H3K27me3 are poised (EPR and PPR), but those without H3K27me3 are called weak (EWR and PWR). Model 1, by contrast, assigns promoters lacking active marks to the poised state.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/20361/63d78e54-5d21-4948-857f-3cafb9bd7554_figure4.gif"/>
                </fig>
                <p>One hypothesis is that poised promoters are distinguishable by the presence or absence of focused H3K27me3, in particular the narrowPeak calls (as opposed to broad, low-level enrichment from broadPeak files used in model 1). To address this hypothesis, we repeated the analysis in 
                    <xref ref-type="fig" rid="f4">Figure 4</xref> for an alternative model (model 2; &#x201c;focused poised promoter&#x201d;) in which H3K27me3 is called as both broadPeak and narrowPeak. We used the H3K27me3 broadPeak file, as in the previous model, to identify repressed regions, and H3K27me3 narrowPeaks to identify poised states (EPR and PPR). Enhancers and promoters lacking both H3K27Ac and H3K27me3 were classified as weak (&#x201c;Enhancer Weak Regions&#x201d;, EWR, and &#x201c;Promoter Weak Regions&#x201d;, PWR; not shown in 
                    <xref ref-type="fig" rid="f4">Figure 4</xref>). Regulatory elements with these properties have also been called &#x201c;primed&#x201d;
                    <sup>
                        <xref ref-type="bibr" rid="ref-31">31</xref>
                    </sup>.</p>
                <p>We found greater enrichment when we defined poised states in this way (compare model 2 (focused poised promoter) with model 1 (default) in 
                    <xref ref-type="fig" rid="f4">Figure 4</xref>). The hypermethylated ovarian cancer CpGs were more enriched in EPR, PPR, and SCR states as defined in the focused poised promoter model relative to the default model, and hypomethylated probes were enriched only in HET and SCR states (not shown). The odds ratio of enrichment for hypermethylated CpGs in EPR and PPR from the default model fell in a range between 0 and 5. However, the enrichment of the hypermethylated probes in our focused poised promoter model was &gt; 5 in PPR and &gt; 10 in EPR (
                    <xref ref-type="fig" rid="f4">Figure 4</xref>, model 2). Thus, ovarian hypermethylated probes are enriched across Roadmap tissues in H3K27me3+ enhancers and promoters, and we concluded that H3K27me3 narrowPeaks are an important distinguishing feature for this class.</p>
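                <p>The enrichment values discussed above are odds ratios. As a minimal sketch of how such a value could be computed for one state in one tissue, the following Python snippet builds a 2 by 2 table of probes (hypermethylated or not, overlapping the state or not). The function name and the probe counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the analysis in this article, whose code is provided in Supplementary File 1.</p>
                <code language="python"><![CDATA[
```python
def odds_ratio(hyper_in, hyper_out, other_in, other_out):
    """Odds ratio for a 2 x 2 table of probe counts.

    hyper_in  : hypermethylated probes overlapping the state
    hyper_out : hypermethylated probes outside the state
    other_in  : remaining probes overlapping the state
    other_out : remaining probes outside the state
    """
    if hyper_out == 0 or other_in == 0:
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    return (hyper_in * other_out) / (hyper_out * other_in)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_or = odds_ratio(hyper_in=120, hyper_out=880, other_in=2000, other_out=97000)
print(round(example_or, 2))
```
]]></code>
                <p>An odds ratio above 1 indicates that hypermethylated probes overlap the state more often than the remaining probes do; comparing the same quantity between two models is what distinguishes model 1 from model 2 in the text above.</p>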
            </sec>
            <sec>
                <title>Enrichment of functional annotation</title>
                <p>Next, we characterized the distribution of states in our focused poised promoter model relative to Gencode v37 gene annotations and also to enhancers from Ensembl
                    <sup>
                        <xref ref-type="bibr" rid="ref-32">32</xref>
                    </sup>. 
                    <xref ref-type="fig" rid="f5">Figure 5</xref> shows the relative enrichment of human mammary epithelial cell (HMEC) chromatin states in each of these features. We found enrichment in Ensembl enhancers for three states: Active enhancer (EAR), Active regions (AR) and Weak enhancer (EWR). The definition of &#x201c;active enhancer&#x201d; in the Ensembl build is cumulative across cell types
                    <sup>
                        <xref ref-type="bibr" rid="ref-32">32</xref>
                    </sup> and therefore includes many cell-type specific enhancers that would be predicted to be weak (having exclusively H3K4me1) in a particular cell line such as HMEC. These three states were not enriched in any other category of genomic annotations. Likewise, we found enrichment of the inactive enhancers in Transcribed (TRS) and Silenced/Polycomb (SCR). TRS was most enriched in gene body annotations, particularly internal exons and introns. SCR and Heterochromatin (HET) were depleted across all categories. Lastly, the 5&#x2032;, first exon and first intron regions were enriched in active and weak promoters, consistent with the role of these regions in transcription initiation.</p>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>Figure 5. </label>
                    <caption>
                        <title>Enrichment in genomic annotations.</title>
                        <p>Relative enrichment of called states genomewide from HMEC in annotations from Ensembl and Gencode. Genegraph (top) visualization of the regions indicated for each column. Enrichment is 
                            <italic toggle="yes">log</italic>
                            <sub>2</sub> observed over random. Positive enrichment is indicated with mustard color (scale from 0 to 0.66) vs. relative depletion in purple (scale from 0 to -0.37).</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/20361/63d78e54-5d21-4948-857f-3cafb9bd7554_figure5.gif"/>
                </fig>
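                <p>The enrichment measure in Figure 5 is described as log<sub>2</sub> observed over random. A minimal sketch of that ratio is given below, assuming that the random expectation is the mean overlap across a set of random placements; this assumption, the function name and the overlap values are hypothetical and are not taken from the article.</p>
                <code language="python"><![CDATA[
```python
import math


def log2_enrichment(observed_overlap, random_overlaps):
    """log2 of observed overlap over the mean overlap of random placements."""
    expected = sum(random_overlaps) / len(random_overlaps)
    if observed_overlap == 0 or expected == 0:
        raise ValueError("log2 ratio is undefined for zero overlap")
    return math.log2(observed_overlap / expected)


# Hypothetical base-pair overlaps between one state and one annotation class.
print(round(log2_enrichment(1580, [990, 1010, 1000]), 2))
```
]]></code>
                <p>Positive values correspond to enrichment relative to random and negative values to depletion.</p>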
            </sec>
            <sec>
                <title>Enhancer predictions</title>
                <p>To use ChIP-seq data for quantitative analysis, we ranked segments within each state by peak score from the MACS2 output (generic peak height). We programmed 
                    <italic toggle="yes">StatePaintR</italic> to rank each state by normalizing on a scale of 1&#x2013;1000, 1000 being the highest rank. 
                    <italic toggle="yes">StatePaintR</italic> ranks the required dataset(s) for each state (
                    <italic toggle="yes">i.e.</italic> assigned &#x201c;3&#x201d; in the decision matrix). To evaluate the ranking function, we measured area under the precision-recall-gain curve (AUPRG) using the set of experimentally validated human and mouse noncoding fragments with gene enhancer activity as assessed in transgenic mice (
                    <ext-link ext-link-type="uri" xlink:href="https://enhancer.lbl.gov/">VISTA enhancer browser</ext-link> and 
                    <xref ref-type="bibr" rid="ref-27">27</xref>). We randomly sampled 100 enhancers from 7 VISTA tissues to evaluate different aspects of our models (training), and then used the remainder of the data to test our enhancer predictions against previously published predictions using the same data sets.</p>
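                <p>As a hedged illustration of the ranking step described above, the sketch below rescales peak scores within a single state to the range 1 to 1000, with 1000 assigned to the highest score. Linear min-max rescaling is an assumption made here for illustration; the exact normalization used by <italic toggle="yes">StatePaintR</italic> is not specified in this section, and the function name and scores are hypothetical.</p>
                <code language="python"><![CDATA[
```python
def scale_scores(scores, low=1, high=1000):
    """Linearly rescale peak scores to integers between low and high."""
    smallest, largest = min(scores), max(scores)
    if largest == smallest:
        # All peaks share one score, so every segment receives the top rank.
        return [high for _ in scores]
    span = largest - smallest
    return [round(low + (s - smallest) * (high - low) / span) for s in scores]


# Hypothetical MACS2-style peak scores for segments of one state.
peak_scores = [12.0, 45.5, 230.0, 88.1]
print(scale_scores(peak_scores))
```
]]></code>
                <p>Because the rescaling is monotonic, the ordering of segments by peak score is preserved, and ranks from different datasets become comparable on a common scale.</p>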
                <p>Some states, including those germane to enhancer prediction, reference more than one required (matrix value 3) dataset, and it was therefore necessary to identify the best method for ranking based on &gt; 1 ChIP-seq experiment. We computed the average, median and ceiling (maximum) of ranks across multiple ChIP-seq tracks. The three methods were comparable, but median and average produced the best results (
                    <xref ref-type="other" rid="SF3">Figure S2</xref>). There are three required marks for active enhancers in our model, but if one of them is not informative for active enhancer prediction, using the ceiling &#x201c;max&#x201d; method would produce false positives when this mark has the highest peak rank. Therefore, we interrogated which marks are informative using a leave-one-out approach. We found that leaving out H3K4me1 significantly improved our predictions, whereas leaving out the other marks did not (
                    <xref ref-type="other" rid="SF4">Figure S3</xref>).</p>
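                <p>The following sketch illustrates the three ways of combining ranks across required marks that were compared above (mean, median and ceiling, i.e. maximum), and why the ceiling can promote a segment on the strength of a single mark. The rank values and the function name are hypothetical; only the choice of summary functions comes from the text.</p>
                <code language="python"><![CDATA[
```python
from statistics import mean, median


def combine_ranks(ranks_by_mark, method="median"):
    """Summarize the per-mark ranks (1-1000) of one segment into one rank."""
    summaries = {"mean": mean, "median": median, "max": max}
    if method not in summaries:
        raise ValueError(f"unknown method: {method}")
    return summaries[method](list(ranks_by_mark.values()))


# Hypothetical segment: strong H3K4me1 signal, weak H3K27Ac and DHS signal.
segment = {"H3K4me1": 950, "H3K27Ac": 120, "DHS": 200}

for method in ("mean", "median", "max"):
    print(method, combine_ranks(segment, method))
```
]]></code>
                <p>With these hypothetical values, the maximum is driven entirely by H3K4me1, whereas the median reflects the two weaker marks. If the dominant mark is uninformative, the ceiling method would rank such a segment highly for the wrong reason.</p>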
                <p>Next we assessed AUPRG of different state calls 
                    <italic toggle="yes">vs.</italic> VISTA enhancers and found that predictive power decreases in the order AR + EAR &gt; EAR &gt; AR &gt; RPS &gt; EPRC, and so on (
                    <xref ref-type="other" rid="SF5">Figure S4</xref>). When we tried combinations of states, the highest precision-recall gain was observed for EAR, EARC, AR and ARC added together (
                    <xref ref-type="other" rid="SF5">Figure S4</xref>), and this was greater than other combinations and than any of the state calls individually. H3K27Ac is the only mark common to all these states, suggesting that H3K27Ac is the most informative predictor of enhancers.</p>
                <p>Since H3K4me1 does not improve predictions and is the only feature that distinguishes AR from EAR (by its presence or absence), an improved model would consolidate AR and EAR into a single state and assign &#x201c;1&#x201d; to H3K4me1 instead of &#x201c;3&#x201d;, leaving this mark exclusively to define weak (or primed) enhancers.</p>
                <p>To validate our method of enhancer prediction, we compared our predictions with ENCODE Encyclopedia, Version 3 (
                    <ext-link ext-link-type="uri" xlink:href="http://zlab-annotations.umassmed.edu/enhancers/methods">zlab-annotations.umassmed.edu</ext-link>), EnhancerFinder, RFECS, DELTA, CSIANN, and REPTILE
                    <sup>
                        <xref ref-type="bibr" rid="ref-33">33</xref>&#x2013;
                        <xref ref-type="bibr" rid="ref-37">37</xref>
                    </sup> for held-out data using AUPRG (
                    <xref ref-type="other" rid="SF6">Figure S5</xref>)
                    <sup>
                        <xref ref-type="bibr" rid="ref-38">38</xref>
                    </sup>.</p>
                <p>Our predictions are comparable to those of the ENCODE model that uses H3K27Ac overlapping with distal DHS, and to those of RFECS and REPTILE, which had the lowest average rank across tissues (
                    <xref ref-type="table" rid="T3">Table 3</xref>, 
                    <xref ref-type="other" rid="SF6">Figure S5</xref>). Our predictions compared favorably to EnhancerFinder and CSIANN, which had an average rank &gt; 6 across the different tissues: heart, midbrain, hindbrain, neural tube and limb. Predictions are only available for these tissues. Thus, 
                    <italic toggle="yes">StatePaintR</italic> ranking is useful for drawing quantitative comparisons between different models, making predictions, or prioritizing regions for functional evidence.</p>
            </sec>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <p>We created a platform for hosting, browsing, and generating new genome annotation models called 
                <italic toggle="yes">StateHub</italic>. The 
                <italic toggle="yes">StateHub</italic> framework makes it possible to specify combinations of genomic data as they relate to regions of functional significance in epigenetically marked chromatin. In addition, we created a software package, 
                <italic toggle="yes">StatePaintR</italic>, that facilitates the use of 
                <italic toggle="yes">StateHub</italic> models to generate browser tracks for bioinformatic analyses. We showed how 
                <italic toggle="yes">StatePaintR</italic> can be used as part of a workflow with uniformly processed data to generate reproducible annotations from public and private data sources.</p>
            <p>Our framework does not replace current machine learning methods, the aim of which is to discover states. But these methods suffer from certain drawbacks that we have addressed with a rule-based approach that provides greater transparency and reproducibility. For example, it is often the case with machine-learning methods that more states are discovered than immediately understood, and there have been different solutions proposed. During discovery, one could iteratively reduce the number of states, minimizing the number of similar or redundant combinations of histone marks. Then the number of discovered states would depend on the number of unique data types used for learning and their distribution around known features. This procedure makes replication in different settings (in different labs or with different types of experiments) nearly impossible. Our method avoids these issues, allowing users to specify a model of the epigenome in a matrix (as in 
                <xref ref-type="fig" rid="f1">Figure 1</xref>) that accounts for all known possibilities. Thus, we built a comprehensive framework for a rule-based annotation, reflecting current hypotheses (or models) of the epigenome.</p>
            <p>A significant drawback of our approach is that some unusual combinations of marks that may have biological function will be ignored. This has much to do with the fact that 
                <italic toggle="yes">StatePaintR</italic> is not for discovering novel states, but rather for annotating the genome according to a specific, existing model. Nonetheless, the label assignment step of other chromatin state discovery tools suffers from the same limitation: states are aggregated or optimized in an iterative fashion based on prior knowledge and assumptions. ENCODE, for example, has published tracks for both ChromHMM and Segway that include multiple states with similar names (
                <italic toggle="yes">e.g.</italic> &#x201c;Tss&#x201d; 
                <italic toggle="yes">vs.</italic> &#x201c;TssF&#x201d; from ChromHMM, and &#x201c;EnhF1&#x201d; 
                <italic toggle="yes">vs.</italic> &#x201c;EnhF3&#x201d; from Segway
                <sup>
                    <xref ref-type="bibr" rid="ref-20">20</xref>
                </sup>). To resolve discrepancies between the two methods, the authors of those studies proposed a combined analysis to simplify the number of state labels and summarize discovery using a rule-based metric not unlike a 
                <italic toggle="yes">StateHub</italic> model. Thus, they classified regions into 7 types &#x201c;emphasizing biologically meaningful differences&#x201d;
                <sup>
                    <xref ref-type="bibr" rid="ref-20">20</xref>
                </sup>. In direct comparisons, we found that our own annotations exhibited greater similarity to the combined analysis than to either of the Segway or ChromHMM tracks separately (not shown). Whatever the protocol, the basic problem persists: machine learning is able to provide insight into what the categories are, but not how many categories there should be. Currently this remains the exclusive province of the biologist.</p>
            <p>One of the additional challenges is compatibility between data sets. In order for two or more cell types to be annotated according to the same model, it is necessary to combine each of the cell types for the training step. One solution is concatenation of genomes
                <sup>
                    <xref ref-type="bibr" rid="ref-20">20</xref>
                </sup>. Another approach is to jointly model epigenomes in parallel, as proposed in Integrative and Discriminative Epigenome Annotation System (IDEAS)
                <sup>
                    <xref ref-type="bibr" rid="ref-39">39</xref>
                </sup>. This approach has the distinctive advantage of also modeling segment boundaries. Our approach does not model boundaries, but does offer some advantages. One is reproducibility: 
                <italic toggle="yes">StatePaintR</italic> always produces the same annotation independently for each cell type from the same model. Second, even samples with different types of data or missing data result in compatible annotations because they come from the same model. Third, the models, composed of a 2D matrix with a range of 4 values, are relatively easy to understand and author. Every file produced in 
                <italic toggle="yes">StatePaintR</italic> contains a record of the model ID, genome version and all the source files. Clinicians working with human genetics will value consistency and reproducibility across datasets. We produced annotations for REMC, ENCODE, IHEC and Blueprint and made these available on the 
                <italic toggle="yes">StateHub</italic> website for the two models described in this paper. The website also has links to browser sessions where they can be explored and used to create figures. A fourth advantage is speed: samples can be processed in parallel and there is no computationally expensive learning step, allowing a typical sample to be annotated in about 10 seconds (
                <xref ref-type="other" rid="SF2">Figure S1</xref>).</p>
            <p>A final feature that is very useful is the ranking by peak score (
                <xref ref-type="other" rid="SF6">Figure S5</xref>). Using this scheme, we investigated what states contribute most to true enhancers (
                <xref ref-type="other" rid="SF3">Figure S2</xref>&#x2013;
                <xref ref-type="other" rid="SF5">Figure S4</xref>). We found that H3K27Ac defined the best predictive subset of annotations for VISTA enhancers. We also investigated different approaches for handling multiple peak calls for a state and found the median to be optimal (
                <xref ref-type="other" rid="SF3">Figure S2</xref>), and incorporated this method as the default behavior of 
                <italic toggle="yes">StatePaintR</italic>. When we compared our predictions to held-out data, they were comparable to the best enhancer predictions
                <sup>
                    <xref ref-type="bibr" rid="ref-34">34</xref>,
                    <xref ref-type="bibr" rid="ref-37">37</xref>
                </sup> and ENCODE enhancers
                <sup>
                    <xref ref-type="bibr" rid="ref-26">26</xref>
                </sup> and 
                <ext-link ext-link-type="uri" xlink:href="http://zlab-annotations.umassmed.edu/enhancers/methods">on the web</ext-link> (unpublished).</p>
            <p>We demonstrated a workflow wherein new models generate annotations, which are used to test predictions against experimental data, and then in turn to make improvements to old models. We anticipate that this will be valuable in testing new ideas and hypotheses generated from unsupervised methods. The ability to rank features also aids in prioritizing variants for GWAS and studies of somatic mutations. Knowing which variants overlap features in the epigenomic landscape of a particular cell type is key. In the future, other methods may become available for incorporation into 
                <italic toggle="yes">StatePaintR</italic> but the models described in 
                <italic toggle="yes">StateHub</italic> will remain stable.</p>
        </sec>
        <sec sec-type="conclusions">
            <title>Conclusions</title>
            <p>We introduced two new computational resources, an online database of chromatin state models and processed genome segmentations called 
                <italic toggle="yes">StateHub</italic>, and an R/Bioconductor tool called 
                <italic toggle="yes">StatePaintR</italic>, which translates epigenomics files into segmentations using these models. One may annotate incomplete datasets rapidly and sensibly according to a single model specification that gracefully degrades to lesser annotations with missing data. Annotations have header documentation with genome version, 
                <italic toggle="yes">StateHub</italic> model, and the names of source files and their mappings. These tools document segmentations and state labels precisely as they are used in individual studies and allow comparisons between evolving models of epigenomic states as they relate to NGS experiments. They also enable mixing of epigenomic states with other types of data, such as 3D looping assays, transcription factors, primary sequence features such as position weight matrices, or disease variants.</p>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>
                <italic toggle="yes">StateHub</italic> available from: 
                <ext-link ext-link-type="uri" xlink:href="http://statehub.org/">http://statehub.org/</ext-link>
            </p>
            <p>Archived source code of 
                <italic toggle="yes">StateHub</italic> as at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/1148792">https://zenodo.org/record/1148792</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-40">40</xref>
                </sup>
            </p>
            <p>
                <italic toggle="yes">StatePaintR</italic> available from: 
                <ext-link ext-link-type="uri" xlink:href="http://www.bioconductor.org/packages/release/bioc/html/StatePaintR.html">http://www.bioconductor.org/packages/release/bioc/html/StatePaintR.html</ext-link>
            </p>
            <p>Source code of 
                <italic toggle="yes">StatePaintR</italic> available from: 
                <ext-link ext-link-type="uri" xlink:href="http://www.github.com/Simon-Coetzee/StatePaintR">http://www.github.com/Simon-Coetzee/StatePaintR</ext-link>
            </p>
            <p>Archived source code of 
                <italic toggle="yes">StatePaintR</italic> as at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/1137825">https://zenodo.org/record/1137825</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-41">41</xref>
                </sup>
            </p>
            <p>License: GPL v3.0</p>
            <p>At the time of publication, we have submitted our package to Bioconductor. This article will be updated once the package is available. For now, the entire package is available on 
                <ext-link ext-link-type="uri" xlink:href="http://www.github.com/Simon-Coetzee/StatePaintR">GitHub</ext-link>
            </p>
        </sec>
        <sec>
            <title>Data availability</title>
            <p>The following are additional files containing manifests to run 
                <italic toggle="yes">StatePaintR</italic> with current releases of all public datasets listed in 
                <xref ref-type="table" rid="T2">Table 2</xref>, links to segmentation tracks, and all code used for analysis and generation of figures in this manuscript. Complete code generated from R markdown (Rnotebooks/html format) for generating all analyses, figures and tables 
                <ext-link ext-link-type="uri" xlink:href="http://statehub.org/statehub_media/statepaintr.nb.html">is available here</ext-link>.</p>
            <p>Supplementary material, including Supplementary File 1 and Supplementary Figures 1&#x2013;5, is available on figshare here:</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.12195087">https://doi.org/10.6084/m9.figshare.12195087</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-25">25</xref>
                </sup>
            </p>
            <p id="SF1">
                <bold>Supplementary File 1: Statepaintr.nb.html:</bold> This file contains code for all the examples and use cases in the text of this manuscript, generated as an html from Rmarkdown.</p>
            <p id="SF2">
                <bold>Figure S1: Relationship between data and runtime</bold>. 
                <italic toggle="yes">StatePaintR</italic> takes only a few seconds to run. The exact time depends on the cumulative number of unique segments (lines of data) created by overlapping the genomic intervals of all input files. Thus, 128 Roadmap tissues can be run in 10 sec &#x00d7; 128 &#x2248; 1,280 sec (21 min).</p>
            <p id="SF3">
                <bold>Figure S2: Predictions with multiple marks.</bold> Ranked ChIP-seq peak scores for multiple marks were used to rank active enhancers (H3K4me1 + H3K27Ac + DHS) by 3 methods (median, mean, ceiling) and compared to a sample (
                <italic toggle="yes">n</italic> = 100) of experimentally validated enhancers. The average or median of three marks was a better predictor than ceiling. The choice of function is secondary to the choice of data for ranking: if one of the three marks is less informative, it will produce false positives when using the max method. It is therefore better to eliminate uninformative marks. See also 
                <xref ref-type="other" rid="SF5">Figure S4</xref>.</p>
            <p id="SF4">
                <bold>Figure S3: Ranking enhancers with subsets of marks.</bold> Combinations of marks were used to predict active enhancers by the max ranking method (as in 
                <xref ref-type="other" rid="SF3">Figure S2</xref>) and compared to enhancer score. &#x201c;All&#x201d; includes regulatory (H3K4me1), active (H3K27Ac), and core (DHS). We also tried a leave-one-out strategy for each of these categories in succession. Leaving out H3K4me1 (&#x201c;no regulatory&#x201d;) produced superior predictions, suggesting that its inclusion made the predictions less specific.</p>
            <p id="SF5">
                <bold>Figure S4: Chromatin states as predictors of true enhancers.</bold> We tested different chromatin states for their ability to predict true enhancers under the focused poised promoter model. Active enhancers exhibited the greatest predictive power, measured as area under the precision-recall-gain curve.</p>
            <p id="SF6">
                <bold>Figure S5: Performance of enhancer predictions.</bold> The area under the precision-recall-gain curve reflects the accuracy of three models of enhancer prediction. True positive enhancers are those validated in the VISTA enhancer browser. The ENCODE method (in blue) and the 
                <italic toggle="yes">StatePaintR</italic> method (in red) show similar accuracy in retrieving VISTA enhancers with tissue-specific enhancer activity, while EnhancerFinder (in green) is less accurate.</p>
        </sec>
    </body>
    <back>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rando</surname>
                            <given-names>OJ</given-names>
                        </name>
</person-group>:
                    <article-title>Combinatorial complexity in chromatin structure and function: revisiting the histone code.</article-title>
                    <source>

                        <italic toggle="yes">Curr Opin Genet Dev.</italic>
</source>
                    <year>2012</year>;<volume>22</volume>(<issue>2</issue>):<fpage>148</fpage>&#x2013;<lpage>155</lpage>.
                    <pub-id pub-id-type="pmid">22440480</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.gde.2012.02.013</pub-id>
                    <pub-id pub-id-type="pmcid">3345062</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gardner</surname>
                            <given-names>KE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Allie</surname>
                            <given-names>CD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Strahl</surname>
                            <given-names>BD</given-names>
                        </name>
</person-group>:
                    <article-title>Operating on chromatin, a colorful language where context matters.</article-title>
                    <source>

                        <italic toggle="yes">J Mol Biol.</italic>
</source>
                    <year>2011</year>;<volume>409</volume>(<issue>1</issue>):<fpage>36</fpage>&#x2013;<lpage>46</lpage>.
                    <pub-id pub-id-type="pmid">21272588</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.jmb.2011.01.040</pub-id>
                    <pub-id pub-id-type="pmcid">3085666</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rothbart</surname>
                            <given-names>SE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Strahl</surname>
                            <given-names>BD</given-names>
                        </name>
</person-group>:
                    <article-title>Interpreting the language of histone and DNA modifications.</article-title>
                    <source>

                        <italic toggle="yes">Biochim Biophys Acta.</italic>
</source>
                    <year>2014</year>;<volume>1839</volume>(<issue>8</issue>):<fpage>627</fpage>&#x2013;<lpage>643</lpage>.
                    <pub-id pub-id-type="pmid">24631868</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.bbagrm.2014.03.001</pub-id>
                    <pub-id pub-id-type="pmcid">4099259</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Boyle</surname>
                            <given-names>AP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Davis</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shulha</surname>
                            <given-names>HP</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>High-resolution mapping and characterization of open chromatin across the genome.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2008</year>;<volume>132</volume>(<issue>2</issue>):<fpage>311</fpage>&#x2013;<lpage>322</lpage>.
                    <pub-id pub-id-type="pmid">18243105</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2007.12.014</pub-id>
                    <pub-id pub-id-type="pmcid">2669738</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Simon</surname>
                            <given-names>JM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Giresi</surname>
                            <given-names>PG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Davis</surname>
                            <given-names>IJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Using formaldehyde-assisted isolation of regulatory elements (FAIRE) to isolate active regulatory DNA.</article-title>
                    <source>

                        <italic toggle="yes">Nat Protoc.</italic>
</source>
                    <year>2012</year>;<volume>7</volume>(<issue>2</issue>):<fpage>256</fpage>&#x2013;<lpage>267</lpage>.
                    <pub-id pub-id-type="pmid">22262007</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nprot.2011.444</pub-id>
                    <pub-id pub-id-type="pmcid">3784247</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Buenrostro</surname>
                            <given-names>JD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Giresi</surname>
                            <given-names>PG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zaba</surname>
                            <given-names>LC</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2013</year>;<volume>10</volume>(<issue>12</issue>):<fpage>1213</fpage>&#x2013;<lpage>1218</lpage>.
                    <pub-id pub-id-type="pmid">24097267</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.2688</pub-id>
                    <pub-id pub-id-type="pmcid">3959825</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gal-Yam</surname>
                            <given-names>EN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jeong</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tanay</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Constitutive nucleosome depletion and ordered factor assembly at the 
                        <italic toggle="yes">GRP78</italic> promoter revealed by single molecule footprinting.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Genet.</italic>
</source>
                    <year>2006</year>;<volume>2</volume>(<issue>9</issue>):<fpage>e160</fpage>.
                    <pub-id pub-id-type="pmid">17002502</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pgen.0020160</pub-id>
                    <pub-id pub-id-type="pmcid">1574359</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cokus</surname>
                            <given-names>SJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Feng</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Shotgun bisulphite sequencing of the 
                        <italic toggle="yes">Arabidopsis</italic> genome reveals DNA methylation patterning.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2008</year>;<volume>452</volume>(<issue>7184</issue>):<fpage>215</fpage>&#x2013;<lpage>219</lpage>.
                    <pub-id pub-id-type="pmid">18278030</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature06745</pub-id>
                    <pub-id pub-id-type="pmcid">2377394</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Thurman</surname>
                            <given-names>RE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rynes</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Humbert</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The accessible chromatin landscape of the human genome.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2012</year>;<volume>489</volume>(<issue>7414</issue>):<fpage>75</fpage>&#x2013;<lpage>82</lpage>.
                    <pub-id pub-id-type="pmid">22955617</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature11232</pub-id>
                    <pub-id pub-id-type="pmcid">3721348</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>CY</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kaye</surname>
                            <given-names>AM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The identification of cis-regulatory elements: A review from a machine learning perspective.</article-title>
                    <source>

                        <italic toggle="yes">Biosystems.</italic>
</source>
                    <year>2015</year>;<volume>138</volume>:<fpage>6</fpage>&#x2013;<lpage>17</lpage>.
                    <pub-id pub-id-type="pmid">26499213</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.biosystems.2015.10.002</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ernst</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kellis</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>ChromHMM: automating chromatin-state discovery and characterization.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2012</year>;<volume>9</volume>(<issue>3</issue>):<fpage>215</fpage>&#x2013;<lpage>216</lpage>.
                    <pub-id pub-id-type="pmid">22373907</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.1906</pub-id>
                    <pub-id pub-id-type="pmcid">3577932</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Song</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>KC</given-names>
                        </name>
</person-group>:
                    <article-title>Spectacle: fast chromatin state annotation using spectral learning.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2015</year>;<volume>16</volume>(<issue>1</issue>):<fpage>33</fpage>.
                    <pub-id pub-id-type="pmid">25786205</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-015-0598-0</pub-id>
                    <pub-id pub-id-type="pmcid">4355146</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mammana</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chung</surname>
                            <given-names>HR</given-names>
                        </name>
</person-group>:
                    <article-title>Chromatin segmentation based on a probabilistic model for read counts explains a large portion of the epigenome.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2015</year>;<volume>16</volume>(<issue>1</issue>):<fpage>151</fpage>.
                    <pub-id pub-id-type="pmid">26206277</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-015-0708-z</pub-id>
                    <pub-id pub-id-type="pmcid">4514447</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hoffman</surname>
                            <given-names>MM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Buske</surname>
                            <given-names>OJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Unsupervised pattern discovery in human chromatin structure through genomic segmentation.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2012</year>;<volume>9</volume>(<issue>5</issue>):<fpage>473</fpage>&#x2013;<lpage>476</lpage>.
                    <pub-id pub-id-type="pmid">22426492</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.1937</pub-id>
                    <pub-id pub-id-type="pmcid">3340533</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hon</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ren</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>W</given-names>
                        </name>
</person-group>:
                    <article-title>ChromaSig: a probabilistic approach to finding common chromatin signatures in the human genome.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Comput Biol.</italic>
</source>
                    <year>2008</year>;<volume>4</volume>(<issue>10</issue>):<fpage>e1000201</fpage>.
                    <pub-id pub-id-type="pmid">18927605</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pcbi.1000201</pub-id>
                    <pub-id pub-id-type="pmcid">2556089</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Santoni</surname>
                            <given-names>FA</given-names>
                        </name>
</person-group>:
                    <article-title>EMdeCODE: a novel algorithm capable of reading words of epigenetic code to predict enhancers and retroviral integration sites and to identify H3R2me1 as a distinctive mark of coding versus non-coding genes.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2013</year>;<volume>41</volume>(<issue>3</issue>):<fpage>e48</fpage>.
                    <pub-id pub-id-type="pmid">23234700</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gks1214</pub-id>
                    <pub-id pub-id-type="pmcid">3561958</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zacher</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lidschreiber</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cramer</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Annotation of genomics data using bidirectional hidden Markov models unveils variations in Pol II transcription cycle.</article-title>
                    <source>

                        <italic toggle="yes">Mol Syst Biol.</italic>
</source>
                    <year>2014</year>;<volume>10</volume>(<issue>12</issue>):<fpage>768</fpage>.
                    <pub-id pub-id-type="pmid">25527639</pub-id>
                    <pub-id pub-id-type="doi">10.15252/msb.20145654</pub-id>
                    <pub-id pub-id-type="pmcid">4300491</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sohn</surname>
                            <given-names>KA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ho</surname>
                            <given-names>JW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Djordjevic</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>hiHMM: Bayesian non-parametric joint inference of chromatin state maps.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2015</year>;<volume>31</volume>(<issue>13</issue>):<fpage>2066</fpage>&#x2013;<lpage>2074</lpage>.
                    <pub-id pub-id-type="pmid">25725496</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btv117</pub-id>
                    <pub-id pub-id-type="pmcid">4481846</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Biesinger</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Xie</surname>
                            <given-names>X</given-names>
                        </name>
</person-group>:
                    <article-title>Discovering and mapping chromatin states using a tree hidden Markov model.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>
                    <year>2013</year>;<volume>14 Suppl 5</volume>:<fpage>S4</fpage>.
                    <pub-id pub-id-type="pmid">23734743</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btq248</pub-id>
                    <pub-id pub-id-type="pmcid">3622631</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hoffman</surname>
                            <given-names>MM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ernst</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wilder</surname>
                            <given-names>SP</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Integrative annotation of chromatin elements from ENCODE data.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2013</year>;<volume>41</volume>(<issue>2</issue>):<fpage>827</fpage>&#x2013;<lpage>841</lpage>.
                    <pub-id pub-id-type="pmid">23221638</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gks1284</pub-id>
                    <pub-id pub-id-type="pmcid">3553955</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nalls</surname>
                            <given-names>MA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pankratz</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lill</surname>
                            <given-names>CM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson&#x2019;s disease.</article-title>
                    <source>

                        <italic toggle="yes">Nat Genet.</italic>
</source>
                    <year>2014</year>;<volume>46</volume>(<issue>9</issue>):<fpage>989</fpage>&#x2013;<lpage>993</lpage>.
                    <pub-id pub-id-type="pmid">25064009</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ng.3043</pub-id>
                    <pub-id pub-id-type="pmcid">4146673</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Buske</surname>
                            <given-names>OJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hoffman</surname>
                            <given-names>MM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ponts</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Exploratory analysis of genomic segmentations with segtools.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>
                    <year>2011</year>;<volume>12</volume>:<fpage>415</fpage>.
                    <pub-id pub-id-type="pmid">22029426</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-12-415</pub-id>
                    <pub-id pub-id-type="pmcid">3224787</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Patch</surname>
                            <given-names>AM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Christie</surname>
                            <given-names>EL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Etemadmoghadam</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Whole-genome characterization of chemoresistant ovarian cancer.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2015</year>;<volume>521</volume>(<issue>7553</issue>):<fpage>489</fpage>&#x2013;<lpage>494</lpage>.
                    <pub-id pub-id-type="pmid">26017449</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature14410</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Teschendorff</surname>
                            <given-names>AE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gao</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jones</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>DNA methylation outliers in normal breast tissue identify field defects that are enriched in cancer.</article-title>
                    <source>

                        <italic toggle="yes">Nat Commun.</italic>
</source>
                    <year>2016</year>;<volume>7</volume>:<fpage>10478</fpage>.
                    <pub-id pub-id-type="pmid">26823093</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ncomms10478</pub-id>
                    <pub-id pub-id-type="pmcid">4740178</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Coetzee</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ramjan</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dinh</surname>
                            <given-names>Q</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Supplemental Figures for "StateHub-StatePaintR: rapid and reproducible chromatin state evaluation for custom genome annotation".</article-title>
                    <source>

                        <italic toggle="yes">figshare.</italic>
</source> Figure. <year>2020</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.6084/m9.figshare.12195087.v1">http://www.doi.org/10.6084/m9.figshare.12195087.v1</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <collab>ENCODE Project Consortium</collab>:
                    <article-title>An integrated encyclopedia of DNA elements in the human genome.</article-title>
                    <source>
                        <italic toggle="yes">Nature.</italic>
                    </source>
                    <year>2012</year>;<volume>489</volume>(<issue>7414</issue>):<fpage>57</fpage>&#x2013;<lpage>74</lpage>.
                    <pub-id pub-id-type="pmid">22955616</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature11247</pub-id>
                    <pub-id pub-id-type="pmcid">3439153</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Visel</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Minovitsky</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dubchak</surname>
                            <given-names>I</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>VISTA Enhancer Browser--a database of tissue-specific human enhancers.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2007</year>;<volume>35</volume>(<issue>Database issue</issue>):<fpage>D88</fpage>&#x2013;<lpage>D92</lpage>.
                    <pub-id pub-id-type="pmid">17130149</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkl822</pub-id>
                    <pub-id pub-id-type="pmcid">1716724</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <collab>The Roadmap Epigenomics Consortium</collab>, 
                        <name name-style="western">
                            <surname>Kundaje</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Meuleman</surname>
                            <given-names>W</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Integrative analysis of 111 reference human epigenomes.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2015</year>;<volume>518</volume>(<issue>7539</issue>):<fpage>317</fpage>&#x2013;<lpage>330</lpage>.
                    <pub-id pub-id-type="pmid">25693563</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature14248</pub-id>
                    <pub-id pub-id-type="pmcid">4530010</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Coetzee</surname>
                            <given-names>SG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pierce</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Brundin</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Enrichment of risk SNPs in regulatory regions implicate diverse tissues in Parkinson&#x2019;s disease etiology.</article-title>
                    <source>

                        <italic toggle="yes">Sci Rep.</italic>
</source>
                    <year>2016</year>;<volume>6</volume>: 30509.
                    <pub-id pub-id-type="pmid">27461410</pub-id>
                    <pub-id pub-id-type="doi">10.1038/srep30509</pub-id>
                    <pub-id pub-id-type="pmcid">4962314</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gal-Yam</surname>
                            <given-names>EN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Egger</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Iniguez</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Frequent switching of Polycomb repressive marks and DNA hypermethylation in the PC3 prostate cancer cell line.</article-title>
                    <source>

                        <italic toggle="yes">Proc Natl Acad Sci U S A.</italic>
</source>
                    <year>2008</year>;<volume>105</volume>(<issue>35</issue>):<fpage>12979</fpage>&#x2013;<lpage>12984</lpage>.
                    <pub-id pub-id-type="pmid">18753622</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.0806437105</pub-id>
                    <pub-id pub-id-type="pmcid">2529074</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-31">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Calo</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wysocka</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>Modification of enhancer chromatin: what, how, and why?</article-title>
                    <source>

                        <italic toggle="yes">Mol Cell.</italic>
</source>
                    <year>2013</year>;<volume>49</volume>(<issue>5</issue>):<fpage>825</fpage>&#x2013;<lpage>837</lpage>.
                    <pub-id pub-id-type="pmid">23473601</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.molcel.2013.01.038</pub-id>
                    <pub-id pub-id-type="pmcid">3857148</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zerbino</surname>
                            <given-names>DR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wilder</surname>
                            <given-names>SP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Johnson</surname>
                            <given-names>N</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The Ensembl regulatory build.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2015</year>;<volume>16</volume>:<fpage>56</fpage>.
                    <pub-id pub-id-type="pmid">25887522</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-015-0621-5</pub-id>
                    <pub-id pub-id-type="pmcid">4407537</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Erwin</surname>
                            <given-names>GD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oksenberg</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Truty</surname>
                            <given-names>RM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Integrating diverse datasets improves developmental enhancer prediction.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Comput Biol.</italic>
</source>
                    <year>2014</year>;<volume>10</volume>(<issue>6</issue>):<fpage>e1003677</fpage>.
                    <pub-id pub-id-type="pmid">24967590</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003677</pub-id>
                    <pub-id pub-id-type="pmcid">4072507</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-34">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rajagopal</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Xie</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>RFECS: a random-forest based algorithm for enhancer identification from chromatin state.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Comput Biol.</italic>
</source>
                    <year>2013</year>;<volume>9</volume>(<issue>3</issue>):<fpage>e1002968</fpage>.
                    <pub-id pub-id-type="pmid">23526891</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pcbi.1002968</pub-id>
                    <pub-id pub-id-type="pmcid">3597546</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-35">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Qu</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shan</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>DELTA: A Distal Enhancer Locating Tool Based on AdaBoost Algorithm and Shape Features of Chromatin Modifications.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2015</year>;<volume>10</volume>(<issue>6</issue>):<fpage>e0130622</fpage>.
                    <pub-id pub-id-type="pmid">26091399</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0130622</pub-id>
                    <pub-id pub-id-type="pmcid">4474808</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-36">
                <label>36</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Firpi</surname>
                            <given-names>HA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ucar</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Tan</surname>
                            <given-names>K</given-names>
                        </name>
</person-group>:
                    <article-title>Discover regulatory DNA elements using chromatin signatures and artificial neural network.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2010</year>;<volume>26</volume>(<issue>13</issue>):<fpage>1579</fpage>&#x2013;<lpage>1586</lpage>.
                    <pub-id pub-id-type="pmid">20453004</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btq248</pub-id>
                    <pub-id pub-id-type="pmcid">2887052</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-37">
                <label>37</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>He</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gorkin</surname>
                            <given-names>DU</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dickel</surname>
                            <given-names>DE</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Improved regulatory element prediction based on tissue-specific local epigenomic signatures.</article-title>
                    <source>

                        <italic toggle="yes">Proc Natl Acad Sci U S A.</italic>
</source>
                    <year>2017</year>;<volume>114</volume>(<issue>9</issue>):<fpage>E1633</fpage>&#x2013;<lpage>E1640</lpage>.
                    <pub-id pub-id-type="pmid">28193886</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.1618353114</pub-id>
                    <pub-id pub-id-type="pmcid">5338528</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-38">
                <label>38</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Flach</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kull</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Precision-recall-gain curves: PR analysis done right</article-title>. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, and Garnett R, editors,
                    <italic toggle="yes">Advances in Neural Information Processing Systems</italic>. Curran Associates, Inc.,<year>2015</year>;<volume>28</volume>:<fpage>838</fpage>&#x2013;<lpage>846</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="https://papers.nips.cc/paper/5867-precision-recall-gain-curves-pr-analysis-done-right.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-39">
                <label>39</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>An</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yue</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Jointly characterizing epigenetic dynamics across multiple human cell types.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2016</year>;<volume>44</volume>(<issue>14</issue>):<fpage>6721</fpage>&#x2013;<lpage>6731</lpage>.
                    <pub-id pub-id-type="pmid">27095202</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkw278</pub-id>
                    <pub-id pub-id-type="pmcid">5772166</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-40">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ramjan</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Coetzee</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>zackramjan/statehubweb: initial release of the statehub web frontend app with doi (Version v1.1).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.1148792">http://www.doi.org/10.5281/zenodo.1148792</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-41">
                <label>41</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Coetzee</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Simon-Coetzee/StatePaintR v0.99.6 (Version v0.99.6).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.1137825">http://www.doi.org/10.5281/zenodo.1137825</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report63182">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.20361.r63182</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Libbrecht</surname>
                        <given-names>Maxwell W.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r63182a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2502-0262</uri>
                </contrib>
                <aff id="r63182a1">
                    <label>1</label>School of Computing Sciences, Simon Fraser University, Burnaby, BC, Canada</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>29</day>
                <month>5</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Libbrecht MW</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport63182" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.13535.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The revision has greatly improved the clarity of the text and the revised version is much more understandable.</p>
            <p>I agree with the other reviewers that a performance comparison is necessary to demonstrate the utility of StatePaintR over existing methods. I missed this claim in my first review: "In direct comparisons, we found that our own annotations exhibited greater similarity to the combined analysis than to either of the Segway or ChromHMM tracks separately (not shown)." If true, this is a key result; the results must be shown (as a primary figure, not a supplement).</p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>No</p>
            <p>Is the description of the method technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>computational genomics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report63183">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.20361.r63183</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Park</surname>
                        <given-names>Yongjin</given-names>
                    </name>
                    <xref ref-type="aff" rid="r63183a1">1</xref>
                    <xref ref-type="aff" rid="r63183a2">2</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8915-2876</uri>
                </contrib>
                <aff id="r63183a1">
                    <label>1</label>Broad Institute, Massachusetts Institute of Technology, Harvard University, Cambridge, MA, USA</aff>
                <aff id="r63183a2">
                    <label>2</label>Department of Pathology and Statistics, The University of British Columbia, Vancouver, BC, Canada</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interest.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>28</day>
                <month>5</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Park Y</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport63183" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.13535.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Since the last review, I could find only a minute change in the text and no change in the figures. From reading this paper alone, I found no clear reason to use the StateHub-StatePaintR tool. I am sure the authors spent a lot of time developing the Bioconductor package, but a large portion of the details is simply missing from the paper.
                <list list-type="bullet">
                    <list-item>
                        <p>I honestly don't think there is any systematic performance comparison (it doesn't have to be exhaustive).</p>
                    </list-item>
                    <list-item>
                        <p>There is no mathematical definition of the metrics used in the paper: What is the gold standard for the tests? What are the test statistics? How do you compute the enrichment score? How do you estimate the confidence interval?</p>
                    </list-item>
                </list> Since the authors' claim for the paper is really about transparency and reproducibility, these details are too important to be buried in a stack of R code.</p>
            <p>Moreover, I would emphasize why rule-based methods are more transparent and reproducible than other ML-based methods. Readers may disagree on the definition of transparency and reproducibility, but it is important to give them a chance to judge for themselves. It would also be nice to have examples that clearly contrast this method with other methods.</p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the method technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report63184">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.20361.r63184</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Filion</surname>
                        <given-names>Guillaume J.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r63184a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-3473-1632</uri>
                </contrib>
                <aff id="r63184a1">
                    <label>1</label>Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>26</day>
                <month>5</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Filion GJ</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport63184" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.13535.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The second version of the article is quite similar to the first: the strong points remain the same, but so do the weak points about clarity.</p>
            <p>I was not specific enough in my review of the first version, my bad. In my opinion, the following issues must be addressed:
                <list list-type="order">
                    <list-item>
                        <p>In Figure 2, indicate what the colors are, even if the color code is defined in Roadmap publications (it cannot be that the legend of a figure is in another paper). Also indicate somewhere what the legend is for the left bar (possibly in supplementary material, but it has to be defined somewhere in the article).</p>
                    </list-item>
                    <list-item>
                        <p>In Figure 3, state what the points represent. Looking at the figure raises many questions: Why do some tissues have more points than others? Where do the data come from? What is the nature of the data? What is plotted exactly? What are the grey boxes? What is the definition of "significant enrichment"? What does p in the label of the y-axis stand for? The nomenclature suggests it is a probability, so why are some values negative?</p>
                    </list-item>
                    <list-item>
                        <p>The same questions apply verbatim to Figure 4, except for the one about the y-axis label. In addition, I found some typos that the authors may want to correct.</p>
                    </list-item>
                    <list-item>
                        <p>Fix "required or state?" in Table 1 (should be "required for state" I suppose).</p>
                    </list-item>
                    <list-item>
                        <p>Capitalize "roadmap" in the legend of Figure 3 and Figure 4.</p>
                    </list-item>
                    <list-item>
                        <p>The "/" seems to be missing in the label of the y-axis.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the method technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report33702">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.14699.r33702</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Filion</surname>
                        <given-names>Guillaume J.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r33702a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-3473-1632</uri>
                </contrib>
                <aff id="r33702a1">
                    <label>1</label>Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Spain</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>23</day>
                <month>5</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Filion GJ</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport33702" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.13535.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors present a pair of tools called StateHub and StatePaintR to annotate genomic states based on chromatin data (ChIP-seq, ATAC-seq, etc.). This work has a very pragmatic take on the problem: the software is fast, based on universal rules, linked to a wealth of data, etc. For this, the authors have decided to use a rule-based method instead of the traditional machine-learning approach, which in my opinion is completely justified. The early efforts to discover and annotate chromatin states were based on different methods, all &#x201c;optimal&#x201d; in their own way. However, none of these methods has proved to have a decisive advantage over the others, for the following two reasons. First, chromatin states do not &#x201c;exist&#x201d;; they are useful representations associated with a particular state of our knowledge and a particular problem at hand. Second, the choice of the input data and the number of states (i.e. the granularity of the segmentation) seem to be the most influential factors on the end result. With these elements in mind, it makes perfect sense to develop a tool aiming to satisfy the needs of the user and the demand for reproducibility and traceability, rather than some mathematical constraint.</p>
            <p> </p>
            <p> Overall, the manuscript is well constructed &#x2013; and as mentioned above it describes a relevant advance &#x2013; but it could be streamlined for clarity. Many terms are ambiguous (like &#x201c;active states&#x201d;) or are jargon for chromatin specialists (like &#x201c;PolycombNarrow&#x201d;). The figure legends are barely enough to understand what is plotted and the axes are not all properly labelled. It is a good thing that the authors give some examples to explain the entries of the design matrix. For didactic purposes, they could give more of those, or make the examples more concrete throughout the manuscript to help the reader understand the logic of their tool.</p>
            <p> </p>
            <p> The manuscript does otherwise a great job at making the work reproducible, explaining the limitations and the scope of their software, and also at giving a high level description of the implementation. To help the authors sharpen the manuscript for more readability, below is a list of typos and minor issues.</p>
            <p> </p>
            <p> Page 3, paragraph starting with &#x201c;All these data...&#x201d;: Perhaps a word is missing in the sentence &#x201c;The input and output (final) data are both [?] as browser extensible data&#x2026;&#x201d;.</p>
            <p> </p>
            <p> Page 3, last sentence of the main text: it should read &#x201c;... PolycombNarrow data is required [to] be present&#x201d;.</p>
            <p> </p>
            <p> Page 4, second paragraph, fourth sentence from the end: a space seems to be missing in &#x201c;StatePaintR[space]selects&#x201d;.</p>
            <p> </p>
            <p> Page 4, third paragraph, fourth sentence: &#x201c;an unique&#x201d; should be &#x201c;a unique&#x201d;.</p>
            <p> </p>
            <p> Page 4, paragraph &#x201c;Enrichment calculations&#x201d;, first sentence. A word seems to be missing in &#x201c;... an earlier study of Parkinson&#x2019;s disease in which [?] tested for&#x2026;&#x201d;.</p>
            <p> </p>
            <p> Page 6, paragraph &#x201c;A framework for rules-based annotation&#x201d;: &#x201c;rules-based&#x201d; should be &#x201c;rule-based&#x201d;. See https://english.stackexchange.com/q/1366/44109</p>
            <p> </p>
            <p> Page 6, paragraph &#x201c;Segmentation of public datasets&#x201d;, second sentence from the end, a space is missing before the parentheses in &#x201c;...high-quality[space](at least 15m reads&#x201d;. Also, the &#x201c;m&#x201d; probably stands for &#x201c;million&#x201d; but in scientific texts it must stand for &#x201c;metres&#x201d;. If the authors mean &#x201c;million&#x201d;, the best option is to write &#x201c;million&#x201d;.</p>
            <p> </p>
            <p> Figure 3, legend: what are &#x201c;active states&#x201d;? The authors could give the complete list.</p>
            <p> </p>
            <p> Figure 4, legend: the authors should indicate on the graph what is plotted on the Y axis (and give the unit). Are the data also plotted in &#x201c;active&#x201d; states? Whatever the answer, this should be stated clearly.</p>
            <p> </p>
            <p> Page 8, first paragraph of the main text, last sentence: there is one &#x201c;been&#x201d; too much in &#x201c;... with these properties have been also been called&#x2026;&#x201d;.</p>
            <p> </p>
            <p> Page 9, first paragraph, last sentence: &#x201c;roadmap&#x201d; should be written with a capital R.</p>
            <p> </p>
            <p> Page 11, second paragraph, second sentence: &#x201c;rules-based&#x201d; should be &#x201c;rule-based&#x201d; (and again in the last sentence of the paragraph).</p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the method technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment5368-33702">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Hazelett</surname>
                            <given-names>Dennis</given-names>
                        </name>
                        <aff>Cedars-Sinai Medical Center, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>30</day>
                    <month>3</month>
                    <year>2020</year>
                </pub-date>
            </front-stub>
            <body>
                <p>First, I would like to thank the reviewer for their comments and corrections. I address them below and in the resubmission of the article.</p>
                <p> &#x201c;Overall, the manuscript is well constructed &#x2013; and as mentioned above it describes a relevant advance &#x2013; but it could be streamlined for clarity. Many terms are ambiguous (like &#x201c;active states&#x201d;) or are jargon for chromatin specialist (like &#x201c;PolycombNarrow&#x201d;). The figure legends are barely enough to understand what is plotted and the axes are not all properly labelled. It is a good thing that the authors give some examples to explain the entries of the design matrix. For didactic purposes, they could give more of those, or make the examples more concrete throughout the manuscript to help the reader understand the logic of their tool.&#x201d;</p>
                <p> The reviewer is correct that the legends and the decision matrix were sparsely explained. The explanation of the decision matrix has been made clearer and more explicit, and is now grounded in a concrete set of examples. The figure legends have also been expanded, and the axes have been labelled clearly.</p>
                <p> &#x201c;The manuscript does otherwise a great job at making the work reproducible, explaining the limitations and the scope of their software, and also at giving a high level description of the implementation. To help the authors sharpen the manuscript for more readability, below is a list of typos and minor issues.&#x201d;</p>
                <p> Thank you for the corrections. All of these issues have been corrected, including those in the figure legends.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report33619">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.14699.r33619</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Park</surname>
                        <given-names>Yongjin</given-names>
                    </name>
                    <xref ref-type="aff" rid="r33619a1">1</xref>
                    <xref ref-type="aff" rid="r33619a2">2</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8915-2876</uri>
                </contrib>
                <aff id="r33619a1">
                    <label>1</label>Broad Institute, Massachusetts Institute of Technology, Harvard University, Cambridge, MA, USA</aff>
                <aff id="r33619a2">
                    <label>2</label>Department of Pathology and Statistics, The University of British Columbia, Vancouver, BC, Canada</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>10</day>
                <month>5</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Park Y</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport33619" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.13535.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Overall the paper could be quite impactful and the software they developed can be highly usable. But the paper doesn&#x2019;t read well. I assume the authors intended to write a research paper, not a technical note. All of my comments are based on this assumption. 
                <list list-type="bullet">
                    <list-item>
                        <p>The authors need to put more efforts to convince ordinary users that&#x00a0;StatePaintR&#x00a0;is more powerful compared to a single model trained on relevant cell / tissue types. Perhaps expanding from the results in the supplementary section could improve the paper.</p>
                    </list-item>
                    <list-item>
                        <p>Why not just train&#x00a0;chromHMM&#x00a0;or&#x00a0;segway&#x00a0;given current chip-seq tracks? What&#x2019;s a clear advantage of the rule-based method? I don&#x2019;t think the rule-based method can clearly estimate underlying model complexity of epigenomics.&#x00a0;I think this is too important information to be omitted:</p>
                    </list-item>
                </list> 
                <italic>In direct comparisons, we found that our own annotations exhibited greater similarity to the combined analysis than to either of the Segway or ChromHMM tracks separately (not shown). Whatever the protocol, the basic problem persists; machine-learning is able to provide insight into what the categories are, but not how many categories there should be. Currently this remains the exclusive province of the biologist.</italic> 
                <list list-type="bullet">
                    <list-item>
                        <p>Does this method help prioritize relevant cell / tissue types?</p>
                    </list-item>
                    <list-item>
                        <p>Description in the method section is fuzzy. I think a complete paper needs to be self-contained without looking up definitions and terminology from other sources. However, many terms are either vaguely used or never defined. Moreover, the method section needs to be better organized in a top-down fashion instead of enumerating what were implemented.</p>
                    </list-item>
                    <list-item>
                        <p>Figure 1 is confusing and not so informative. Why don&#x2019;t you include real-world example such as chip-seq or methylation tracks?</p>
                    </list-item>
                    <list-item>
                        <p>Perhaps you can combine Table 1 with Figure 1. First of all, is Table 1 really necessary? Why do you need both binary and decimal code (I know why but it is irrelevant to the main story of this paper)? It is probably better to show graphical examples how you assign decimal values.</p>
                    </list-item>
                    <list-item>
                        <p>How do you define information content? How do you define enrichment? How do you calibrate significance?</p>
                    </list-item>
                    <list-item>
                        <p>Is Beta-Binomial reasonable assumption? There are more examples in the background. Do you estimate Beta-Binomial by moment-matching of posterior distribution or maximum-likelihood?</p>
                    </list-item>
                    <list-item>
                        <p>y-axis labels are either missing or badly named (Fig 3 and 4).</p>
                    </list-item>
                    <list-item>
                        <p>As&#x00a0;future direction, how easy is it to implement user-defined enrichment models / methods?</p>
                    </list-item>
                </list>
            </p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the method technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment5367-33619">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Hazelett</surname>
                            <given-names>Dennis</given-names>
                        </name>
                        <aff>Cedars-Sinai Medical Center, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>30</day>
                    <month>3</month>
                    <year>2020</year>
                </pub-date>
            </front-stub>
            <body>
                <p>First, I would like to thank the reviewer for their comments and corrections. I address them below and in the resubmission of the article.</p>
                <p> &#x201c;The authors need to put more efforts to convince ordinary users that StatePaintR is more powerful compared to a single model trained on relevant cell / tissue types. Perhaps expanding from the results in the supplementary section could improve the paper.&#x201d;</p>
                <p> It is difficult to justify one particular segmentation over another. We propose, rather, that the rule-based annotation of predicted states espoused by StateHub/StatePaintR provides an alternative to the existing model of learning the states within the relevant cell/tissue types. The priority of the method is to provide a framework for expressing existing statements about how genomic annotations relate to one another and combine to reveal underlying chromatin states. This bypasses de novo learning and annotation of states within each sample, so that annotation relies solely on simple rules and available data. However, to demonstrate the utility of the method in annotating chromatin states, we have expanded the description of the method&#x2019;s annotation scoring and of how it may be used to predict experimentally validated enhancer regions from the VISTA database.</p>
                <p> &#x201c;Why not just train chromHMM or segway given current chip-seq tracks? What&#x2019;s a clear advantage of the rule-based method? I don&#x2019;t think the rule-based method can clearly estimate underlying model complexity of epigenomics. I think this is too important information to be omitted.&#x201d;</p>
                <p> It is true that StatePaintR does not discover new states, nor provide a complete model of the underlying epigenomics. What the tool provides is a language to encode existing knowledge of the relationships between genomic annotations of any kind and the underlying epigenomic state, in a manner that is descriptive, robust to missing data, and consistent across diverse data sets.</p>
                <p> &#x201c;Does this method help prioritize relevant cell / tissue types?&#x201d;</p>
                <p> This method allows the user to annotate diverse cell / tissue types with a single model. This may provide useful segmentations for the purpose of describing the chromatin state relative to external data, but the method itself does not prioritize specific samples.</p>
                <p> &#x201c;Description in the method section is fuzzy. I think a complete paper needs to be self-contained without looking up definitions and terminology from other sources. However, many terms are either vaguely used or never defined. Moreover, the method section needs to be better organized in a top-down fashion instead of enumerating what were implemented.&#x201d;</p>
                <p> We agree that the methods were confusing. They have been reorganized and expanded to be more self-contained, clear, and precise.</p>
                <p> &#x201c;Figure 1 is confusing and not so informative. Why don&#x2019;t you include real-world example such as chip-seq or methylation tracks?&#x201d;</p>
                <p> Figure 1 is intended to be a schematic representation of how the decision matrix and abstraction layer work together to produce an annotation for a genomic segment. To this end, we have clarified and expanded, both in the figure legend, and the Implementation section of the Methods, the nature of the decision matrix and abstraction layer, including concrete examples of how different chromatin marks combine to form a state.</p>
                <p> &#x201c;Perhaps you can combine Table 1 with Figure 1. First of all, is Table 1 really necessary? Why do you need both binary and decimal code (I know why but it is irrelevant to the main story of this paper)? It is probably better to show graphical examples how you assign decimal values.&#x201d;</p>
                <p> The decimal values lend familiarity to the visualization, while the binary code makes explicit the two bits that represent the two questions answered in each cell of the matrix. We believe that showing both streamlines understanding of the decision matrix and of how it works from an implementation standpoint.</p>
                <p> &#x201c;How do you define information content? How do you define enrichment? How do you calibrate significance?</p>
                <p> Is Beta-Binomial reasonable assumption? There are more examples in the background. Do you estimate Beta-Binomial by moment-matching of posterior distribution or maximum-likelihood?&#x201d;</p>
                <p> Information content was an unintentionally misleading term that we used to refer to state complexity, so the term information content has been removed from discussion of the decision matrix.</p>
                <p> For figures 3 and 4, we calculate the enrichment of SNPs or hypermethylated CpGs within genomic states relative to the appropriate background. As mentioned in Methods: Enrichment Calculations, we considered the background rate of SNPs within active chromatin states to be the proportion of all SNPs within a 1 Mb region of the index SNP with a MAF greater than 0.01 in the population of interest (Europeans, from 1000 Genomes), while the foreground comprised those SNPs with a linkage disequilibrium R^2 &gt; 0.8. For the methylation data, the full HM450 methylation array was considered the background, while the foreground comprised probes on the array with a difference in beta value between cancer and normal of 0.3 and significance in a Mann-Whitney U-test at a p-value of &lt; 0.01. For the GWAS data set, we determined the difference between foreground and background by simulating posterior draws from the two beta distributions. The difference was considered significant if the 95% credible interval did not contain 0. We believe that the beta-binomial is a reasonable assumption given that we restrict the background to the exact same genomic region from which we draw the foreground, thereby accounting for the heterogeneity of the chromatin states across the genome. For the methylation dataset, the odds ratio and the 95% confidence interval were calculated with Fisher's exact test.</p>
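                <p> As an illustration only, the comparison of foreground and background by posterior draws can be sketched as follows. This is a minimal Python sketch and not the StatePaintR implementation: the function names, the flat Beta(1, 1) prior, the number of draws, and the SNP counts in the usage lines are all assumptions introduced for the example.</p>
                <preformat><![CDATA[
```python
import random


def beta_posterior_draws(hits, total, n_draws, rng, prior_a=1.0, prior_b=1.0):
    """Draw from the Beta posterior of a proportion.

    hits / total are counts (e.g. SNPs in active states / all SNPs).
    The flat Beta(1, 1) prior is an assumption made for this sketch.
    """
    a = prior_a + hits
    b = prior_b + (total - hits)
    return [rng.betavariate(a, b) for _ in range(n_draws)]


def enrichment_interval(fg_hits, fg_total, bg_hits, bg_total,
                        n_draws=20000, seed=0, level=0.95):
    """Credible interval for (foreground rate - background rate).

    Returns (lower, upper, significant); the difference is called
    significant when the interval does not contain 0.
    """
    rng = random.Random(seed)
    fg = beta_posterior_draws(fg_hits, fg_total, n_draws, rng)
    bg = beta_posterior_draws(bg_hits, bg_total, n_draws, rng)
    diffs = sorted(f - b for f, b in zip(fg, bg))
    tail = (1.0 - level) / 2.0
    lower = diffs[int(tail * n_draws)]
    upper = diffs[int((1.0 - tail) * n_draws) - 1]
    significant = lower > 0 or upper < 0
    return lower, upper, significant


# Hypothetical counts: 30 of 40 foreground SNPs and 200 of 1000
# background SNPs fall in active chromatin states.
lower, upper, significant = enrichment_interval(30, 40, 200, 1000)
print(f"95% interval for the difference: [{lower:.3f}, {upper:.3f}]")
print("significant:", significant)
```
]]></preformat>
                <p> Working on the difference of the two posterior samples keeps the uncertainty of both the foreground and the background rate in the interval, which matters when the foreground contains only a few SNPs.</p>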
                <p> &#x201c;y-axis labels are either missing or badly named (Fig 3 and 4).&#x201d;</p>
                <p> We have corrected the poorly labeled and unlabeled y-axis labels in these figures.</p>
                <p> &#x201c;As future direction, how easy is it to implement user-defined enrichment models / methods?&#x201d;</p>
                <p> It is not currently easy, though it is possible for users to implement a decision matrix and abstraction layer. The difficulty lies in determining whether a model is complete and non-redundant. As a future direction, we intend to create a tool on the StateHub website where models may be submitted and checked for validity, and then, optionally, published for others to use.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report33280">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.14699.r33280</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Libbrecht</surname>
                        <given-names>Maxwell W.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r33280a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2502-0262</uri>
                </contrib>
                <aff id="r33280a1">
                    <label>1</label>School of Computing Sciences, Simon Fraser University, Burnaby, BC, Canada</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>30</day>
                <month>4</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Libbrecht MW</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport33280" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.13535.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors present a method for annotating the genome using genomics data sets such as histone modifications, transcription factor binding and methylation. The algorithm is applied to data from a given tissue. It takes as input a collection of genomics data sets that have been binarized in a preprocessing step, such that each is represented by a binary vector over the genome. The method outputs a genomic vector of one of K states, such as "Promoter" or "Transcribed" (K=20 in their default model). The method uses a "model matrix" which defines, for each state-dataset pair, whether, for a given base to be called as that state, (1) the dataset *may* be positive for that base, and (2) the dataset *must* be positive for that base.</p>
            <p> </p>
            <p> StatePaintR is likely to be an impactful method. Genome annotations are a very useful product of epigenomics data sets, as evidenced by the wide array of methods developed for their production. StatePaintR is an alternative to existing algorithms based on probabilistic models&#x00a0;that is much simpler and more transparent.</p>
            <p> </p>
            <p> Unfortunately, the manuscript is&#x00a0;difficult to understand in its current form because many key definitions are missing. Several examples:</p>
            <p> - The term "functional category" is not defined.</p>
            <p> - The Introduction uses the term "functional category" to mean a state, where later that term is used to refer to a collection of data sets (such as "silencing marks")</p>
            <p> - The form of the input and output are not explicitly mentioned.&#x00a0;</p>
            <p> - It is not explicitly mentioned that the model matrix is generated manually.</p>
            <p> </p>
            <p> Minor notes:</p>
            <p> - It is claimed that the information content of a state equals the sum of the cell values. However, it seems to me that the maximally-permissive value is 1 (neither required nor exclusionary), not 0.</p>
            <p> - P3: "3 bit code". Should be 2 bit code.</p>
            <p> - Figure 1: "red dotted arrows indicate non-matching rows". I don't understand -- each arrow connects to two rows, not just one.</p>
            <p>Is the rationale for developing the new method (or application) clearly explained?</p>
            <p>No</p>
            <p>Is the description of the method technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions about the method and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>If any results are presented, are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Are sufficient details provided to allow replication of the method development and its use by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>computational genomics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment5366-33280">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Hazelett</surname>
                            <given-names>Dennis</given-names>
                        </name>
                        <aff>Cedars-Sinai Medical Center, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>30</day>
                    <month>3</month>
                    <year>2020</year>
                </pub-date>
            </front-stub>
            <body>
                <p>First, I would like to thank the reviewer for their comments and corrections. I address them below.</p>
                <p> &#x201c;- The term "functional category" is not defined.</p>
                <p> - The Introduction uses the term "functional category" to mean a state, where later that term is used to refer to a collection of data sets (such as "silencing marks")&#x201d;</p>
                <p> The definition of &#x201c;functional category&#x201d; has been expanded and clarified in the Implementation section of the methods. Briefly, specific assays may be assigned functional categories via an abstraction layer implemented in StateHub/StatePaintR, where assays that may represent similar biology, e.g. ChIP-Seq for H3K27ac and H3K9ac, may both be represented by the functional category &#x201c;Active&#x201d;. These functional categories may be combined into states following the rules of the decision matrix.</p>
                <p> &#x201c;- The form of the input and output are not explicitly mentioned.&#x201d;</p>
                <p> The form of the input and output data is now indicated in the Implementation section: both consist of BED files.</p>
                <p> &#x201c;- It is not explicitly mentioned that the model matrix is generated manually.&#x201d;</p>
                <p> In the expanded explanation of the decision matrix, it has been made clear that decision matrices may be constructed manually or retrieved from StateHub.</p>
                <p> &#x201c;- It is claimed that the information content of a state equals the sum of the cell values. However, it seems to me that the maximally-permissive value is 1 (neither required nor exclusionary), not 0.&#x201d;</p>
                <p> The reviewer is correct, and the language around this concept has been updated throughout the document. We no longer refer to information content, an unintentionally misleading term that we had used to describe the complexity of a state. The revised language indicates that potential states are organized by state complexity, with lower-complexity states called first. This takes into account the reviewer's correct observation that 1 is the most permissive value.</p>
                <p> &#x201c;- P3: "3 bit code". Should be 2 bit code.&#x201d;</p>
                <p> This is correct and has been fixed in the text.</p>
                <p> &#x201c;- Figure 1: "red dotted arrows indicate non-matching rows". I don't understand -- each arrow connects to two rows, not just one.&#x201d;</p>
                <p> The figure legend has been updated to reflect the reading of the figure as a series of arrows, proceeding downward, each pointing to the subsequent state. Each arrow is either red, indicating that the segment was checked for consistency with the state and failed, or green, indicating that the segment was consistent with the state call. In the context of the figure: &#x201c;In this example with the presence of H3K4me1 (&#x201c;Regulatory&#x201d;), H3K27ac (&#x201c;Active&#x201d;) and DNase1 hypersensitivity (&#x201c;Core&#x201d;), the first state consistent with the presence of these functional categories is &#x201c;Enhancer&#x201d;, followed by the increasingly more complex &#x201c;Regulatory Site&#x201d;, &#x201c;Active Chromatin&#x201d;, &#x201c;Active Enhancer&#x201d;, &#x201c;Enhancer Core&#x201d;, &#x201c;Active Chromatin Core&#x201d;, and finally &#x201c;Active Enhancer Core&#x201d;.&#x201d;</p>
            </body>
        </sub-article>
    </sub-article>
</article>
