<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.9703.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                    <subj-group>
                        <subject>Genomics</subject>
                    </subj-group>
                    <subj-group>
                        <subject>Neurogenetics</subject>
                    </subj-group>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>A putative role for genome-wide epigenetic regulatory mechanisms in Huntington&#x2019;s disease: A computational assessment</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 approved, 1 not approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Mina</surname>
                        <given-names>Eleni</given-names>
                    </name>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>van Roon-Mom</surname>
                        <given-names>Willeke</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Verschure</surname>
                        <given-names>Pernette</given-names>
                    </name>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>'t Hoen</surname>
                        <given-names>Peter A.C.</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Thompson</surname>
                        <given-names>Mark</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Kaliyaperumal</surname>
                        <given-names>Rajaram</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Hettne</surname>
                        <given-names>Kristina</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Schultes</surname>
                        <given-names>Erik</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Mons</surname>
                        <given-names>Barend</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Roos</surname>
                        <given-names>Marco</given-names>
                    </name>
                    <xref ref-type="corresp" rid="c2">b</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Department of Human Genetics, Leiden University Medical Center, Leiden, 2300 RC, The Netherlands</aff>
                <aff id="a2">
                    <label>2</label>Synthetic Systems Biology and Nuclear Organization Group, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, 1098 XH, The Netherlands</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:e.mina@lumc.nl">e.mina@lumc.nl</email>
                </corresp>
                <corresp id="c2">
                    <label>b</label>
                    <email xlink:href="mailto:m.roos@lumc.nl">m.roos@lumc.nl</email>
                </corresp>
                <fn fn-type="con">
                    <p>EM designed and executed the experiments, analyzed the data and wrote the manuscript; WvRM helped design and interpret the experiments as the Huntington&#x2019;s Disease expert and reviewed the manuscript; PH helped design and interpret the experiments as bioinformatics expert and reviewed the manuscript; PJV reviewed the manuscript; RK, MT provided technical support for the web services; KMH helped with the CPA and reviewed the manuscript; EAS reviewed the manuscript; MR helped design and interpret the experiments, reviewed the manuscript, general supervision; BM: senior advice.</p>
                </fn>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>25</day>
                <month>10</month>
                <year>2017</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2017</year>
            </pub-date>
            <volume>6</volume>
            <elocation-id>1888</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>19</day>
                    <month>6</month>
                    <year>2026</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Mina E et al.</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/6-1888/pdf"/>
            <abstract>
                <p>
                    <bold>Background</bold>: Huntington&#x2019;s Disease (HD) is currently an incurable disease of the adult brain. Massive changes in gene expression are a prominent feature. Epigenetic effects have been reported to be implicated in the disease, but the role of chromatin is not well understood. We tested if the chromatin state of dysregulated genes in HD is affected at a genome-wide scale and have examined how epigenetic processes are associated with CpG-island-mediated gene expression.</p>
                <p>
                    <bold>Methods</bold>: Our general approach incorporates computational and functional analysis of public data before embarking on expensive wet-lab experiments. We compared the location in the genome of the genes that were deregulated in HD human brain, obtained from public gene expression data, to the location of particular chromatin marks in reference tissues using public data from the ENCODE project.</p>
                <p>
                    <bold>Results</bold>: We found that differentially expressed genes were enriched in the active chromatin state, but not enriched in the silent state. In the caudate nucleus, the most highly affected brain region in HD, genes in the active state were associated with transcription, cell cycle, protein transport and modification, RNA splicing, histone post-translational modifications and RNA processing, whereas genes in the repressed state were linked with developmental processes and responses related to zinc and cadmium stimulus. We confirmed that genes within CpG-islands are enriched among HD dysregulated genes in both human and mouse. Epigenetic processes were associated more with genes that overlap with CpG-islands than with genes that do not.</p>
                <p>
                    <bold>Conclusion</bold>: Our results suggest that massive transcriptional dysregulation in HD is not matched by large-scale relocation of gene activity, i.e. that inactive chromatin regions are altered into actively expressed chromatin regions and vice versa. We expect that changes in epigenetic chromatin state might occur at the level of single genes (e.g. promoters, gene body) and scattered genomic sites (e.g. CTCF sites, enhancer regions) instead of large-scale genomic regions.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Huntington's disease</kwd>
                <kwd>gene expression</kwd>
                <kwd>CpG-islands</kwd>
                <kwd>caudate nucleus</kwd>
                <kwd>chromatin regions</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>Netherlands Bioinformatics Centre</funding-source>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/100011102">
                    <funding-source>Seventh Framework Programme</funding-source>
                    <award-id>FP7/2007-2013</award-id>
                    <award-id>305444</award-id>
                    <award-id>270129</award-id>
                </award-group>
                <award-group id="fund-3">
                    <funding-source>Innovative Medicines Initiative Joint Undertaking</funding-source>
                    <award-id>115191</award-id>
                </award-group>
                <funding-statement>The research leading to these results is supported by grants received from the Netherlands Bioinformatics Centre (NBIC) under the BioAssist program, the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreements No. 305444 (RD-Connect; HEALTH.2012.2.1.1-1-C) and No. 270129 (Wf4Ever; ICT-2009.4.1), and the Innovative Medicines Initiative Joint Undertaking project Open PHACTS (grant agreement No. 115191).</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Huntington&#x2019;s Disease (HD) is a complex disease of the brain associated with massive changes in gene expression. The genetic cause was identified in 1993
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>, but no successful treatment has been found yet. HD is a dominantly inherited neurodegenerative disease that affects 1&#x2013;10 per 100,000 individuals, making it the most common heritable neurodegenerative disorder
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup>.</p>
            <p>Although HD is considered a monogenic disease
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>, extensive research since 1993 into the underlying pathology suggests that the disease mechanisms are more complex than originally considered. Transcriptional dysregulation is a widespread phenomenon in HD that can be observed well before the first clinical symptoms appear
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>. It suggests that mutant huntingtin causes a broad and complex cascade of downstream effects. There are several ways in which mutant huntingtin can interfere with the transcriptional machinery and alter gene expression
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>,
                    <xref ref-type="bibr" rid="ref-6">6</xref>
                </sup>. Given the extent of the transcriptional changes, the question arises whether epigenetic mechanisms that operate at a genome-wide scale are also involved in HD. There is increasing evidence for epigenetic mechanisms playing a role in HD in human and model systems. For instance, earlier computational analysis of HD gene expression data showed that expression is deregulated in large genomic regions, indicative of a coordinated genome-wide mechanism
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup>. These studies did not determine whether genome-wide alterations in gene expression are associated with changes in the composition of histone modifications.</p>
            <p>More recently, H3K4me3 was shown to be enriched in 136 loci in an HD case/control study, which included genes that may affect the neuronal epigenome at large
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup>. In HD mice, down-regulated genes were found to be associated with a selective decrease of H3K27ac and RNA Polymerase II
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup>. Inhibitors of HDAC1, broad-range regulators of chromatin structure, have indeed been shown to be effective in HD mice
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>. The interplay of enzymes involved in post-translational histone modifications, such as histone deacetylases (HDACs), may make chromatin less accessible in many places and therefore alter gene expression patterns. In <italic toggle="yes">Drosophila</italic>, the homolog of htt was found to suppress position effect variegation (PEV), possibly by influencing PEV modifier genes
                <sup>
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup>.</p>
            <p>A role for epigenetic mechanisms in HD is further corroborated by numerous neurodevelopmental and neurodegenerative disorders that have been associated with an altered chromatin structure
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup>. Neuroepigenetics has therefore become a prime topic of interest
                <sup>
                    <xref ref-type="bibr" rid="ref-15">15</xref>
                </sup>, with researchers seeking to identify epigenomic signatures and how they can contribute to brain health or brain disease
                <sup>
                    <xref ref-type="bibr" rid="ref-16">16</xref>
                </sup>.</p>
            <p>Considering the growing body of evidence that epigenetic regulation is involved in neurological pathology, we asked if the massive transcriptional dysregulation in HD coincides with large scale changes in chromatin state across the genome. Under that assumption, we hypothesize that significant numbers of differentially expressed genes in HD will be found in regions that are not normally associated with active chromatin states.</p>
            <p>Here we report on a computational test of our hypothesis, performed before laboratory experiments and using only publicly available data sets: HD gene expression from the Gene Expression Omnibus (GEO)
                <sup>
                    <xref ref-type="bibr" rid="ref-17">17</xref>
                </sup> and chromatin state from ENCODE
                <sup>
                    <xref ref-type="bibr" rid="ref-18">18</xref>
                </sup>. We assessed the overlap between differentially expressed genes and chromatin state, and we applied literature-based concept profile analysis (CPA) to interpret our findings
                <sup>
                    <xref ref-type="bibr" rid="ref-19">19</xref>,
                    <xref ref-type="bibr" rid="ref-20">20</xref>
                </sup>.</p>
            <p>Our results indicate that the massive transcriptional dysregulation in HD is not matched by a significant large-scale change in the activity of genes that are part of inactive chromatin states in reference tissue. Our report includes a functional characterization of differentially expressed genes in HD in relation to huntingtin, chromatin state and CpG islands, based on a literature-based semantic analysis.</p>
            <p>The analysis we performed to test our hypothesis is part of an interdisciplinary research approach where computational analysis is used to help steer laboratory experiments to increase the overall efficiency of a research laboratory
                <sup>
                    <xref ref-type="bibr" rid="ref-21">21</xref>
                </sup>.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>Concept profile analysis</title>
                <p>In Concept Profile Analysis (CPA), the vector space model is used to associate two concepts mined from the literature with each other. Advantages of this model include efficient and transparent comparisons, and the possibility of attaching a weight to the association
                    <sup>
                        <xref ref-type="bibr" rid="ref-20">20</xref>
                    </sup>. The CPA algorithms have previously been used for a range of different gene expression data analysis purposes such as functional annotation
                    <sup>
                        <xref ref-type="bibr" rid="ref-22">22</xref>
                    </sup>, comparison of studies
                    <sup>
                        <xref ref-type="bibr" rid="ref-23">23</xref>
                    </sup>, prediction of novel interactions
                    <sup>
                        <xref ref-type="bibr" rid="ref-24">24</xref>
                    </sup>, generation of gene sets
                    <sup>
                        <xref ref-type="bibr" rid="ref-19">19</xref>,
                        <xref ref-type="bibr" rid="ref-25">25</xref>
                    </sup>, and association with chemical structures
                    <sup>
                        <xref ref-type="bibr" rid="ref-26">26</xref>
                    </sup>. The methodology has been described previously
                    <sup>
                        <xref ref-type="bibr" rid="ref-27">27</xref>
                    </sup>. In short, every concept in our database is associated with PubMed records using the indexing engine Peregrine (
                    <ext-link ext-link-type="uri" xlink:href="https://trac.nbic.nl/data-mining/">https://trac.nbic.nl/data-mining/</ext-link>) which is equipped with an in-house thesaurus of biomedical and chemical concepts that have been prepared for text mining
                    <sup>
                        <xref ref-type="bibr" rid="ref-28">28</xref>,
                        <xref ref-type="bibr" rid="ref-29">29</xref>
                    </sup>. For all concepts except genes and Gene Ontology (GO) terms, the PubMed records comprise the texts in which the concept is mentioned. For genes, only a subset of PubMed records is used in order to limit the impact of ambiguous terms and distant homologs. GO terms are sometimes given as words or phrases that are infrequently found in normal texts. To still provide broad coverage of GO terms, the PubMed records that were used as evidence for annotating genes with the GO term are added. For every concept in the thesaurus that is associated with at least five PubMed records, a vector is created that contains all concepts related to the main concept (direct co-occurrence), weighted by the symmetric uncertainty coefficient. We call this a "Concept Profile". Concept profiles are matched to identify similarities via their shared concepts (indirect relations). Any similarity or distance measure can be used for this matching, such as mutual information, inner product, cosine angle, Euclidean distance or Pearson&#x2019;s correlation. The CPA Web Services that we used for our analysis use an inner product measure
                    <sup>
                        <xref ref-type="bibr" rid="ref-30">30</xref>
                    </sup>. These web services can be found in the BioCatalogue web service registry 
                    <ext-link ext-link-type="uri" xlink:href="https://www.biocatalogue.org/services/3559">https://www.biocatalogue.org/services/3559</ext-link>.</p>
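                <p>The matching step can be illustrated with a minimal Python sketch, in which two concept profiles are held as sparse vectors and scored with an inner product over their shared concepts. The concept names and weights are invented for illustration, the weights merely stand in for symmetric uncertainty coefficients, and the function name inner_product_match is hypothetical. It is not the implementation behind the CPA Web Services.</p>
                <code language="python"><![CDATA[
```python
from typing import Dict, List, Tuple

ConceptProfile = Dict[str, float]


def inner_product_match(
    a: ConceptProfile, b: ConceptProfile
) -> Tuple[float, List[Tuple[str, float]]]:
    """Score two concept profiles by the inner product over shared concepts.

    Returns the total score and the per-concept contributions, sorted
    from largest to smallest, so that the association can be explained.
    """
    shared = set(a) & set(b)
    contributions = sorted(
        ((concept, a[concept] * b[concept]) for concept in shared),
        key=lambda item: item[1],
        reverse=True,
    )
    score = sum(weight for _, weight in contributions)
    return score, contributions


# Toy profiles; the weights are made-up stand-ins for
# symmetric uncertainty coefficients.
gene_profile = {"chromatin": 0.40, "transcription": 0.25, "histone": 0.10}
process_profile = {"chromatin": 0.30, "histone": 0.50, "methylation": 0.20}

score, evidence = inner_product_match(gene_profile, process_profile)
print(round(score, 4))
print(evidence[0][0])
```
]]></code>
                <p>Concepts that occur in only one of the two profiles contribute nothing to the score, and the sorted contributions show which shared concepts carry the association.</p>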
            </sec>
            <sec>
                <title>Data analysis and interpretation</title>
                <p>For data analysis and interpretation we implemented a series of workflows using the Taverna workbench (Taverna workbench version 2.4.0)
                    <sup>
                        <xref ref-type="bibr" rid="ref-31">31</xref>,
                        <xref ref-type="bibr" rid="ref-32">32</xref>
                    </sup>. Taverna is open-source software for the development and execution of workflows. Our workflows are deposited online in the Zenodo repository (
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.164201">https://doi.org/10.5281/zenodo.164201</ext-link>
                    <sup>
                        <xref ref-type="bibr" rid="ref-33">33</xref>
                    </sup> and 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.164198">https://doi.org/10.5281/zenodo.164198</ext-link>
                    <sup>
                        <xref ref-type="bibr" rid="ref-34">34</xref>
                    </sup>) and the myExperiment platform (
                    <ext-link ext-link-type="uri" xlink:href="http://www.myexperiment.org/packs/553">http://www.myexperiment.org/packs/553</ext-link>).</p>
            </sec>
            <sec>
                <title>Workflows for data analysis</title>
                <p>The first Taverna workflow, load_data_identify_DE_genes_Array_A, was implemented to examine differential gene expression between control and HD samples. The required workflow inputs were two data files, containing the gene expression values and the phenotype information that describes the samples from the microarray experiment. Differential expression was computed using moderated t statistics with the package limma
                    <sup>
                        <xref ref-type="bibr" rid="ref-35">35</xref>
                    </sup> (version 3.14.1), which is provided by the Bioconductor project
                    <sup>
                        <xref ref-type="bibr" rid="ref-36">36</xref>
                    </sup>, R version 
                    <ext-link ext-link-type="uri" xlink:href="https://www.bioconductor.org/">https://www.bioconductor.org/</ext-link>. We analyzed each brain region separately, because previous analysis
                    <sup>
                        <xref ref-type="bibr" rid="ref-37">37</xref>
                    </sup> revealed regional patterns in gene expression. The workflow maps the expression data from probes to Entrez Gene ids using the Affymetrix Human Genome U133 set annotation data (packages hgu133a and hgu133b, version 2.8.0; 
                    <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/hgu133a.db">https://bioconductor.org/packages/hgu133a.db</ext-link>; 
                    <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/hgu133b.db/">https://bioconductor.org/packages/hgu133b.db/</ext-link>). When multiple probes mapped to the same gene id, the one exhibiting the most significant change was used for further analysis. The final outcome of this workflow is a report in which each row is composed of a gene id, a fold change and its corresponding 
                    <italic toggle="yes">P</italic>-value, indicating the significance of the change in gene expression between HD and controls for each brain region. Adjusted 
                    <italic toggle="yes">P</italic>-values, generated by Benjamini and Hochberg&#x2019;s method for multiple testing correction, are also included
                    <sup>
                        <xref ref-type="bibr" rid="ref-38">38</xref>
                    </sup>. This workflow can be adjusted to compute differential gene expression between other variables, such as male/female or the grade of disease pathology, by editing the nested workflow &#x201c;compute_DE_limma&#x201d; within the R workflow component. An additional workflow, create_exprs_obj_download_files, which was used to download data from the ArrayExpress repository, is included in the myExperiment pack. The workflow saves the gene expression data and the corresponding phenotype file in the directory indicated in the workflow input.</p>
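                <p>Two steps of this workflow, keeping the most significant probe per gene and adjusting the P-values with the Benjamini and Hochberg procedure, can be sketched in plain Python as follows. This is an illustration only: the workflow itself relies on limma in R, the identifiers and numbers are invented, the function names are hypothetical, and applying the adjustment after collapsing probes to genes is an assumption of the sketch rather than a statement about the original pipeline.</p>
                <code language="python"><![CDATA[
```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def most_significant_probe_per_gene(
    rows: List[Tuple[str, str, float, float]]
) -> Dict[str, Tuple[str, float, float]]:
    """Keep, for each gene id, the probe with the smallest P-value.

    Each row is (probe_id, gene_id, log_fold_change, p_value).
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for probe, gene, lfc, p in rows:
        if gene not in best or p < best[gene][2]:
            best[gene] = (probe, lfc, p)
    return best


# Hypothetical probe-level results; identifiers and numbers are invented.
rows = [
    ("probe_1", "gene_A", -1.2, 0.001),
    ("probe_2", "gene_A", -0.3, 0.200),
    ("probe_3", "gene_B", 0.8, 0.010),
    ("probe_4", "gene_C", 0.1, 0.500),
]
per_gene = most_significant_probe_per_gene(rows)
genes = sorted(per_gene)
adjusted = benjamini_hochberg([per_gene[g][2] for g in genes])
for gene, adj_p in zip(genes, adjusted):
    print(gene, per_gene[gene][0], round(adj_p, 4))
```
]]></code>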
                <p>We note that this particular microarray experiment was composed of two microarrays, Human Genome U133A and U133B. For convenience we added the workflow: Get differentially expressed genes for Array B one brain region, but in principle the first workflow could be reused by adjusting the &#x201c;libraries&#x201d; component.</p>
                <p>The second workflow, map_genes_on_chromosome, uses the output from the first workflow to map genes to their corresponding genomic location. The workflow uses the BioMart
                    <sup>
                        <xref ref-type="bibr" rid="ref-39">39</xref>
                    </sup> service within R to obtain the position of each gene on the chromosome, HGNC gene symbols
                    <sup>
                        <xref ref-type="bibr" rid="ref-40">40</xref>
                    </sup>, transcription start and end site and the transcription strand. We used the Ensembl Genes 68 database from the Sanger Institute and the 
                    <italic toggle="yes">Homo sapiens</italic> dataset GRCh37.p8. The mouse assembly used to map genes to their chromosomal location was GRCm38 (mm10, Dec 2011).</p>
                <p>The last workflow, get_promoter_region_calculate_overlaps, first computes a promoter region for each gene and then operates on genomic intervals to identify the gene promoters that overlap with a genomic region. The promoter region is computed for each gene according to prespecified values indicating the number of base pairs (bp) upstream and downstream of the transcriptional start site (TSS): for the CpG island analysis, 5000 bp upstream and 2000 bp downstream were used, and for the chromatin state analysis, 50 bp upstream and 50 bp downstream. The promoter size in each case was chosen after discussion with domain experts and based on knowledge acquired from previous experiments using the ENCODE data from Ernst 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-41">41</xref>
                    </sup>. Using those data we performed multiple runs with different input values for the &#x201c;upstream&#x201d; and &#x201c;downstream&#x201d; variables, and the overlap parameter (
                    <xref ref-type="other" rid="SF1">Supplementary File 1</xref>).</p>
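                <p>The promoter computation and the interval overlap can be illustrated with the following minimal Python sketch. It assumes closed, 1-based coordinates on a single chromosome, and it interprets the overlap parameter as a minimum number of shared base pairs; both are assumptions of the sketch. The function names and coordinates are hypothetical and do not reproduce the R code of the workflow.</p>
                <code language="python"><![CDATA[
```python
from typing import Tuple

Interval = Tuple[int, int]  # closed interval, 1-based coordinates


def promoter_region(tss: int, strand: str, upstream: int, downstream: int) -> Interval:
    """Return the promoter interval around a TSS, respecting the strand."""
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the reverse strand, "upstream" lies at higher coordinates.
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the first base of the chromosome.
    return max(1, start), end


def overlaps(a: Interval, b: Interval, min_overlap: int = 1) -> bool:
    """True if two closed intervals share at least min_overlap base pairs."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    return shared >= min_overlap


# Hypothetical CpG island coordinates and TSS on one chromosome.
cpg_islands = [(9_500, 10_200), (50_000, 50_800)]
promoter = promoter_region(tss=12_000, strand="+", upstream=5000, downstream=2000)
print(promoter)
print(any(overlaps(promoter, island) for island in cpg_islands))
```
]]></code>
                <p>For genes with several transcription start sites, the same function would be called once per TSS, and a gene would count as overlapping if any of its promoters overlaps the region of interest.</p>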
                <p>When genes had multiple transcription start sites, we computed a promoter region for every TSS. The next part of this workflow computes the overlapping regions between the input datasets. It includes a two-sample Kolmogorov&#x2013;Smirnov (KS) test to compare the empirical cumulative distribution functions (ecdf) of the 
                    <italic toggle="yes">P</italic>-values between the gene promoters that overlap with a specific genomic region and the ones that do not. The null hypothesis tested here was that there is no difference between the two groups. The workflow returns two lists of genes: one for the genes that overlap with a particular genomic region and another for those that do not. Furthermore, the results of the statistical test are reported: the KS test statistic (maximum distance D between the ecdfs of the two samples) and the 
                    <italic toggle="yes">P</italic>-value of the test. If the 
                    <italic toggle="yes">P</italic>-value is &lt; 0.05, we reject the null hypothesis.</p>
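                <p>The statistic D of the two-sample test can be sketched in plain Python as shown below. The P-value in the sketch uses the asymptotic Kolmogorov distribution with a common small-sample correction, which is an approximation and may differ from the value returned by the R function used in the workflow. The input values are invented and the function name is hypothetical.</p>
                <code language="python"><![CDATA[
```python
import math
from typing import List, Tuple


def ks_two_sample(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov test.

    Returns D, the maximum distance between the two empirical cumulative
    distribution functions, and an asymptotic two-sided P-value.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        value = min(xs[i], ys[j])
        # Advance past all observations equal to the current value (ties).
        while i < n and xs[i] == value:
            i += 1
        while j < m and ys[j] == value:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic Kolmogorov distribution; approximate for small samples.
    root_n = math.sqrt(n * m / (n + m))
    lam = (root_n + 0.12 + 0.11 / root_n) * d
    if lam < 0.2:
        # The alternating series converges slowly here; the tail
        # probability is numerically indistinguishable from 1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2) for k in range(1, 101)
    )
    return d, min(1.0, max(0.0, p))


# Invented P-values for promoters that do and do not overlap a region.
overlapping = [0.001, 0.004, 0.010, 0.020, 0.030]
non_overlapping = [0.200, 0.350, 0.500, 0.700, 0.900]
d, p = ks_two_sample(overlapping, non_overlapping)
print(d, p < 0.05)
```
]]></code>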
            </sec>
            <sec>
                <title>Workflows for data interpretation</title>
                <p>The workflows that we implemented for gene interpretation and gene prioritization are based on the workflow pack at Zenodo, 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.164198">https://doi.org/10.5281/zenodo.164198</ext-link> (see also, at myExperiment: 
                    <ext-link ext-link-type="uri" xlink:href="http://www.myexperiment.org/packs/368">http://www.myexperiment.org/packs/368</ext-link>), and are implemented using CPA web services
                    <sup>
                        <xref ref-type="bibr" rid="ref-30">30</xref>
                    </sup>.</p>
                <p>The CPA workflow Annotate gene list with top ranking concepts annotates a gene list by matching the concept profiles of the genes with a concept set, in our case the concept set of Biological Processes. The web services that are part of this workflow query the Anni database
                    <sup>
                        <xref ref-type="bibr" rid="ref-27">27</xref>
                    </sup> that stores the concept profiles for each concept of interest. The first web service mapDatabaseIDListToConceptIDs maps a list of concepts, in our case Entrez gene identifiers, to their corresponding concept profile ids. Necessary inputs are a concept list (gene list) in a comma separated file, and the database identifier of the gene list necessary for the mapping (EG for Entrez Gene, see here: 
                    <ext-link ext-link-type="uri" xlink:href="https://www.biocatalogue.org/soap_operations/41197">https://www.biocatalogue.org/soap_operations/41197</ext-link> for more details on database identifiers). The next web service, getSimilarConceptProfilesPredefined, matches our gene list with the predefined concept set &#x201c;Biological Process&#x201d; (ID = &#x201c;5&#x201d;) and gives the top scoring biological processes that describe our gene list. For a complete list of predefined concept sets, the workflow List Concept Sets (provided in the current pack) can be run to choose the ID of the predefined concept set of interest. The web service getConceptName gives the complete (human readable) names of the top matching biological processes. Lastly, the workflow Explain score between two concepts can be included in the analysis to provide evidence for the association between each gene and the annotations of the biological processes. The evidence reported is a list of concepts that link one concept with another, together with their contribution to the overall strength of the association. In addition, the corresponding concept ids are reported.</p>
                <p>The workflow Prioritize gene list can prioritize a set of genes with respect to their association with particular concepts, in our case the HTT concept and epigenetics (concept profile: &#x201c;epigene&#x201d;). To obtain the concept profile identifiers, the workflow getConceptSuggestionsFromTerm needs to be run first.</p>
            </sec>
            <sec>
                <title>Data obtained from public sources</title>
                <p>
                    <bold>
                        <italic toggle="yes">Human brain data.</italic>
                    </bold> The human HD brain data used in this analysis were originally produced and analyzed by Hodges and co-workers
                    <sup>
                        <xref ref-type="bibr" rid="ref-37">37</xref>
                    </sup>. This experiment contains 44 HD-positive cases and 36 age- and sex-matched controls. The processed data are available from the public repository NCBI Gene Expression Omnibus, entry GSE3790. Three brain regions were analyzed, the caudate nucleus, frontal cortex and cerebellum, using Affymetrix GeneChip microarrays (Human Genome U133A and U133B). The HD-positive cases were further classified based on whether symptoms were present or absent and according to the Vonsattel grade of disease pathology (scale = 0 &#x2013; 4). In our analysis we used the processed data and performed our own differential gene expression analysis (
                    <xref ref-type="other" rid="DS0">Dataset 1</xref>
                    <sup>
                        <xref ref-type="bibr" rid="ref-42">42</xref>
                    </sup>) with the differential gene expression analysis workflow described earlier in the Methods.</p>
                <supplementary-material id="DS0" orientation="portrait" position="float" xlink:href="https://f1000researchdata.s3.amazonaws.com/datasets/9703/e2f95098-3820-4da3-910b-33405e704ba7_Dataset1_gene_expression_data.zip">
                    <label>Gene expression data</label>
                    <caption>
                        <p>This folder contains the gene expression data for the three human brain regions and the three mouse brain regions.</p>
                    </caption>
                </supplementary-material>
                <p>
                    <bold>
                        <italic toggle="yes">CpG island data.</italic>
                    </bold> CpG island information for the human genome was obtained from the UCSC Genome Browser
                    <sup>
                        <xref ref-type="bibr" rid="ref-43">43</xref>
                    </sup>, hg19 assembly FEB 2009 1 (
                    <xref ref-type="other" rid="DS1">Dataset 2</xref>
                    <sup>
                        <xref ref-type="bibr" rid="ref-44">44</xref>
                    </sup>). Here, CpG islands are marked as the DNA regions where the following conditions hold:</p>
                <list list-type="bullet">
                    <list-item>
                        <p>GC content of 50% or greater</p>
                    </list-item>
                    <list-item>
                        <p>length greater than 200 bp</p>
                    </list-item>
                    <list-item>
                        <p>ratio greater than 0.6 of the observed number of CG dinucleotides to the expected number, based on the number of Gs and Cs in the segment</p>
                    </list-item>
                </list>
                <p>The ratio observed/expected (Obs/Exp) CpG was calculated as follows:</p>
                <p>
                    <disp-formula>
                        <mml:math display="inline" id="math1">
                            <mml:mrow>
                                <mml:mrow>
                                    <mml:mrow>
                                        <mml:mtext>Obs</mml:mtext>
                                    </mml:mrow>
                                    <mml:mo>/</mml:mo>
                                    <mml:mrow>
                                        <mml:mtext>Exp CpG</mml:mtext>
                                    </mml:mrow>
                                </mml:mrow>
                                <mml:mo>=</mml:mo>
                                <mml:mfrac>
                                    <mml:mrow>
                                        <mml:mtext>number of CpG</mml:mtext>
                                    </mml:mrow>
                                    <mml:mrow>
                                        <mml:mtext>number of C</mml:mtext>
                                        <mml:mo>&#x00d7;</mml:mo>
                                        <mml:mtext>number of G</mml:mtext>
                                    </mml:mrow>
                                </mml:mfrac>
                                <mml:mo>&#x00d7;</mml:mo>
                                <mml:mi>N</mml:mi>
                                <mml:mo>,</mml:mo>
                            </mml:mrow>
                        </mml:math>
                    </disp-formula>
                </p>
                <p>where N is the total number of nucleotides in the sequence being analyzed. For CpG island information of the mouse genome, the Dec. 2011 assembly (GRCm38/mm10) was used.</p>
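                <p>As an illustration, the three conditions and the Obs/Exp CpG ratio given above can be sketched in Python for a single candidate segment. This is a minimal sketch and not the UCSC implementation, which scans the genome for candidate regions; the function names, the example sequence, and the choice to return 0 when a segment contains no C or no G are assumptions made for illustration.</p>
                <preformat preformat-type="code"><![CDATA[
```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Obs/Exp CpG = number of CpG / (number of C x number of G) x N."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        # Assumption: the ratio is undefined without both C and G; use 0 here.
        return 0.0
    return seq.count("CG") / (n_c * n_g) * len(seq)


def meets_cpg_island_criteria(seq):
    """Check the three listed conditions for one candidate segment."""
    return (
        len(seq) > 200                # length greater than 200 bp
        and gc_content(seq) >= 0.5    # GC content of 50% or greater
        and cpg_obs_exp(seq) > 0.6    # Obs/Exp CpG ratio greater than 0.6
    )


# Hypothetical 202 bp segment used only to exercise the functions.
candidate = "CG" * 101
print(meets_cpg_island_criteria(candidate))
```
]]></preformat>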
                <supplementary-material id="DS1" orientation="portrait" position="float" xlink:href="https://f1000researchdata.s3.amazonaws.com/datasets/9703/27687277-d496-46ad-abf2-36dabb6e0c0d_Dataset2_CpG_islands.zip">
                    <label>CpG islands</label>
                    <caption>
                        <p>This folder contains the CpG island information for both the human and the mouse data.</p>
                    </caption>
                </supplementary-material>
                <p>
                    <bold>
                        <italic toggle="yes">Chromatin states data.</italic>
                    </bold> The chromatin marks were obtained from the ENCODE project
                    <sup>
                        <xref ref-type="bibr" rid="ref-45">45</xref>
                    </sup>. The chromatin states were part of an integrative analysis of 111 reference human epigenomes profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We used the two cell types that were most suitable for our analysis: the anterior caudate and dorsolateral prefrontal cortex (
                    <xref ref-type="other" rid="DS2">Dataset 3</xref>
                    <sup>
                        <xref ref-type="bibr" rid="ref-46">46</xref>
                    </sup> and 
                    <xref ref-type="other" rid="DS3">Dataset 4</xref>
                    <sup>
                        <xref ref-type="bibr" rid="ref-47">47</xref>
                    </sup>, respectively). The chromatin states we used for the current analysis were: active transcription start site proximal promoter (TssA), bivalent regulatory region (TssBiv), heterochromatin (Het) and repressed Polycomb (ReprPC).</p>
                <supplementary-material id="DS2" orientation="portrait" position="float" xlink:href="https://f1000researchdata.s3.amazonaws.com/datasets/9703/1abab5f9-aa1c-4859-9197-31c0ef3c361d_Dataset3_anterior_caudate_chromatin_states.zip">
                    <label>Chromatin states for the anterior caudate</label>
                    <caption>
                        <p>This folder contains the data for the four chromatin states in the anterior caudate. Active TSS proximal promoter: TssA, bivalent regulatory region: TssBiv, heterochromatin: Het, repressed Polycomb: ReprPC.</p>
                    </caption>
                </supplementary-material>
                <supplementary-material id="DS3" orientation="portrait" position="float" xlink:href="https://f1000researchdata.s3.amazonaws.com/datasets/9703/0033695b-c47f-475d-935d-6527c2dd63d0_Dataset4_dorsolateral_prefrontal_cortex_chromatin_states.zip">
                    <label>Chromatin states for the prefrontal cortex</label>
                    <caption>
                        <p>This folder contains the data for the four chromatin states in the prefrontal cortex. Active TSS proximal promoter: TssA, bivalent regulatory region: TssBiv, heterochromatin: Het, repressed Polycomb: ReprPC.</p>
                    </caption>
                </supplementary-material>
                <p>
                    <bold>
                        <italic toggle="yes">Mouse data.</italic>
                    </bold> The mouse brain data were taken from a published study
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>. There, total RNA was extracted from the cortex, striatum and cerebellum of WT and R6/2 transgenic mice. That study examined the effect of the HDAC inhibitor 4b on the disease phenotype. However, only the data from vehicle-treated animals were used (
                    <xref ref-type="other" rid="DS0">Dataset 1</xref>
                    <sup>
                        <xref ref-type="bibr" rid="ref-42">42</xref>
                    </sup>). That study used Illumina MouseRef-8 Expression BeadChips v1. The raw data were analyzed using Bioconductor packages, and contrast analysis of differential expression was performed with the LIMMA package. The differential expression values are available in the supplemental material of that publication.</p>
            </sec>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <sec>
                <title>Chromatin state analysis and semantic interpretation</title>
                <p>To test whether and how differential gene expression in HD is associated with particular chromatin states, we used publicly available datasets from the GEO and ENCODE public repositories. We selected HD gene expression data from three regions of the brain made available by Hodges and coworkers
                    <sup>
                        <xref ref-type="bibr" rid="ref-37">37</xref>
                    </sup>, and data from the ENCODE consortium carrying information about four chromatin states. These were: active TSS proximal promoter (TssA), bivalent regulatory region (TssBiv), heterochromatin (Het) and repressed Polycomb (ReprPC)
                    <sup>
                        <xref ref-type="bibr" rid="ref-45">45</xref>
                    </sup>. Briefly, the first state pertains to active genes, the second to repressed genes that are ready to be activated, and the latter two represent repressed genes. The chromatin states were part of an integrative analysis of 111 reference human epigenomes profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. The different chromatin states were defined by a computational model that is based on a multivariate Hidden Markov Model
                    <sup>
                        <xref ref-type="bibr" rid="ref-48">48</xref>
                    </sup>.</p>
                <p>First, we confirmed the results reported by the previous study
                    <sup>
                        <xref ref-type="bibr" rid="ref-37">37</xref>
                    </sup>: large and numerous changes in transcriptional activity in the caudate nucleus and notably smaller changes in the frontal cortex and cerebellum. More specifically, the numbers of differentially expressed genes were 5219, 127 and 96 for the caudate nucleus, frontal cortex and cerebellum, respectively, at an FDR of 0.05. These results confirm previous observations that specific HD-affected brain regions exhibit defined changes in gene expression, in line with observable physiological effects
                    <sup>
                        <xref ref-type="bibr" rid="ref-37">37</xref>
                    </sup>.</p>
                <p>Second, we paired the data describing gene expression changes in the caudate nucleus and frontal cortex from the Hodges study with chromatin state data from the anterior caudate and dorsolateral prefrontal cortex from ENCODE. We considered these brain regions the most comparable between the two studies. The cerebellum was excluded from this part of our analysis because ENCODE provides no chromatin state data for it. We determined the reference chromatin state of the genes from the gene expression study by determining the overlap of their promoter regions with the start and end positions of the chromatin state segments from the aforementioned brain regions used by ENCODE. The effect of a chromatin state on differential gene expression in HD was assessed by comparing the distribution of expression levels of genes overlapping with a particular chromatin state with the distribution for non-overlapping genes. If chromatin state has a substantial functional impact on the genes that are differentially expressed in HD, then these two distributions are expected to be significantly different. More details on the promoter region calculation and the overlap between the genomic regions can be found in the Methods section.</p>
                <p>Specifically, we compared the distribution of the 
                    <italic toggle="yes">P</italic>-values for differential expression between these two groups of genes and assessed the difference by a Kolmogorov&#x2013;Smirnov (KS) test, in the two brain regions and for each chromatin state (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>). In the caudate region, we found an enrichment of differentially expressed genes among genes overlapping with the active TSS state and a depletion among genes overlapping with ReprPC (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>). Neither heterochromatin nor bivalent chromatin was significantly associated with the genomic location of differentially expressed genes. A similar but less pronounced pattern was observed in the frontal cortex for both the active TSS and ReprPC states. In addition, in the frontal cortex the number of genes overlapping with the bivalent state was reduced. The magnitude of the association of a brain region with each chromatin state is in line with the HD neurodegeneration pattern, where the caudate nucleus exhibits the largest gene expression changes while the frontal cortex exhibits an intermediate to low pathology. In summary, we found enrichment of differentially expressed genes in the active chromatin state of reference tissue, but no strong evidence for significant enrichment in chromatin regions that are associated with repressed gene expression activity.</p>
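                <p>The comparison summarised in Figure 1 can be illustrated with a minimal pure-Python sketch of a signed two-sample KS distance. The function name, the example P-values and the sign convention (positive when the overlapping genes have more small P-values than the other genes) are assumptions made for illustration. The significance test itself is not reproduced here; a library routine such as scipy.stats.ks_2samp could supply a P-value for the unsigned statistic.</p>
                <preformat preformat-type="code"><![CDATA[
```python
from bisect import bisect_right


def signed_ks_distance(p_overlap, p_other):
    """Signed maximum distance between two empirical CDFs of P-values.

    Returns ECDF(p_overlap) - ECDF(p_other) at the point where the absolute
    difference is largest. Under the assumed convention, a positive value
    means the overlapping genes have more small P-values (enrichment) and a
    negative value means fewer (depletion).
    """
    a = sorted(p_overlap)
    b = sorted(p_other)
    if not a or not b:
        raise ValueError("both groups must contain at least one P-value")
    best = 0.0
    # The ECDFs only change at observed values, so checking those is enough.
    for x in sorted(set(a) | set(b)):
        diff = bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b)
        if abs(diff) > abs(best):
            best = diff
    return best


# Hypothetical P-values for genes overlapping a state and for all other genes.
overlapping = [0.001, 0.01, 0.02, 0.5]
others = [0.2, 0.4, 0.6, 0.9]
print(signed_ks_distance(overlapping, others))
```
]]></preformat>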
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Chromatin states.</title>
                        <p>For each brain region, the 
                            <italic toggle="yes">P</italic>-value distribution for differential expression in HD patients compared to controls was compared between genes overlapping with a specific chromatin state and all other genes, in the anterior caudate and dorsolateral prefrontal cortex cells. The plot displays the maximum absolute distance D between the cumulative distribution function of the 
                            <italic toggle="yes">P</italic>-values for each group of genes (overlapping and non-overlapping), as reflected by the KS test statistic. Positive values correspond to an enrichment of differentially expressed genes in a chromatin state, negative values to a depletion. Stars indicate a significant enrichment/depletion according to the KS test (
                            <italic toggle="yes">P</italic>-value &lt; 0.05). TssA = active TSS, TssBiv = bivalent TSS, Heterochr = heterochromatin, ReprPC = repressed Polycomb state. For the caudate: TssA D = 0.31, TssBiv D = 0.04, Heterochr D = 0.17, ReprPC D = 0.27; for the frontal cortex: TssA D = 0.168, TssBiv D = 0.097, Heterochr D = 0.095, ReprPC D = 0.147. The corresponding p-values were, for the caudate: TssA pval &lt; 2.2e-16, TssBiv pval = 0.59, Heterochr pval = 0.10, ReprPC pval = 4.5167e-12; for the frontal cortex: TssA pval &lt; 2.2e-16, TssBiv pval = 0.0002, Heterochr pval = 0.872, ReprPC pval = 1.4596e-05.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/10459/812c50b0-babf-4c92-a322-d5a0854366cc_figure1.gif"/>
                </fig>
                <p>To further interpret the results from the chromatin state analysis, we used literature information (CPA
                    <sup>
                        <xref ref-type="bibr" rid="ref-19">19</xref>,
                        <xref ref-type="bibr" rid="ref-27">27</xref>
                    </sup>) to assess the biological processes within the lists of genes per chromatin state that were associated with gene deregulation (
                    <xref ref-type="other" rid="DS4">Dataset 5</xref>
                    <sup>
                        <xref ref-type="bibr" rid="ref-49">49</xref>
                    </sup>).</p>
                <p>Using CPA, we found that for the caudate nucleus, genes in the active TSS state are mainly associated with processes related to transcription, cell cycle, protein transport and modification, RNA splicing, chromatin modifications and RNA processing. Genes in the repressed polycomb chromatin state are mainly associated with (brain) developmental processes and responses related to zinc and cadmium stimulus.</p>
                <p>For the frontal cortex, genes in the active TSS state are associated with protein transport and modification, cell cycle, RNA splicing and signal transduction pathways (MAPK, notch, smoothened). Genes in the bivalent TSS state are associated with brain development, neurogenesis, synapse and responses related to zinc and cadmium stimulus. We found the genes in the Polycomb-repressed group to be associated with similar functions as the genes in the bivalent state: (brain) developmental processes and responses related to zinc, cadmium and copper stimuli.</p>
                <p>We note that at this point of our analysis, we filtered out genes whose promoters were labelled with more than one state, using the criterion of at least 50 bp overlap in a promoter region of 100 bp. Interpretation of this set of genes would be ambiguous.</p>
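                <p>The filtering criterion can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow used in the analysis: the function names, the half-open coordinate convention, the example coordinates and the summing of overlap per state (which presumes that segments of the same state do not overlap each other) are all hypothetical.</p>
                <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of shared bases between two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def promoter_states(promoter, segments, min_overlap=50):
    """Return the set of states covering at least min_overlap bp of a promoter.

    promoter: (chrom, start, end)
    segments: iterable of (chrom, start, end, state)
    """
    chrom, start, end = promoter
    covered = defaultdict(int)
    for seg_chrom, seg_start, seg_end, state in segments:
        if seg_chrom == chrom:
            covered[state] += overlap_bp(start, end, seg_start, seg_end)
    return {state for state, bp in covered.items() if bp >= min_overlap}


def unambiguous_state(promoter, segments, min_overlap=50):
    """Return the single state of a promoter, or None if it has zero or several."""
    states = promoter_states(promoter, segments, min_overlap)
    return next(iter(states)) if len(states) == 1 else None


# Hypothetical 100 bp promoter split evenly between two states: filtered out.
promoter = ("chr4", 1000, 1100)
segments = [("chr4", 900, 1050, "TssA"), ("chr4", 1050, 1200, "ReprPC")]
print(unambiguous_state(promoter, segments))
```
]]></preformat>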
                <supplementary-material id="DS4" orientation="portrait" position="float" xlink:href="https://f1000researchdata.s3.amazonaws.com/datasets/9703/8662bfeb-e12e-402f-8e7b-d34efa40d038_Dataset5_annotations_genes_Biological_Processes.txt">
                    <label>Unique annotations of genes in each chromatin state with Biological Processes</label>
                    <caption>
                        <p>This file contains the results from the semantic analysis with Biological Processes of the genes that overlapped with the active TSS, bivalent TSS, heterochromatin and repressed Polycomb chromatin states. The annotations presented here are those that uniquely characterize each gene list among the top 50 annotations returned by CPA.</p>
                    </caption>
                </supplementary-material>
            </sec>
            <sec>
                <title>Overlap of HD deregulated genes with CpG islands and semantic interpretation</title>
                <p>Because CpG island methylation is a known epigenetic regulatory mechanism that is ubiquitous in the human genome and may be a target for genome-wide regulation
                    <sup>
                        <xref ref-type="bibr" rid="ref-50">50</xref>,
                        <xref ref-type="bibr" rid="ref-51">51</xref>
                    </sup>, we also applied our approach to test whether genes within CpG islands are overrepresented among differentially expressed genes in HD. We measured this with the same KS test statistic as in the previous section. We found that genes overlapping with CpG islands in their promoter region were significantly enriched in the group of HD-deregulated genes in all three brain regions (
                    <xref ref-type="fig" rid="f2">Figure 2A</xref>; 
                    <italic toggle="yes">P</italic>-value &lt; 0.05), which supports taking this mechanism into account to formulate hypotheses when studying gene deregulation in HD.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>CpG island effect in human and mouse data.</title>
                        <p>Maximum distance between the cumulative distribution functions of the 
                            <italic toggle="yes">P</italic>-values of genes containing a CpG island in their promoter and of genes that do not. The maximum distance (D) for each brain region was plotted for both datasets (human (
                            <bold>A</bold>) and mouse (
                            <bold>B</bold>) datasets). The gradient adjacent to each plot indicates the extent of neurodegeneration in each brain region. Black represents severe and white mild neurodegeneration. 
                            <bold>A</bold>: The analysis performed in the human data. The caudate nucleus exhibits the largest difference, with a distance D between the two distributions of 0.2808; the frontal cortex follows with D = 0.1482 and the cerebellum with D = 0.147. 
                            <bold>B</bold>: The analysis performed in the mouse data. The results are analogous to the human data, with the striatum showing the largest difference with D = 0.118, the cortex following with D = 0.0739 and the cerebellum showing a non-significant difference (
                            <italic toggle="yes">P</italic>-value = 0.596 &gt;&gt; 0.05) of D = 0.0383. *: indicates significance.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/10459/812c50b0-babf-4c92-a322-d5a0854366cc_figure2.gif"/>
                </fig>
                <p>To further analyze and verify the association of gene deregulation in HD with the presence of CpG islands, we also analysed overrepresentation in data obtained from an HD transgenic mouse model
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>. Our results based on the mouse data corroborated those from the human data (
                    <xref ref-type="fig" rid="f2">Figure 2B</xref>). Comparable to the results for caudate nucleus in human data, the most highly affected mouse brain region, the striatum, showed the biggest overrepresentation of CpG islands among deregulated genes.</p>
                <p>We next used CPA to examine the biological processes associated with the differentially expressed genes that do or do not overlap with CpG islands. We inspected the top 50 annotations of each group. We identified many similarities between these two groups, but also annotations that were specific to each group (
                    <xref ref-type="other" rid="DS5">Dataset 6</xref>
                    <sup>
                        <xref ref-type="bibr" rid="ref-52">52</xref>
                    </sup>). For example, concepts related to chromatin alterations, such as chromatin remodelling and chromatin (dis)assembly, and histone post-translational modifications, such as (de)acetylation, and (de)methylation, were only found with a high rank in the list of genes containing CpG islands in their promoters.</p>
                <p>Conversely, lymphocyte activation, angiogenesis, antigen presentation and neurogenesis were only high ranking associations for the non-CpG containing genes.</p>
                <p>Some annotations were found in both groups but at different ranks. For example, gene silencing, RNA splicing and phosphorylation ranked higher for genes within CpG islands, while transcriptional activation and mitotic cell cycle ranked higher for non-CpG containing genes (
                    <xref ref-type="other" rid="DS5">Dataset 6</xref>
                    <sup>
                        <xref ref-type="bibr" rid="ref-52">52</xref>
                    </sup>).</p>
                <supplementary-material id="DS5" orientation="portrait" position="float" xlink:href="https://f1000researchdata.s3.amazonaws.com/datasets/9703/15a2c01a-a69a-4636-854d-a1d1e2c123ab_Dataset6_CpG_annotations.txt">
                    <label>Annotations of CpG containing genes and non CpG containing genes</label>
                    <caption>
                        <p>The top 50 annotations characterising the groups of genes with and without CpG islands, as obtained from the semantic analysis of those two groups of genes with Biological Processes.</p>
                    </caption>
                </supplementary-material>
            </sec>
            <sec>
                <title>Semantic analysis to identify proteins associated with HTT and epigenetics</title>
                <p>CPA can also be used to prioritize findings by a specific biological interest. In our case, we aimed to prioritise the list of genes overlapping with CpG islands that were also differentially expressed in HD caudate nucleus by their association with HTT and epigenetics. Here, we prioritised 100 proteins based on their association with HTT and epigenetics and absence of a direct relation with HTT by CPA (
                    <xref ref-type="other" rid="DS6">Dataset 7</xref>
                    <sup>
                        <xref ref-type="bibr" rid="ref-53">53</xref>
                    </sup>). Such relations are typically novel, or lost in tables or figures. If a novel relation is found (i.e. the relation is not found in our database of relations in MEDLINE abstracts), then CPA also provides intermediate concepts that link the two concepts.</p>
                <p>In 
                    <xref ref-type="table" rid="T1">Table 1</xref>, we present the top 5 novel proteins that have the strongest association, and the intermediate concepts that link each prioritized protein to HTT and epigenetics. The intermediate concepts are grouped under the semantic categories &#x201c;General&#x201d;, &#x201c;Biological Processes&#x201d;, &#x201c;Disease or Syndrome&#x201d;, &#x201c;Homo Sapiens Genes&#x201d; and &#x201c;Molecular Function&#x201d;.</p>
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>Evidence table for the association between the top 5 proteins and the concepts of 
                            <italic toggle="yes">HTT</italic> and epigene, grouped in five semantic categories: &#x201c;General&#x201d;, &#x201c;Biological Processes&#x201d;, &#x201c;Disease or Syndrome&#x201d;, &#x201c;Homo Sapiens Genes&#x201d; and &#x201c;Molecular Function&#x201d;.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="center" colspan="1" rowspan="1" valign="top">gene</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">concept</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">general</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">biological processes</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">disease or syndrome</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">h.sapiens gene</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">molecular functions</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>1.APBA1</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>HTT</bold>
                                </td>
                                <td colspan="1" rowspan="1">nerve tissue
                                    <break/>trafficking
                                    <break/>mice, transgenic
                                    <break/>PC12 Cells
                                    <break/>Caenorhabditis elegans</td>
                                <td colspan="1" rowspan="1">Endocytosis
                                    <break/>protein transport
                                    <break/>Pathogenesis
                                    <break/>intracellular protein transport
                                    <break/>RNA Interference</td>
                                <td colspan="1" rowspan="1">Alzheimer&#x2019;s Disease
                                    <break/>Friedreich Ataxia
                                    <break/>Neuropathogenesis
                                    <break/>Degenerative disorder
                                    <break/>Malnutrition</td>
                                <td colspan="1" rowspan="1">GRIN2B
                                    <break/>CDK5
                                    <break/>ITSN1
                                    <break/>STX1A
                                    <break/>DLGAP2</td>
                                <td colspan="1" rowspan="1">kinesin activity
                                    <break/>NMDA receptor
                                    <break/>Protein Binding
                                    <break/>membrane associated guanylate kinase
                                    <break/>cyclin-dependent protein kinase
                                    <break/>activity</td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1"/>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>epigene</bold>
                                </td>
                                <td colspan="1" rowspan="1">Epigene
                                    <break/>cpg islands
                                    <break/>Hypermethylation
                                    <break/>Epigenetic Process
                                    <break/>microsatellite instability</td>
                                <td colspan="1" rowspan="1">Methylation
                                    <break/>DNA Methylation
                                    <break/>Gene Silencing
                                    <break/>tumor suppressor activity
                                    <break/>cytosine
                                    <break/>methylation</td>
                                <td colspan="1" rowspan="1">hypomyelination and congenital
                                    <break/>cataract
                                    <break/>Werner Syndrome
                                    <break/>marinesco-sjogren syndrome
                                    <break/>friedreich ataxia 1
                                    <break/>Angelman Syndrome</td>
                                <td colspan="1" rowspan="1">CDKN2A
                                    <break/>APBA1
                                    <break/>MLH1
                                    <break/>DNMT3B
                                    <break/>APBA2</td>
                                <td colspan="1" rowspan="1">MGMT
                                    <break/>methyl-cpg binding
                                    <break/>calcium channel activity
                                    <break/>methylase activity
                                    <break/>PDGFRA</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>2.KAT2A</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>HTT</bold>
                                </td>
                                <td colspan="1" rowspan="1">histone
                                    <break/>acetylation
                                    <break/>ARID1A
                                    <break/>Transcription
                                    <break/>HDAC1</td>
                                <td colspan="1" rowspan="1">Transcription, Genetic
                                    <break/>Histone Acetylation
                                    <break/>histone modification
                                    <break/>Transcriptional Activation
                                    <break/>RNA Interference</td>
                                <td colspan="1" rowspan="1">Neurodegenerative Disorders
                                    <break/>Huntington Disease
                                    <break/>neu-laxova syndrome
                                    <break/>Recruitment
                                    <break/>Disease</td>
                                <td colspan="1" rowspan="1">EP300
                                    <break/>CREBBP
                                    <break/>TBP
                                    <break/>KAT2B
                                    <break/>PPARGC1A</td>
                                <td colspan="1" rowspan="1">Histone acetyltransferase activity
                                    <break/>ubiquitin activity
                                    <break/>acetyltransferase activity
                                    <break/>cyclin-dependent protein kinase activity
                                    <break/>Protein Binding</td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1"/>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>epigene</bold>
                                </td>
                                <td colspan="1" rowspan="1">histone
                                    <break/>epigene
                                    <break/>chromatin location
                                    <break/>Transcription
                                    <break/>histone modification</td>
                                <td colspan="1" rowspan="1">Histone Acetylation
                                    <break/>Transcription
                                    <break/>histone modification
                                    <break/>chromatin remodeling
                                    <break/>Methylation</td>
                                <td colspan="1" rowspan="1">Recruitment
                                    <break/>Adenovirus Infections
                                    <break/>Neurodegenerative Disorders
                                    <break/>Infection
                                    <break/>Disease</td>
                                <td colspan="1" rowspan="1">KAT2A
                                    <break/>KAT2B
                                    <break/>HDAC1
                                    <break/>EP300
                                    <break/>DNMT1</td>
                                <td colspan="1" rowspan="1">IGL
                                    <break/>DNA Binding
                                    <break/>histone deacetylase activity
                                    <break/>transcription factor binding
                                    <break/>histone binding</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>3.CARM1</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>HTT</bold>
                                </td>
                                <td colspan="1" rowspan="1">Knock-in
                                    <break/>histone
                                    <break/>Nerve Tissue
                                    <break/>Ca150
                                    <break/>PC12 Cells</td>
                                <td colspan="1" rowspan="1">Transcription
                                    <break/>histone methylation
                                    <break/>RNA Interference
                                    <break/>histone modification
                                    <break/>protein processing,
                                    <break/>post-translational</td>
                                <td colspan="1" rowspan="1">Recruitment
                                    <break/>muscular atrophy
                                    <break/>spinal muscular atrophy
                                    <break/>Malnutrition
                                    <break/>Disease</td>
                                <td colspan="1" rowspan="1">EP300
                                    <break/>CREBBP
                                    <break/>ARID1A
                                    <break/>THOC4
                                    <break/>DNM1L</td>
                                <td colspan="1" rowspan="1">nuclear hormone receptor activity
                                    <break/>acetyltransferase activity
                                    <break/>Protein Binding
                                    <break/>histone acetyltransferase activity
                                    <break/>DNA Binding</td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1"/>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>epigene</bold>
                                </td>
                                <td colspan="1" rowspan="1">epigene
                                    <break/>histone
                                    <break/>chromatin immunoprecipitation
                                    <break/>chromatin location
                                    <break/>Protein Acetylation</td>
                                <td colspan="1" rowspan="1">Methylation
                                    <break/>histone modification
                                    <break/>histone methylation
                                    <break/>DNA Methylation
                                    <break/>Transcription</td>
                                <td colspan="1" rowspan="1">Recruitment
                                    <break/>Chimera
                                    <break/>Cholestasis
                                    <break/>Hyperhomocysteinemia
                                    <break/>Cerebrovascular accident</td>
                                <td colspan="1" rowspan="1">CARM1
                                    <break/>EP300
                                    <break/>EHMT2
                                    <break/>PRMT1
                                    <break/>PRMT5</td>
                                <td colspan="1" rowspan="1">methyltransferase activity
                                    <break/>DNA Binding
                                    <break/>methyltransferase 1
                                    <break/>histone methyltransferase activity
                                    <break/>methylase activity</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>4.SLIT2</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>HTT</bold>
                                </td>
                                <td colspan="1" rowspan="1">Nerve Tissue
                                    <break/>Tract
                                    <break/>Knock-in
                                    <break/>Caenorhabditis elegans
                                    <break/>Mutant</td>
                                <td colspan="1" rowspan="1">Neurogenesis
                                    <break/>Gene Silencing
                                    <break/>RNA Interference
                                    <break/>regulation of osteoblast
                                    <break/>differentiation
                                    <break/>central nervous system development</td>
                                <td colspan="1" rowspan="1">Adult disease
                                    <break/>Disease
                                    <break/>Degenerative disorder
                                    <break/>Malnutrition
                                    <break/>Kidney Diseases</td>
                                <td colspan="1" rowspan="1">HDAC5
                                    <break/>HDAC6
                                    <break/>ISL2
                                    <break/>RBP1
                                    <break/>PAX6</td>
                                <td colspan="1" rowspan="1">GTP Binding
                                    <break/>Protein Binding
                                    <break/>kinesin activity
                                    <break/>molecular function
                                    <break/>transcription factor binding</td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1"/>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>epigene</bold>
                                </td>
                                <td colspan="1" rowspan="1">epigene
                                    <break/>Hypermethylation
                                    <break/>Islands
                                    <break/>tumor suppressor genes
                                    <break/>Epigenetic Silencing</td>
                                <td colspan="1" rowspan="1">Methylation
                                    <break/>DNA Methylation
                                    <break/>Gene Silencing
                                    <break/>tumor suppressor activity
                                    <break/>Embryonic Development</td>
                                <td colspan="1" rowspan="1">hypomyelination and congenital
                                    <break/>cataract
                                    <break/>Adult disease
                                    <break/>Proteinuria
                                    <break/>Diabetic
                                    <break/>Nephropathy
                                    <break/>Asthma</td>
                                <td colspan="1" rowspan="1">SLIT2
                                    <break/>SLIT3
                                    <break/>SLIT1
                                    <break/>DNMT1
                                    <break/>CDKN2A</td>
                                <td colspan="1" rowspan="1">MGMT
                                    <break/>transcription factor binding
                                    <break/>deacetylase activity
                                    <break/>1-phosphatidylinositol-3-kinase
                                    <break/>activity
                                    <break/>binding (molecular function)</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>5.BNIP3</bold>
                                </td>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>HTT</bold>
                                </td>
                                <td colspan="1" rowspan="1">caspase
                                    <break/>Mitochondria
                                    <break/>SETD2
                                    <break/>FRAP1
                                    <break/>EP300</td>
                                <td colspan="1" rowspan="1">Autophagy
                                    <break/>Cell Death
                                    <break/>Apoptosis
                                    <break/>RNA Interference
                                    <break/>Gene Silencing</td>
                                <td colspan="1" rowspan="1">Neurodegenerative
                                    <break/>Disorders
                                    <break/>dentatorubral-pallidoluysian atrophy
                                    <break/>muscular atrophy
                                    <break/>Ischemia
                                    <break/>Posttransfusion purpura</td>
                                <td colspan="1" rowspan="1">TGM2
                                    <break/>CASP2
                                    <break/>CREBBP
                                    <break/>CASP7
                                    <break/>HTRA2</td>
                                <td colspan="1" rowspan="1">proteasome
                                    <break/>endopeptidase complex
                                    <break/>ubiquitin activity
                                    <break/>cytochrome c activity
                                    <break/>phenylalanine dehydrogenase activity
                                    <break/>Protein Binding</td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1"/>
                                <td align="center" colspan="1" rowspan="1">
                                    <bold>epigene</bold>
                                </td>
                                <td colspan="1" rowspan="1">epigene
                                    <break/>Hypermethylation
                                    <break/>5-aza-2&#x2019;-deoxycytidine
                                    <break/>Gene Silencing
                                    <break/>tumor suppressor genes</td>
                                <td colspan="1" rowspan="1">Methylation
                                    <break/>DNA Methylation
                                    <break/>Gene Silencing
                                    <break/>tumor suppressor activity
                                    <break/>Transcription, Genetic</td>
                                <td colspan="1" rowspan="1">NPC
                                    <break/>hypomyelination and
                                    <break/>congenital cataract
                                    <break/>Ischemia
                                    <break/>Adenovirus Infections
                                    <break/>van der woude syndrome</td>
                                <td colspan="1" rowspan="1">BNIP3
                                    <break/>HDAC1
                                    <break/>MLH1
                                    <break/>CDKN2A
                                    <break/>DmelCG3861</td>
                                <td colspan="1" rowspan="1">MGMT
                                    <break/>IGL
                                    <break/>DNA Binding
                                    <break/>cytochrome c activity
                                    <break/>ABL1</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>Grouping the evidence provides a more complete insight into the processes involved in gene deregulation in HD mediated by CpG islands. For example, for one of our candidates, the amyloid beta (A4) precursor protein-binding, family A, member 1 (
                    <italic toggle="yes">APBA1</italic>), we found nerve tissue (General), endocytosis (Biological Process), Alzheimer&#x2019;s Disease (disease or syndrome), 
                    <italic toggle="yes">GRIN2B</italic> (
                    <italic toggle="yes">H. sapiens</italic> Genes) and kinesin activity (Molecular Function) as intermediate links with HTT. This suggests that mechanisms involving 
                    <italic toggle="yes">APBA1</italic> in HD share common components with the mechanisms involved in endocytosis and Alzheimer&#x2019;s Disease and those involving 
                    <italic toggle="yes">GRIN2B</italic> and kinesin activity. The corresponding intermediate links with epigenetics were: epigene (General), methylation (Biological Process), hypomyelination and congenital cataract (disease or syndrome), 
                    <italic toggle="yes">CDKN2A</italic> (H. sapiens Genes) and 
                    <italic toggle="yes">MGMT</italic> (O6-alkylguanine DNA alkyltransferase; Molecular Function). Accordingly, these concepts provide suggestions about the epigenetic role of 
                    <italic toggle="yes">APBA1</italic>, which can be taken into account when further studying the role of 
                    <italic toggle="yes">APBA1</italic> in HD.</p>
                <supplementary-material id="DS6" orientation="portrait" position="float" xlink:href="https://f1000researchdata.s3.amazonaws.com/datasets/9703/4860a2ee-6a59-4c70-acf3-8cbcb9a4586c_Dataset7_gene_prioritization.xls">
                    <label>Gene prioritization</label>
                    <caption>
                        <p>Top 100 novel proteins resulting from the semantic analysis, associated with HTT and epigenetics.</p>
                    </caption>
                </supplementary-material>
            </sec>
            <sec>
                <title>Validation of concept profile gene prioritization</title>
                <p>We next investigated whether the prioritized genes reflect valid biological knowledge. 
                    <xref ref-type="fig" rid="f3">Figure 3</xref> shows that CPA is able to prioritize true associations with huntingtin as measured by a gene expression experiment, and that combining experimental (differential expression) measurements with literature evidence enables the selection of even more specific HD signatures. We used our concept profile technology to match all genes in our database that have a concept profile (12,391 genes) to the &#x201c;huntingtin&#x201d; concept profile (black line). We then compared the distribution of those CPA scores to the CPA scores of genes that were found to be differentially expressed in the caudate nucleus (p-value &lt; 0.05). We included two gene lists in our analysis: the top 100 most differentially expressed genes (red line) and the top 1000 (green line). The shift in the distribution of CPA match scores between the differentially expressed genes (top 100 and top 1000) and the scores of all genes reflects the added value of CPA (CPA scores of the top 100 and top 1000 can be found in 
                    <xref ref-type="other" rid="DS7">Dataset 8</xref>
                    <sup>
                        <xref ref-type="bibr" rid="ref-54">54</xref>
                    </sup>). We found a significant shift in the scores when comparing all CPA scores from our database with the top 100 differentially expressed genes (p = 2.67 &#x00d7; 10<sup>&#x2212;8</sup>) and with the top 1000 (p &lt; 2.2 &#x00d7; 10<sup>&#x2212;16</sup>).</p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Combination of CPA with differential gene expression for effective gene prioritization.</title>
                        <p>Cumulative distribution of match scores between the concept profiles (CP) of differentially expressed genes in HD and the concept profile of HTT: match scores of all genes with a concept profile (black), match scores of the top 1000 differentially expressed genes (green), and match scores of the top 100 differentially expressed genes (red).</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/10459/812c50b0-babf-4c92-a322-d5a0854366cc_figure3.gif"/>
                </fig>
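                <p>As a minimal sketch of the comparison shown in Figure 3, the following Python fragment computes empirical cumulative distributions for two sets of match scores and the largest vertical gap between them (the two-sample Kolmogorov-Smirnov statistic). The statistical test behind the reported p-values is not named in this section, so this statistic is an assumption chosen for illustration only; the function names and the toy scores are hypothetical and are not taken from Dataset 8.</p>
                <preformat>
```python
from bisect import bisect_right


def ecdf(ordered, x):
    """Fraction of values in the sorted list `ordered` that are at most x."""
    return bisect_right(ordered, x) / len(ordered)


def ks_statistic(sample_a, sample_b):
    """Largest vertical distance between the two empirical CDFs."""
    ordered_a = sorted(sample_a)
    ordered_b = sorted(sample_b)
    # The maximum gap is always attained at one of the observed values.
    return max(
        abs(ecdf(ordered_a, x) - ecdf(ordered_b, x))
        for x in ordered_a + ordered_b
    )


# Toy match scores (hypothetical values, for illustration only).
all_genes = [0.01, 0.02, 0.02, 0.03, 0.05, 0.08]
top_ranked = [0.05, 0.08, 0.09, 0.12]

# A larger value means the two score distributions are further apart.
shift = ks_statistic(all_genes, top_ranked)
```
                </preformat>
                <p>A statistic close to 0 would indicate that the differentially expressed genes score no differently from the full set of genes, whereas a value close to 1 would indicate almost complete separation of the two distributions.</p>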
                <p>The top 100 and top 1000 also differ significantly (p = 0.03184), showing that it is useful to focus on the top ranks for follow-up research. In principle, more extreme p-values are associated with higher CPA scores. In addition, to show that our list of 100 prioritized gene-HTT CPA match scores would not have been found by chance, we assessed the percentile score of our list when compared to the frequency distribution of 100 match scores of randomly sampled gene-HTT concept pairs out of the 12,391 genes in our concept profile database (
                    <xref ref-type="other" rid="DS8">Dataset 9</xref>
                    <sup>
                        <xref ref-type="bibr" rid="ref-55">55</xref>
                    </sup>). All genes scored at or above the 95th percentile, except NTRK3 (55th percentile).</p>
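                <p>The percentile assessment can be sketched as follows, assuming that the percentile of a prioritized gene is the percentage of randomly sampled gene-HTT match scores that do not exceed its own score. That definition, the seeded sampling, the helper names and the toy scores are assumptions made for illustration; the values in Dataset 9 were not produced by this code.</p>
                <preformat>
```python
import random


def percentile_of_score(background, score):
    """Percentage of background scores that are at most `score`."""
    at_most = sum(1 for value in background if score >= value)
    return 100.0 * at_most / len(background)


def sample_background(all_scores, size, seed=0):
    """Draw `size` match scores at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(all_scores, size)


# Hypothetical match scores standing in for the full concept profile database.
database_scores = [i / 1000 for i in range(1, 1001)]
background = sample_background(database_scores, 100)

# A candidate scoring above every sampled pair sits at the 100th percentile.
candidate_percentile = percentile_of_score(background, 2.0)
```
                </preformat>
                <p>Under this reading, a gene whose match score exceeds most of the randomly sampled scores receives a high percentile, which is the sense in which a prioritized list is unlikely to have arisen by chance.</p>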
                <supplementary-material id="DS7" orientation="portrait" position="float" xlink:href="https://f1000researchdata.s3.amazonaws.com/datasets/9703/461112bc-3532-4c9f-a333-1644c258d260_Dataset8_concept_profile_scores_against_htt_top_100_and_top_1000_deg.zip">
                    <label>Concept profile analysis (CPA) scores for the two gene lists</label>
                    <caption>
                        <p>This folder contains the CPA scores per gene list (top 100 differentially expressed genes in the caudate nucleus and top 1000 differentially expressed genes from the same brain region) that were obtained by matching the gene lists against the HTT concept profile.</p>
                    </caption>
                </supplementary-material>
                <supplementary-material id="DS8" orientation="portrait" position="float" xlink:href="https://f1000researchdata.s3.amazonaws.com/datasets/9703/c63e9e91-404f-4d5c-9792-811394e6af8c_Dataset9_percentile_scores.txt">
                    <label>Percentile scores</label>
                    <caption>
                        <p>This document presents the percentile score of the prioritized gene list when compared to the frequency distribution of 100 match scores of randomly sampled gene-HTT concept pairs out of the 12,391 genes in our concept profile database.</p>
                    </caption>
                </supplementary-material>
            </sec>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <p>The computational analysis with public data that we present in this paper shows that there is no strong evidence for genome-wide relocalization of gene activity to repressed chromatin states, at least not at a scale that could explain the massive transcriptional deregulation that we observe in HD. Most of the deregulated genes mapped to the active chromatin state of our reference tissue and were underrepresented in silenced states of chromatin (
                <xref ref-type="fig" rid="f1">Figure 1</xref>). Previous reports supported the implication of large scale chromatin alterations in gene deregulation in HD. For instance, Anderson 
                <italic toggle="yes">et al.</italic> reported that gene expression is deregulated in large genomic regions in blood and post mortem tissue of HD patients
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup>. The authors inferred a relation with repressed and active chromatin, but did not use chromatin annotation data directly. Our results suggest that the association with genome clusters is mostly within the active state and does not extend to disruption of chromatin states at large scales.</p>
            <p>Active chromatin is normally more prone to regulation and deregulation. Our results suggest that the epigenetic mechanisms that have been observed in HD are mainly bound to this fraction of chromatin. We speculate that the effects are more closely associated with transcriptional regulation of individual genes than with large-scale higher-order rearrangements of chromatin structure
                <sup>
                    <xref ref-type="bibr" rid="ref-56">56</xref>
                </sup>. Our CPA analysis of deregulated genes with CpG islands corroborates this suggestion: chromatin-related concepts from our CPA ranked high in this set of genes, while CpG islands are mostly known for their direct role in transcription initiation of the proximal gene. Nevertheless, CpG-mediated changes in gene expression are a common mechanism for many genes and deregulation of this mechanism is thus likely to have a genome-wide effect. It was unexpected that the differentially expressed genes are significantly depleted in the bivalent chromatin state in the prefrontal cortex. The bivalent state is expected to be associated with genes that are prone to become active. Our expectation was that deregulated genes in this brain region would be weakly enriched in this state, as we did find for the caudate nucleus. We currently lack a good explanation for this result.</p>
            <p>We incorporated the computational analysis of public data in our research strategy to inform subsequent experiments. However, working with public data has its limitations. Here, we had to rely on reference tissue for the chromatin state in order to compare it with gene expression. In addition, the reference tissue chromatin state was measured in healthy individuals. Therefore, we cannot rule out that large-scale chromatin effects will be observed when chromatin state and differential expression are measured together in new HD-specific experiments. This could reveal new evidence for chromatin state deregulation in HD and give insight into the relationship with transcriptional deregulation, possibly at higher resolution than we could achieve in our current study. However, our results suggest that this is not very likely, leading us to advise fellow HD researchers to prioritise experiments that assess the role of epigenetic mechanisms in HD at the scale of individual genes or small clusters of genes, and within the active fraction of chromatin.</p>
            <p>Our results can be used to refine hypotheses about molecular mechanisms involved in HD. For instance, we surmise that the reported association of DNA methylation and chromatin organisation, and the effects of HDAC on the HD phenotype in mice
                <sup>
                    <xref ref-type="bibr" rid="ref-57">57</xref>
                </sup> are bound to the active fraction of chromatin. Furthermore, it appears that CpG islands located within the promoter region of a gene increase the probability that genes in such genomic regions are deregulated in HD. This in turn suggests that DNA methylation in promoters is implicated in alterations in the brain, which is in accordance with a study that noted changes in DNA methylation in cells expressing mutant huntingtin
                <sup>
                    <xref ref-type="bibr" rid="ref-58">58</xref>
                </sup>. Based on our semantic analysis, chromatin remodelling, chromatin (dis)assembly, and histone modification were associated with altered gene expression profiles. In contrast, genes without CpG islands are more likely to be involved in immune response and neurogenesis, which represent functionally linked processes
                <sup>
                    <xref ref-type="bibr" rid="ref-59">59</xref>
                </sup>.</p>
            <p>Furthermore, we used CPA as a means to prioritise genes for our hypotheses. For instance, we prioritised genes overlapping with CpG islands by their association with HTT, assuming that CPA rank scores are higher for genes that are higher-up in the cascade of events caused by the mutant protein. We recently showed that this is a fair assumption for CPA, although literature bias cannot be completely mitigated
                <sup>
                    <xref ref-type="bibr" rid="ref-60">60</xref>
                </sup>. Similarly, we further prioritized candidate genes in terms of their association with epigenetics. Genes such as 
                <italic toggle="yes">APBA1, KAT2A, CARM1, SLIT2</italic> and 
                <italic toggle="yes">BNIP3</italic> came forward as the most likely candidates to play a role in HD in the context of downstream effects of HTT involving epigenetic mechanisms. These candidates were filtered on potential novelty: only those genes were reported for which there was no direct association found in our database of PubMed abstracts. This does not exclude associations that were reported in tables and supplemental material that are much harder to mine for technical and legal reasons. Our study also retrieved several well established associations consistent with earlier studies.</p>
            <p>In summary, our results show how literature information in combination with data analysis provides useful tools for exploring hypotheses for possible future experiments.</p>
        </sec>
        <sec sec-type="conclusions">
            <title>Conclusions</title>
            <p>Our methodology offers support for hypothesis generation to elucidate missing links in mechanisms involved in a complex disease such as HD. We have shown how the analysis of microarray data and the integration of publicly available datasets and literature information enable prioritization of associations, such as proteins and mechanisms, that are likely to be involved in HD. In addition, we were able to focus on mechanisms that are associated with epigenetic regulation and that may drive changes that are part of the disease pathology. We argue that such a methodology can be of great value to the scientific community, not only for narrowing down the number of possible associations but also for providing evidence to support a particular hypothesis.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <p>The data referenced by this article are under copyright with the following copyright statement: Copyright: &#x00a9; 2017 Mina E et al.</p>
            <p>Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
                <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/publicdomain/zero/1.0/"/>
            </p>
            <p>All workflows are deposited in Zenodo (
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.164201">https://doi.org/10.5281/zenodo.164201</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-33">33</xref>
                </sup> and 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.164198">https://doi.org/10.5281/zenodo.164198</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-34">34</xref>
                </sup>), and the myExperiment platform (
                <ext-link ext-link-type="uri" xlink:href="http://www.myexperiment.org/packs/553">http://www.myexperiment.org/packs/553</ext-link>).</p>
            <p>

                <bold>Dataset 1: Gene expression data.</bold> This folder contains the gene expression data for the three human brain regions and the three mouse brain regions. doi, 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179468">10.5256/f1000research.9703.d179468</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-42">42</xref>
                </sup>
            </p>
            <p>
                <bold>Dataset 2: CpG islands.</bold> This folder contains the CpG island information for both the human and the mouse data. doi, 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179469">10.5256/f1000research.9703.d179469</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-44">44</xref>
                </sup>
            </p>
            <p>
                <bold>Dataset 3: Chromatin states for the anterior caudate.</bold> This folder contains the data for the four chromatin states in the anterior caudate: active TSS proximal promoter (TssA), bivalent regulatory region (TssBiv), heterochromatin (Het) and repressed Polycomb (ReprPC). doi, 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179470">10.5256/f1000research.9703.d179470</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-46">46</xref>
                </sup>
            </p>
            <p>
                <bold>Dataset 4: Chromatin states for the prefrontal cortex.</bold> This folder contains the data for the four chromatin states in the prefrontal cortex: active TSS proximal promoter (TssA), bivalent regulatory region (TssBiv), heterochromatin (Het) and repressed Polycomb (ReprPC). doi, 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179471">10.5256/f1000research.9703.d179471</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-47">47</xref>
                </sup>
            </p>
            <p>

                <bold>Dataset 5: Unique annotations of genes in each chromatin state with Biological Processes.</bold> This file contains the results of the semantic analysis with Biological Processes of the genes that overlapped with the active TSS, bivalent TSS, heterochromatin and repressed Polycomb chromatin states. The annotations presented here uniquely characterize each gene list (as obtained from the CPA, out of the top 50 annotations). doi, 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179472">10.5256/f1000research.9703.d179472</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-49">49</xref>
                </sup>
            </p>
            <p>

                <bold>Dataset 6: Annotations of CpG containing genes and non CpG containing genes.</bold> The top 50 annotations characterising each group of genes, with and without CpG islands, resulting from the semantic analysis of those two groups of genes with Biological Processes. doi, 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179473">10.5256/f1000research.9703.d179473</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-52">52</xref>
                </sup>
            </p>
            <p>

                <bold>Dataset 7: Gene prioritization.</bold> Top 100 novel proteins associated with HTT and epigenetics, resulting from the semantic analysis. doi, 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179474">10.5256/f1000research.9703.d179474</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-53">53</xref>
                </sup>
            </p>
            <p>

                <bold>Dataset 8: Concept profile analysis (CPA) scores for the two gene lists.</bold> This folder contains the CPA scores per gene list (the top 100 differentially expressed genes in the caudate nucleus and the top 1000 differentially expressed genes from the same brain region) that were obtained by matching the gene lists against the HTT concept profile. doi, 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179475">10.5256/f1000research.9703.d179475</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-54">54</xref>
                </sup>
            </p>
            <p>
                <bold>Dataset 9: Percentile scores.</bold> This document presents the percentile score of the prioritized gene list when compared to the frequency distribution of 100 match scores of randomly sampled gene-HTT concept pairs out of the 12,391 genes in our concept profile database. doi, 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179476">10.5256/f1000research.9703.d179476</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-55">55</xref>
                </sup>
            </p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>We gratefully acknowledge Herman van Haagen for methodological input, Jelle J. Goeman and Maarten van Iterson for statistical advice and Silvere van der Maarel for advice on the role of epigenetics in disease.</p>
        </ack>
        <sec sec-type="supplementary-material">
            <title>Supplementary material</title>
            <p id="SF1">

                <bold>Supplementary File 1: Decision about the promoter region of the chromatin states analysis and CpG islands.</bold> This document is included as a reference to describe the decisions that were made concerning the promoter size using an older version of epigenetic data from ENCODE. In this file, we present the additional runs of the workflow compute_overlaps that were performed with different parameters in order to test and decide on the best promoter region and overlap parameters.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/9703/7eaf3751-f5f4-4a43-8c80-d5022865d4b1.doc">Click here to access the data</ext-link>
            </p>
        </sec>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>MacDonald</surname>
                            <given-names>ME</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Ambrose</surname>
                            <given-names>CM</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Duyao</surname>
                            <given-names>MP</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington&#x2019;s disease chromosomes. The Huntington&#x2019;s Disease Collaborative Research Group.</article-title>
                    <source>
						
                        <italic toggle="yes">Cell.</italic>
					</source>
                    <year>1993</year>;<volume>72</volume>(<issue>6</issue>):<fpage>971</fpage>&#x2013;<lpage>983</lpage>.
                    <pub-id pub-id-type="pmid">8458085</pub-id>
                    <pub-id pub-id-type="doi">10.1016/0092-8674(93)90585-E</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Landles</surname>
                            <given-names>C</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Bates</surname>
                            <given-names>GP</given-names>
                        </name>
					</person-group>:
                    <article-title>Huntingtin and the molecular pathogenesis of Huntington&#x2019;s disease. Fourth in molecular medicine review series.</article-title>
                    <source>
						
                        <italic toggle="yes">EMBO Rep.</italic>
					</source>
                    <year>2004</year>;<volume>5</volume>(<issue>10</issue>):<fpage>958</fpage>&#x2013;<lpage>963</lpage>.
                    <pub-id pub-id-type="pmid">15459747</pub-id>
                    <pub-id pub-id-type="doi">10.1038/sj.embor.7400250</pub-id>
                    <pub-id pub-id-type="pmcid">1299150</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Arning</surname>
                            <given-names>L</given-names>
                        </name>
					</person-group>:
                    <article-title>The search for modifier genes in Huntington disease - multifactorial aspects of a monogenic disorder.</article-title>
                    <source>
						
                        <italic toggle="yes">Mol Cell Probes.</italic>
					</source>
                    <year>2016</year>;<volume>30</volume>(<issue>6</issue>):<fpage>404</fpage>&#x2013;<lpage>409</lpage>.
                    <pub-id pub-id-type="pmid">27417534</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.mcp.2016.06.006</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Cha</surname>
                            <given-names>JH</given-names>
                        </name>
					</person-group>:
                    <article-title>Transcriptional dysregulation in Huntington&#x2019;s disease.</article-title>
                    <source>
						
                        <italic toggle="yes">Trends Neurosci.</italic>
					</source>
                    <year>2000</year>;<volume>23</volume>(<issue>9</issue>):<fpage>387</fpage>&#x2013;<lpage>392</lpage>.
                    <pub-id pub-id-type="pmid">10941183</pub-id>
                    <pub-id pub-id-type="doi">10.1016/S0166-2236(00)01609-X</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Luthi-Carter</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Cha</surname>
                            <given-names>JH</given-names>
                        </name>
					</person-group>:
                    <article-title>Mechanisms of transcriptional dysregulation in Huntington&#x2019;s disease.</article-title>
                    <source>
						
                        <italic toggle="yes">Clin Neurosci Res.</italic>
					</source>
                    <year>2003</year>;<volume>3</volume>(<issue>3</issue>):<fpage>165</fpage>&#x2013;<lpage>177</lpage>.
                    <pub-id pub-id-type="doi">10.1016/S1566-2772(03)00059-8</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Valor</surname>
                            <given-names>LM</given-names>
                        </name>
					</person-group>:
                    <article-title>Transcription, epigenetics and ameliorative strategies in Huntington&#x2019;s disease: a genome-wide perspective.</article-title>
                    <source>
						
                        <italic toggle="yes">Mol Neurobiol.</italic>
					</source>
                    <year>2015</year>;<volume>51</volume>(<issue>1</issue>):<fpage>406</fpage>&#x2013;<lpage>423</lpage>.
                    <pub-id pub-id-type="pmid">24788684</pub-id>
                    <pub-id pub-id-type="doi">10.1007/s12035-014-8715-8</pub-id>
                    <pub-id pub-id-type="pmcid">4309905</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Anderson</surname>
                            <given-names>AN</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Roncaroli</surname>
                            <given-names>F</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Hodges</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Chromosomal profiles of gene expression in Huntington&#x2019;s disease.</article-title>
                    <source>
						
                        <italic toggle="yes">Brain.</italic>
					</source>
                    <year>2008</year>;<volume>131</volume>(<issue>pt 2</issue>):<fpage>381</fpage>&#x2013;<lpage>388</lpage>.
                    <pub-id pub-id-type="pmid">18156153</pub-id>
                    <pub-id pub-id-type="doi">10.1093/brain/awm312</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Bai</surname>
                            <given-names>G</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Cheung</surname>
                            <given-names>I</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Shulha</surname>
                            <given-names>HP</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Epigenetic dysregulation of 
                        <italic toggle="yes">hairy and enhancer of split 4 (HES4)</italic> is associated with striatal degeneration in postmortem Huntington brains.</article-title>
                    <source>
						
                        <italic toggle="yes">Hum Mol Genet.</italic>
					</source>
                    <year>2015</year>;<volume>24</volume>(<issue>5</issue>):<fpage>1441</fpage>&#x2013;<lpage>1456</lpage>.
                    <pub-id pub-id-type="pmid">25480889</pub-id>
                    <pub-id pub-id-type="doi">10.1093/hmg/ddu561</pub-id>
                    <pub-id pub-id-type="pmcid">4321450</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Achour</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Le Gras</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Keime</surname>
                            <given-names>C</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Neuronal identity genes regulated by super-enhancers are preferentially down-regulated in the striatum of Huntington&#x2019;s disease mice.</article-title>
                    <source>
						
                        <italic toggle="yes">Hum Mol Genet.</italic>
					</source>
                    <year>2015</year>;<volume>24</volume>(<issue>12</issue>):<fpage>3481</fpage>&#x2013;<lpage>3496</lpage>.
                    <pub-id pub-id-type="pmid">25784504</pub-id>
                    <pub-id pub-id-type="doi">10.1093/hmg/ddv099</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Thomas</surname>
                            <given-names>EA</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Coppola</surname>
                            <given-names>G</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Desplats</surname>
                            <given-names>PA</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The HDAC inhibitor 4b ameliorates the disease phenotype and transcriptional abnormalities in Huntington&#x2019;s disease transgenic mice.</article-title>
                    <source>
						
                        <italic toggle="yes">Proc Natl Acad Sci U S A.</italic>
					</source>
                    <year>2008</year>;<volume>105</volume>(<issue>40</issue>):<fpage>15564</fpage>&#x2013;<lpage>15569</lpage>.
                    <pub-id pub-id-type="pmid">18829438</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.0804249105</pub-id>
                    <pub-id pub-id-type="pmcid">2563081</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Dietz</surname>
                            <given-names>KN</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Di Stefano</surname>
                            <given-names>L</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Maher</surname>
                            <given-names>RC</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The 
                        <italic toggle="yes">Drosophila</italic> Huntington's disease gene ortholog dhtt influences chromatin regulation during development.</article-title>
                    <source>
						
                        <italic toggle="yes">Hum Mol Genet.</italic>
					</source>
                    <year>2015</year>;<volume>24</volume>(<issue>2</issue>):<fpage>330</fpage>&#x2013;<lpage>345</lpage>.
                    <pub-id pub-id-type="pmid">25168387</pub-id>
                    <pub-id pub-id-type="doi">10.1093/hmg/ddu446</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Urdinguio</surname>
                            <given-names>RG</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Sanchez-Mut</surname>
                            <given-names>JV</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Esteller</surname>
                            <given-names>M</given-names>
                        </name>
					</person-group>:
                    <article-title>Epigenetic mechanisms in neurological diseases: genes, syndromes, and therapies.</article-title>
                    <source>
						
                        <italic toggle="yes">Lancet Neurol.</italic>
					</source>
                    <year>2009</year>;<volume>8</volume>(<issue>11</issue>):<fpage>1056</fpage>&#x2013;<lpage>1072</lpage>.
                    <pub-id pub-id-type="pmid">19833297</pub-id>
                    <pub-id pub-id-type="doi">10.1016/S1474-4422(09)70262-5</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Jakovcevski</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Akbarian</surname>
                            <given-names>S</given-names>
                        </name>
					</person-group>:
                    <article-title>Epigenetic mechanisms in neurological disease.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Med.</italic>
					</source>
                    <year>2012</year>;<volume>18</volume>(<issue>8</issue>):<fpage>1194</fpage>&#x2013;<lpage>1204</lpage>.
                    <pub-id pub-id-type="pmid">22869198</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nm.2828</pub-id>
                    <pub-id pub-id-type="pmcid">3596876</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>He</surname>
                            <given-names>F</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Todd</surname>
                            <given-names>PK</given-names>
                        </name>
						</person-group>:
                    <article-title>Epigenetics in nucleotide repeat expansion disorders.</article-title>
                    <source>
						
                        <italic toggle="yes">Semin Neurol.</italic>
					</source>
                    <year>2011</year>;<volume>31</volume>(<issue>5</issue>):<fpage>470</fpage>&#x2013;<lpage>483</lpage>.
                    <pub-id pub-id-type="pmid">22266885</pub-id>
                    <pub-id pub-id-type="doi">10.1055/s-0031-1299786</pub-id>
                    <pub-id pub-id-type="pmcid">3655547</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Shin</surname>
                            <given-names>J</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Ming</surname>
                            <given-names>GL</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Song</surname>
                            <given-names>H</given-names>
                        </name>
					</person-group>:
                    <article-title>Seeking a roadmap toward neuroepigenetics.</article-title>
                    <source>
						
                        <italic toggle="yes">Neuron.</italic>
					</source>
                    <year>2015</year>;<volume>86</volume>(<issue>1</issue>):<fpage>12</fpage>&#x2013;<lpage>15</lpage>.
                    <pub-id pub-id-type="pmid">25856479</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.neuron.2015.03.051</pub-id>
                    <pub-id pub-id-type="pmcid">5531057</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mo</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Mukamel</surname>
                            <given-names>EA</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Davis</surname>
                            <given-names>FP</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Epigenomic Signatures of Neuronal Diversity in the Mammalian Brain.</article-title>
                    <source>
						
                        <italic toggle="yes">Neuron.</italic>
					</source>
                    <year>2015</year>;<volume>86</volume>(<issue>6</issue>):<fpage>1369</fpage>&#x2013;<lpage>1384</lpage>.
                    <pub-id pub-id-type="pmid">26087164</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.neuron.2015.05.018</pub-id>
                    <pub-id pub-id-type="pmcid">4499463</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Edgar</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Domrachev</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Lash</surname>
                            <given-names>AE</given-names>
                        </name>
					</person-group>:
                    <article-title>Gene Expression Omnibus: NCBI gene expression and hybridization array data repository.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2002</year>;<volume>30</volume>(<issue>1</issue>):<fpage>207</fpage>&#x2013;<lpage>210</lpage>.
                    <pub-id pub-id-type="pmid">11752295</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/30.1.207</pub-id>
                    <pub-id pub-id-type="pmcid">99122</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <collab>ENCODE Project Consortium</collab>:
                    <article-title>An integrated encyclopedia of DNA elements in the human genome.</article-title>
                    <source>
						
                        <italic toggle="yes">Nature.</italic>
					</source>
                    <year>2012</year>;<volume>489</volume>(<issue>7414</issue>):<fpage>57</fpage>&#x2013;<lpage>74</lpage>.
                    <pub-id pub-id-type="pmid">22955616</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature11247</pub-id>
                    <pub-id pub-id-type="pmcid">3439153</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Jelier</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Goeman</surname>
                            <given-names>JJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Hettne</surname>
                            <given-names>KM</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Literature-aided interpretation of gene expression data with the weighted global test.</article-title>
                    <source>
						
                        <italic toggle="yes">Brief Bioinform.</italic>
					</source>
                    <year>2011</year>;<volume>12</volume>(<issue>5</issue>):<fpage>518</fpage>&#x2013;<lpage>529</lpage>.
                    <pub-id pub-id-type="pmid">21183478</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bib/bbq082</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Jelier</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Schuemie</surname>
                            <given-names>MJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Roes</surname>
                            <given-names>PJ</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Literature-based concept profiles for gene annotation: the issue of weighting.</article-title>
                    <source>
						
                        <italic toggle="yes">Int J Med Inform.</italic>
					</source>
                    <year>2008</year>;<volume>77</volume>(<issue>5</issue>):<fpage>354</fpage>&#x2013;<lpage>362</lpage>.
                    <pub-id pub-id-type="pmid">17827057</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ijmedinf.2007.07.004</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
			
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Thompson</surname>
                            <given-names>M</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Hettne</surname>
                            <given-names>KM</given-names>
                        </name>
			
                        <etal/>
		</person-group>:
                    <article-title>Multidisciplinary collaboration to facilitate hypotheses generation in Huntington&#x2019;s disease</article-title>. In:
                    <italic toggle="yes">IEEE 11th International Conference on e-Science (e-Science)</italic>. <year>2015</year>;<fpage>118</fpage>&#x2013;<lpage>125</lpage>.
                    <pub-id pub-id-type="doi">10.1109/eScience.2015.71</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
			
                        <name name-style="western">
                            <surname>Jelier</surname>
                            <given-names>R</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Jenster</surname>
                            <given-names>G</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Dorssers</surname>
                            <given-names>LC</given-names>
                        </name>
			
                        <etal/>
		</person-group>:
                    <article-title>Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation.</article-title>
                    <source>
			
                        <italic toggle="yes">BMC Bioinformatics.</italic>
		</source>
                    <year>2007</year>;<volume>8</volume>:<fpage>14</fpage>.
                    <pub-id pub-id-type="pmid">17233900</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-8-14</pub-id>
                    <pub-id pub-id-type="pmcid">1784107</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
			
                        <name name-style="western">
                            <surname>Jelier</surname>
                            <given-names>R</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>&#x2019;t Hoen</surname>
                            <given-names>PA</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Sterrenburg</surname>
                            <given-names>E</given-names>
                        </name>
			
                        <etal/>
		</person-group>:
                    <article-title>Literature-aided meta-analysis of microarray data: a compendium study on muscle development and disease.</article-title>
                    <source>
			
                        <italic toggle="yes">BMC Bioinformatics.</italic>
		</source>
                    <year>2008</year>;<volume>9</volume>:<fpage>291</fpage>.
                    <pub-id pub-id-type="pmid">18577208</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-9-291</pub-id>
                    <pub-id pub-id-type="pmcid">2459190</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
			
                        <name name-style="western">
                            <surname>van Haagen</surname>
                            <given-names>HH</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>&#x2019;t Hoen</surname>
                            <given-names>PA</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>de Morr&#x00e9;e</surname>
                            <given-names>A</given-names>
                        </name>
			
                        <etal/>
		</person-group>:
                    <article-title>
                        <italic toggle="yes">In silico</italic> discovery and experimental validation of new protein-protein interactions.</article-title>
                    <source>
			
                        <italic toggle="yes">Proteomics.</italic>
		</source>
                    <year>2011</year>;<volume>11</volume>(<issue>5</issue>):<fpage>843</fpage>&#x2013;<lpage>853</lpage>.
                    <pub-id pub-id-type="pmid">21280221</pub-id>
                    <pub-id pub-id-type="doi">10.1002/pmic.201000398</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
			
                        <name name-style="western">
                            <surname>van Dartel</surname>
                            <given-names>DA</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Pennings</surname>
                            <given-names>JL</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Hendriksen</surname>
                            <given-names>PJ</given-names>
                        </name>
			
                        <etal/>
		</person-group>:
                    <article-title>Early gene expression changes during embryonic stem cell differentiation into cardiomyocytes and their modulation by monobutyl phthalate.</article-title>
                    <source>
			
                        <italic toggle="yes">Reprod Toxicol.</italic>
		</source>
                    <year>2009</year>;<volume>27</volume>(<issue>2</issue>):<fpage>93</fpage>&#x2013;<lpage>102</lpage>.
                    <pub-id pub-id-type="pmid">19162170</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.reprotox.2008.12.009</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
			
                        <name name-style="western">
                            <surname>Hettne</surname>
                            <given-names>KM</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Boorsma</surname>
                            <given-names>A</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>van Dartel</surname>
                            <given-names>DA</given-names>
                        </name>
			
                        <etal/>
		</person-group>:
                    <article-title>Next-generation text-mining mediated generation of chemical response-specific gene sets for interpretation of gene expression data.</article-title>
                    <source>
			
                        <italic toggle="yes">BMC Med Genomics.</italic>
		</source>
                    <year>2013</year>;<volume>6</volume>:<fpage>2</fpage>.
                    <pub-id pub-id-type="pmid">23356878</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1755-8794-6-2</pub-id>
                    <pub-id pub-id-type="pmcid">3572439</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
			
                        <name name-style="western">
                            <surname>Jelier</surname>
                            <given-names>R</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Schuemie</surname>
                            <given-names>MJ</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Veldhoven</surname>
                            <given-names>A</given-names>
                        </name>
			
                        <etal/>
		</person-group>:
                    <article-title>Anni 2.0: a multipurpose text-mining tool for the life sciences.</article-title>
                    <source>
			
                        <italic toggle="yes">Genome Biol.</italic>
		</source>
                    <year>2008</year>;<volume>9</volume>(<issue>6</issue>):<fpage>R96</fpage>.
                    <pub-id pub-id-type="pmid">18549479</pub-id>
                    <pub-id pub-id-type="doi">10.1186/gb-2008-9-6-r96</pub-id>
                    <pub-id pub-id-type="pmcid">2481428</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
			
                        <name name-style="western">
                            <surname>Hettne</surname>
                            <given-names>KM</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>van Mulligen</surname>
                            <given-names>EM</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Schuemie</surname>
                            <given-names>MJ</given-names>
                        </name>
			
                        <etal/>
		</person-group>:
                    <article-title>Rewriting and suppressing UMLS terms for improved biomedical term identification.</article-title>
                    <source>
			
                        <italic toggle="yes">J Biomed Semantics.</italic>
		</source>
                    <year>2010</year>;<volume>1</volume>(<issue>1</issue>):<fpage>5</fpage>.
                    <pub-id pub-id-type="pmid">20618981</pub-id>
                    <pub-id pub-id-type="doi">10.1186/2041-1480-1-5</pub-id>
                    <pub-id pub-id-type="pmcid">2895736</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
			
                        <name name-style="western">
                            <surname>Hettne</surname>
                            <given-names>KM</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Stierum</surname>
                            <given-names>RH</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Schuemie</surname>
                            <given-names>MJ</given-names>
                        </name>
			
                        <etal/>
		</person-group>:
                    <article-title>A dictionary to identify small molecules and drugs in free text.</article-title>
                    <source>
			
                        <italic toggle="yes">Bioinformatics.</italic>
		</source>
                    <year>2009</year>;<volume>25</volume>(<issue>22</issue>):<fpage>2983</fpage>&#x2013;<lpage>2991</lpage>.
                    <pub-id pub-id-type="pmid">19759196</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btp535</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
			
                        <name name-style="western">
                            <surname>Hettne</surname>
                            <given-names>K</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>van Schouwen</surname>
                            <given-names>R</given-names>
                        </name>
			
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
			
                        <etal/>
		</person-group>:
                    <article-title>Explain your data by Concept Profile Analysis Web Services [version 1; referees: 2 approved with reservations].</article-title>
                    <source>
			
                        <italic toggle="yes">F1000Res.</italic>
		</source>
                    <year>2014</year>;<volume>3</volume>:<fpage>173</fpage>.
                    <pub-id pub-id-type="doi">10.12688/f1000research.4830.1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-31">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Hull</surname>
                            <given-names>D</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Wolstencroft</surname>
                            <given-names>K</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Stevens</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Taverna: a tool for building and running workflows of services.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2006</year>;<volume>34</volume>(<issue>Web Server issue</issue>):<fpage>W729</fpage>&#x2013;<lpage>732</lpage>.
                    <pub-id pub-id-type="pmid">16845108</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkl320</pub-id>
                    <pub-id pub-id-type="pmcid">1538887</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Wolstencroft</surname>
                            <given-names>K</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Haines</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Fellows</surname>
                            <given-names>D</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2013</year>;<volume>41</volume>(<issue>Web Server issue</issue>):<fpage>W557</fpage>&#x2013;<lpage>561</lpage>.
                    <pub-id pub-id-type="pmid">23640334</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkt328</pub-id>
                    <pub-id pub-id-type="pmcid">3692062</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
					</person-group>:
                    <article-title>HD data analysis workflows for paper: A putative role for genome-wide epigenetic regulatory mechanisms in Huntington&#x2019;s disease: A computational assessment_v2.</article-title>
                    <source>
						
                        <italic toggle="yes">Zenodo.</italic>
					</source>
                    <year>2016</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.164201">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-34">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
					</person-group>:
                    <article-title>HD data interpretation workflows for paper: A putative role for genome-wide epigenetic regulatory mechanisms in Huntington&#x2019;s disease: A computational assessment_v2.</article-title>
                    <source>
						
                        <italic toggle="yes">Zenodo.</italic>
					</source>
                    <year>2016</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.164198">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-35">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Smyth</surname>
                            <given-names>GK</given-names>
                        </name>
					</person-group>:
                    <article-title>Linear models and empirical Bayes methods for assessing differential expression in microarray experiments.</article-title>
                    <source>
						
                        <italic toggle="yes">Stat Appl Genet Mol Biol.</italic>
					</source>
                    <year>2004</year>;<volume>3</volume>: Article3.
                    <pub-id pub-id-type="pmid">16646809</pub-id>
                    <pub-id pub-id-type="doi">10.2202/1544-6115.1027</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-36">
                <label>36</label>
                <mixed-citation publication-type="journal">
                    <article-title>Bioconductor - home</article-title>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.bioconductor.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-37">
                <label>37</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Hodges</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Strand</surname>
                            <given-names>AD</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Aragaki</surname>
                            <given-names>AK</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Regional and cellular gene expression changes in human Huntington&#x2019;s disease brain.</article-title>
                    <source>
						
                        <italic toggle="yes">Hum Mol Genet.</italic>
					</source>
                    <year>2006</year>;<volume>15</volume>(<issue>6</issue>):<fpage>965</fpage>&#x2013;<lpage>977</lpage>.
                    <pub-id pub-id-type="pmid">16467349</pub-id>
                    <pub-id pub-id-type="doi">10.1093/hmg/ddl013</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-38">
                <label>38</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Benjamini</surname>
                            <given-names>Y</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Hochberg</surname>
                            <given-names>Y</given-names>
                        </name>
					</person-group>:
                    <article-title>Controlling the false discovery rate: A practical and powerful approach to multiple testing.</article-title>
                    <source>
						
                        <italic toggle="yes">J Roy Stat Soc B Met.</italic>
					</source>
                    <year>1995</year>;<volume>57</volume>(<issue>1</issue>):<fpage>289</fpage>&#x2013;<lpage>300</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="http://engr.case.edu/ray_soumya/mlrg/controlling_fdr_benjamini95.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-39">
                <label>39</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Kasprzyk</surname>
                            <given-names>A</given-names>
                        </name>
					</person-group>:
                    <article-title>BioMart: driving a paradigm change in biological data management.</article-title>
                    <source>
						
                        <italic toggle="yes">Database (Oxford).</italic>
					</source>
                    <year>2011</year>;<volume>2011</volume>:<fpage>bar049</fpage>.
                    <pub-id pub-id-type="pmid">22083790</pub-id>
                    <pub-id pub-id-type="doi">10.1093/database/bar049</pub-id>
                    <pub-id pub-id-type="pmcid">3215098</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-40">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <article-title>HUGO gene nomenclature committee home page | HUGO gene nomenclature committee</article-title>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.genenames.org/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-41">
                <label>41</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Ernst</surname>
                            <given-names>J</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Kheradpour</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Mikkelsen</surname>
                            <given-names>TS</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Mapping and analysis of chromatin state dynamics in nine human cell types.</article-title>
                    <source>
						
                        <italic toggle="yes">Nature.</italic>
					</source>
                    <year>2011</year>;<volume>473</volume>(<issue>7345</issue>):<fpage>43</fpage>&#x2013;<lpage>49</lpage>.
                    <pub-id pub-id-type="pmid">21441907</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature09906</pub-id>
                    <pub-id pub-id-type="pmcid">3088773</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-42">
                <label>42</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>van Roon-Mom</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Verschure</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Dataset 1 in: A putative role for genome-wide epigenetic regulatory mechanisms in Huntington&#x2019;s disease: A computational assessment.</article-title>
                    <source>
						
                        <italic toggle="yes">F1000Research.</italic>
					</source>
                    <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179468">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-43">
                <label>43</label>
                <mixed-citation publication-type="journal">
                    <article-title>UCSC genome browser home</article-title>.
                    <ext-link ext-link-type="uri" xlink:href="http://genome.ucsc.edu/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-44">
                <label>44</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>van Roon-Mom</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Verschure</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Dataset 2 in: A putative role for genome-wide epigenetic regulatory mechanisms in Huntington&#x2019;s disease: A computational assessment.</article-title>
                    <source>
						
                        <italic toggle="yes">F1000Research.</italic>
					</source>
                    <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179469">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-45">
                <label>45</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <collab>Roadmap Epigenomics Consortium</collab>
						
                        <name name-style="western">
                            <surname>Kundaje</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Meuleman</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Integrative analysis of 111 reference human epigenomes.</article-title>
                    <source>
						
                        <italic toggle="yes">Nature.</italic>
					</source>
                    <year>2015</year>;<volume>518</volume>(<issue>7539</issue>):<fpage>317</fpage>&#x2013;<lpage>330</lpage>.
                    <pub-id pub-id-type="pmid">25693563</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature14248</pub-id>
                    <pub-id pub-id-type="pmcid">4530010</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-46">
                <label>46</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>van Roon-Mom</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Verschure</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Dataset 3 in: A putative role for genome-wide epigenetic regulatory mechanisms in Huntington&#x2019;s disease: A computational assessment.</article-title>
                    <source>
						
                        <italic toggle="yes">F1000Research.</italic>
					</source>
                    <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179470">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-47">
                <label>47</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>van Roon-Mom</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Verschure</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Dataset 4 in: A putative role for genome-wide epigenetic regulatory mechanisms in Huntington&#x2019;s disease: A computational assessment.</article-title>
                    <source>
						
                        <italic toggle="yes">F1000Research.</italic>
					</source>
                    <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179471">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-48">
                <label>48</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Ernst</surname>
                            <given-names>J</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Kellis</surname>
                            <given-names>M</given-names>
                        </name>
					</person-group>:
                    <article-title>ChromHMM: automating chromatin-state discovery and characterization.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Methods.</italic>
					</source>
                    <year>2012</year>;<volume>9</volume>(<issue>3</issue>):<fpage>215</fpage>&#x2013;<lpage>6</lpage>.
                    <pub-id pub-id-type="pmid">22373907</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.1906</pub-id>
                    <pub-id pub-id-type="pmcid">3577932</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-49">
                <label>49</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>van Roon-Mom</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Verschure</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Dataset 5 in: A putative role for genome-wide epigenetic regulatory mechanisms in Huntington&#x2019;s disease: A computational assessment.</article-title>
                    <source>
						
                        <italic toggle="yes">F1000Research.</italic>
					</source>
                    <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179472">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-50">
                <label>50</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Blackledge</surname>
                            <given-names>NP</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Klose</surname>
                            <given-names>R</given-names>
                        </name>
					</person-group>:
                    <article-title>CpG island chromatin: a platform for gene regulation.</article-title>
                    <source>
						
                        <italic toggle="yes">Epigenetics.</italic>
					</source>
                    <year>2011</year>;<volume>6</volume>(<issue>2</issue>):<fpage>147</fpage>&#x2013;<lpage>52</lpage>.
                    <pub-id pub-id-type="pmid">20935486</pub-id>
                    <pub-id pub-id-type="doi">10.4161/epi.6.2.13640</pub-id>
                    <pub-id pub-id-type="pmcid">3278783</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-51">
                <label>51</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Teodoridis</surname>
                            <given-names>JM</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Strathdee</surname>
                            <given-names>G</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Brown</surname>
                            <given-names>R</given-names>
                        </name>
						</person-group>:
                    <article-title>Epigenetic silencing mediated by CpG island methylation: potential as a therapeutic target and as a biomarker.</article-title>
                    <source>
						
                        <italic toggle="yes">Drug Resist Updat.</italic>
					</source>
                    <year>2004</year>;<volume>7</volume>(<issue>4&#x2013;5</issue>):<fpage>267</fpage>&#x2013;<lpage>78</lpage>.
                    <pub-id pub-id-type="pmid">15533764</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.drup.2004.06.005</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-52">
                <label>52</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>van Roon-Mom</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Verschure</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Dataset 6 in: A putative role for genome-wide epigenetic regulatory mechanisms in Huntington&#x2019;s disease: A computational assessment.</article-title>
                    <source>
						
                        <italic toggle="yes">F1000Research.</italic>
					</source>
                    <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179473">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-53">
                <label>53</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>van Roon-Mom</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Verschure</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Dataset 7 in: A putative role for genome-wide epigenetic regulatory mechanisms in Huntington&#x2019;s disease: A computational assessment.</article-title>
                    <source>
						
                        <italic toggle="yes">F1000Research.</italic>
					</source>
                    <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179474">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-54">
                <label>54</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>van Roon-Mom</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Verschure</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Dataset 8 in: A putative role for genome-wide epigenetic regulatory mechanisms in Huntington&#x2019;s disease: A computational assessment.</article-title>
                    <source>
						
                        <italic toggle="yes">F1000Research.</italic>
					</source>
                    <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179475">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-55">
                <label>55</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mina</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>van Roon-Mom</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Verschure</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Dataset 9 in: A putative role for genome-wide epigenetic regulatory mechanisms in Huntington&#x2019;s disease: A computational assessment.</article-title>
                    <source>
						
                        <italic toggle="yes">F1000Research.</italic>
					</source>
                    <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5256/f1000research.9703.d179476">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-56">
                <label>56</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Sadri-Vakili</surname>
                            <given-names>G</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Cha</surname>
                            <given-names>JH</given-names>
                        </name>
					</person-group>:
                    <article-title>Mechanisms of disease: Histone modifications in Huntington's disease.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Clin Pract Neurol.</italic>
					</source>
                    <year>2006</year>;<volume>2</volume>(<issue>6</issue>):<fpage>330</fpage>&#x2013;<lpage>8</lpage>.
                    <pub-id pub-id-type="pmid">16932577</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ncpneuro0199</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-57">
                <label>57</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Jia</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Morris</surname>
                            <given-names>CD</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Williams</surname>
                            <given-names>RM</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>HDAC inhibition imparts beneficial transgenerational effects in Huntington's disease mice via altered DNA and histone methylation.</article-title>
                    <source>
						
                        <italic toggle="yes">Proc Natl Acad Sci U S A.</italic>
					</source>
                    <year>2015</year>;<volume>112</volume>(<issue>1</issue>):<fpage>E56</fpage>&#x2013;<lpage>E64</lpage>.
                    <pub-id pub-id-type="pmid">25535382</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.1415195112</pub-id>
                    <pub-id pub-id-type="pmcid">4291662</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-58">
                <label>58</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Ng</surname>
                            <given-names>CW</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Yildirim</surname>
                            <given-names>F</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Yap</surname>
                            <given-names>YS</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Extensive changes in DNA methylation are associated with expression of mutant huntingtin.</article-title>
                    <source>
						
                        <italic toggle="yes">Proc Natl Acad Sci U S A.</italic>
					</source>
                    <year>2013</year>;<volume>110</volume>(<issue>6</issue>):<fpage>2354</fpage>&#x2013;<lpage>9</lpage>.
                    <pub-id pub-id-type="pmid">23341638</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.1221292110</pub-id>
                    <pub-id pub-id-type="pmcid">3568325</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-59">
                <label>59</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Kohman</surname>
                            <given-names>RA</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Rhodes</surname>
                            <given-names>JS</given-names>
                        </name>
					</person-group>:
                    <article-title>Neurogenesis, inflammation and behavior.</article-title>
                    <source>
						
                        <italic toggle="yes">Brain Behav Immun.</italic>
					</source>
                    <year>2013</year>;<volume>27</volume>(<issue>1</issue>):<fpage>22</fpage>&#x2013;<lpage>32</lpage>.
                    <pub-id pub-id-type="pmid">22985767</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.bbi.2012.09.003</pub-id>
                    <pub-id pub-id-type="pmcid">3518576</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-60">
                <label>60</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Hettne</surname>
                            <given-names>KM</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Thompson</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>van Haagen</surname>
                            <given-names>HH</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The Implicitome: A Resource for Rationalizing Gene-Disease Associations.</article-title>
                    <source>
						
                        <italic toggle="yes">PLoS One.</italic>
					</source>
                    <year>2016</year>;<volume>11</volume>(<issue>2</issue>):<fpage>e0149621</fpage>.
                    <pub-id pub-id-type="pmid">26919047</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0149621</pub-id>
                    <pub-id pub-id-type="pmcid">4769089</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report30246">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.10459.r30246</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Pfenning</surname>
                        <given-names>Andreas R.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r30246a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Kaplow</surname>
                        <given-names>Irene</given-names>
                    </name>
                    <xref ref-type="aff" rid="r30246a1">1</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r30246a1">
                    <label>1</label>Carnegie Mellon University, Pittsburgh, PA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>8</day>
                <month>2</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Pfenning AR and Kaplow I</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport30246" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.9703.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <underline>Summary</underline>:</p>
            <p> The purpose of this paper is to leverage publicly available data to investigate the association between chromatin state and Huntington&#x2019;s disease (HD). The authors do this by identifying genes that are differentially expressed in individuals with HD relative to healthy individuals and then examining the genomic locations of these genes and the biological processes associated with them. They find that the promoters of many of these genes are in an active chromatin state in healthy individuals and lie in CpG islands. They also find that many of these genes are involved in biological processes related to HD and that some are involved in chromatin modification. Although this study suggests that there may be an association between chromatin state and HD, the nature of that association remains unclear.</p>
            <p> 
                <underline>Major Comments</underline>: 
                <list list-type="order">
                    <list-item>
                        <p>I appreciate how the authors integrated existing literature with differential gene expression results to prioritize biological processes, diseases, genes, and molecular functions.&#x00a0;In addition, defining the similarity between concepts based on the number of shared concepts is similar to approaches that have been used for community detection in social networks (Blondel 
                            <italic>et al</italic>., 
                            <italic>Journal of Statistical Mechanics: Theory and Experiment</italic>, 2008) and, more recently, for clustering cells based on protein expression (Levine 
                            <italic>et al</italic>., 
                            <italic>Cell</italic>, 2015) (I do not think that the authors need to cite these papers), so I am not surprised that it worked well.&#x00a0;I hope that the authors&#x2019; use of this approach will inspire others to use such methods for comparing biological concepts in the literature and encourage future researchers to directly integrate literature with differential gene expression.</p>
                    </list-item>
                    <list-item>
                        <p>I found many of the results difficult to interpret because the authors seem to have done all of the analyses on the set of all differentially expressed genes.&#x00a0;My expectations for up-regulated genes are different from those for down-regulated genes.&#x00a0;In the Minor Comments, I point out specific analyses for which I think that separating the genes based on the direction of the differential expression would be helpful.&#x00a0;If the authors did use only down-regulated or only up-regulated genes, it would be great if they could make this clear in the Methods section and include what fold-change cutoff they used.</p>
                    </list-item>
                    <list-item>
                        <p>I thought that some of the claims in the Discussion section were not well-supported by the results.&#x00a0;I have pointed out what these are in the Minor Comments.&#x00a0;Most concerns come from the lack of separation between down-regulated and up-regulated differentially expressed genes in the analyses in this paper.</p>
                    </list-item>
                    <list-item>
                        <p>Although there is no chromatin state data from anywhere in the brain in HD individuals, there are H3K27ac and PolII datasets in the striatum of HD and control mice (Achour 
                            <italic>et al</italic>., 
                            <italic>Human Molecular Genetics</italic>, 2015).&#x00a0;This paper would be more convincing if it included a comparison between the genes that are differentially expressed in HD mice versus control mice and the differential H3K27ac regions from this dataset.</p>
                    </list-item>
                    <list-item>
                        <p>I found much of the Methods section difficult to follow.&#x00a0;In the Minor Comments, I point out specific parts that I think should be re-ordered and specific details that I think should be added to make the Methods clearer.&#x00a0;The authors should also include the exact version and settings that they used for every publicly available software package so that others can reproduce the results.</p>
                    </list-item>
                </list></p>
            <p> 
                <underline>Minor Comments</underline>:</p>
            <p> 
                <italic>Introduction</italic>:</p>
            <p> Page 3: Although the authors clearly describe literature suggesting that epigenetic mechanisms may be involved in HD, there is also some evidence against a role for epigenetics in HD. For example, a recent study profiled methylation in the cortex of HD individuals and controls using the Illumina HumanMethylation450K BeadChip array and found no significantly differentially methylated regions between HD individuals and controls (De Souza 
                <italic>et al</italic>., 
                <italic>Human Molecular Genetics</italic>, 2016). The authors should cite this paper and explain why it does not rule out a role for epigenetics in HD (for example, the assay used was not genome-wide, and methylation is not the only component of transcriptional regulation).</p>
            <p> Page 3: It is not clear to me why transcriptional dysregulation in HD would be associated with differentially expressed genes in regions that are not normally associated with active chromatin states. My understanding from the literature cited in the introduction is that many of the genes that are differentially expressed in HD individuals have 
                <italic>lower</italic> expression in HD individuals than they do in control individuals. I would therefore expect that these genes would fall in regions that are normally associated with active chromatin states but may not be associated with active chromatin states in individuals with HD. It would be great if the authors could clarify the motivation behind this hypothesis.</p>
            <p> Page 3: At the end of the introduction, the computational method is introduced as an approach to intelligently select which experiments to do. However, I was not sure from the introduction what types of experiments this method is designed to guide. It would be great if the authors could add a more detailed explanation of this earlier in the manuscript.</p>
            <p> 
                <italic>Methods</italic>:</p>
            <p> Page 3: It would be easier to understand the advantages of the vector space model if they were listed after the description of the vector space model instead of before it.</p>
            <p> Page 3: It would be helpful if the authors could describe how the subset of PubMed records is selected for each gene or cite a previous paper that uses the same method.</p>
            <p> Page 3: It would be helpful if the authors could define &#x201c;symmetric uncertainty coefficient.&#x201d;</p>
            <p> Page 4: It would be helpful if the authors could list exactly which publicly available datasets were used for each differential expression test before describing the differential expression test.</p>
            <p> Page 4: It would be helpful if the authors could state which microarrays were used to generate the gene expression data before describing how the differential expression analysis was done.</p>
            <p> Page 4: It seems like the authors did not account for potential confounding factors that were available, such as sex, age, and brain tissue, in the differential expression analysis. I am concerned that these confounding factors may affect the results.</p>
            <p> Page 4: The authors state which human and mouse assemblies they used, but they had not previously stated that their analysis included data from mice. It would be helpful if the authors could state exactly which species they are using for each part of their analysis earlier in the manuscript.</p>
            <p> Page 4: The authors state that they used a Kolmogorov-Smirnov test to compare p-value distributions. It was not clear to me where these p-values come from. Are they the p-values for differential expression of the genes corresponding to the promoters? It would be helpful if the authors could clarify this.</p>
            <p> Page 4: It was not clear how many Kolmogorov-Smirnov tests were done. The authors said that they rejected the null hypothesis if the p-value was &lt; 0.05. If they did more than one test, they should correct for multiple hypothesis testing.</p>
            <p> Page 4: It would be helpful if the authors could clarify the purpose of the concept IDs.</p>
            <p> Page 4: It would be helpful if the authors could provide a more detailed explanation of how the concept linking is done.</p>
            <p> Page 5: It would be helpful if the authors could define the HTT concept and explain why they used it for prioritizing genes.</p>
            <p> Page 5: It would be helpful if the authors could explain why they decided to prioritize genes based on the &#x201c;epigene&#x201d; concept. It seems like the authors are interested in genes that affect epigenetics, such as demethylases or histone modifiers. It is not clear to me how this selection relates to the hypothesis that was described in the introduction.</p>
            <p> Page 5: It would be helpful if the authors could clarify exactly what differential expression tests were done with the human brain data and what the categories were for each test.</p>
            <p> Page 5: It would be helpful if the authors could clarify whether the human brain data described here was the only human data used for differential expression analysis and, if it was not, what other data was used.</p>
            <p> Page 5: It would be helpful if the authors could briefly describe how the CpG island data fits into the rest of the analysis.</p>
            <p> Page 5: It would be helpful if the authors could explain why they selected the two cell types and four chromatin states that they used in the Methods section.</p>
            <p> Page 5: I think that it might make sense to incorporate additional chromatin states, such as quiescent, weak repressed Polycomb, and enhancer, as strong repression is not always the cause of a promoter&#x2019;s inactivity.</p>
            <p> Page 5: It would be helpful if the authors could clarify why they used only the mouse data from animals treated with the vehicle. My intuition is that it would make more sense to use the animals that did not receive the HDACi 4b inhibitor since the human subjects did not receive any kind of treatment. It is possible that I misunderstood the purpose of the mouse analysis.</p>
            <p> 
                <italic>Results</italic>:</p>
            <p> Page 6: It is not clear to me why a difference in distribution of expression levels between genes overlapping a chromatin state and genes not overlapping that chromatin state implies that chromatin state has an effect on HD. I think that the authors mean that, if the 
                <italic>difference</italic> in gene expression between individuals with and without HD is higher for genes overlapping a specific chromatin state than overlapping other chromatin states, then there is an 
                <italic>association</italic> between the chromatin state and HD.</p>
            <p> Page 6: It would be helpful to split Figure 1 into two parts, one for genes that have higher expression in people with HD and another for genes that have lower expression for people with HD. My intuition is that most of the differences in p-value distribution are coming from the second category because, since the chromatin state data comes from people without HD, I would expect that genes in an active chromatin state would have higher expression in healthy individuals. Adding onto that, regions of closed chromatin cannot decrease because the genes are not expressed. Regions of open chromatin could either increase or decrease, potentially leading to more variability.</p>
            <p> Page 6: It would be helpful to have a supplemental figure with all chromatin states because it is not clear from Figure 1 if the differences occur for TSS&#x2019;s in all active chromatin states (including inactive genes that are acting as enhancers for other genes) or only from genes that are transcribed.</p>
            <p> Page 6: It would be helpful if the authors could clarify if the overlaps in Figure 1 are done using the entire gene, only the TSS, or the gene&#x2019;s promoter.</p>
            <p> Page 7: For the biological process analyses, I think that using a tool for differential enrichment between the two groups of genes would provide more interpretable results than comparing the top hits from CPA because such a tool looks for terms that are significantly enriched in one gene set relative to another. An example of such a tool is CompGO (Waardenberg 
                <italic>et al</italic>., 
                <italic>BMC Bioinformatics</italic>, 2015).</p>
            <p> Page 8: It would be helpful if the authors clarified what they mean by &#x201c;top novel protein.&#x201d; Does novel mean that the gene had not been associated with HD in a previous paper?</p>
            <p> Page 8: It was not clear why Figure 3 shows that CPA is able to prioritize true associations with huntington as measured by a gene expression experiment and why combining differential expression measurements and literature evidence enables the selection of even more specific HD signatures. It would be great if the authors could clarify this.</p>
            <p> Page 8: It would be helpful if the authors could include the direction of the CPA score shifts for the different groups of differentially expressed genes.</p>
            <p> Page 8: The authors say that &#x201c;the top 100 and top 1000 differ significantly.&#x201d; It would be helpful if they stated the way in which these gene sets differ.</p>
            <p> Page 11: It would be helpful if the authors could clarify what x is in Figure 3.</p>
            <p> 
                <italic>Discussion</italic>:</p>
            <p> Page 11: I am not sure if the paper provides a lack of evidence for genome-wide re-localization of gene activity to repressed chromatin states. The paper combined all of the up-regulated and down-regulated genes instead of separating them. If the paper had shown that the genes that are up-regulated in people with HD are not found in repressive chromatin states in healthy individuals, then I would be more convinced of this lack of re-localization. However, I would not be fully convinced because changes in chromatin state do not always cause changes in gene expression. For example, a previous study showed that most single nucleotide polymorphisms associated with histone modifications are not associated with transcription, suggesting that histone modification differences between individuals do not always correspond to gene expression differences (Grubert 
                <italic>et al</italic>., 
                <italic>Cell</italic>, 2015). Thus, it is possible that there are chromatin state differences between HD individuals and controls in parts of the genome where there are no differentially-expressed genes.</p>
            <p> Page 11: The authors suggest that HD is not associated with the disruption of chromatin states at a large scale. To investigate the association of HD with chromatin state using existing data, the authors would need to determine if genes that are up-regulated in people with HD tend to fall in repressive chromatin states and if those that are down-regulated in people with HD tend to fall in active chromatin states. Because there do not seem to be separate evaluations of up-regulated and down-regulated genes, I do not think that the results in this paper can be used to evaluate the relationship between chromatin state disruptions and HD.</p>
            <p> Page 12: I think that CPA&#x2019;s high ranking of chromatin-related concepts for differentially expressed genes suggests an association between chromatin reorganization and HD. If differentially expressed genes near CpG islands include genes involved in chromatin structure, that suggests that there is a 
                <italic>cis</italic>-regulatory change in the regulation of those genes, which could have a downstream effect on chromatin organization.</p>
            <p> Page 12: Although the paper shows that there are more differentially expressed genes in the active chromatin state in healthy individuals, I am not sure that there is sufficient evidence to conclude that most important changes in HD are occurring in the active chromatin state. For example, if the majority of differentially expressed genes are down-regulated in individuals with HD, then the findings in this paper would match my expectations, even if the most important differentially-expressed genes are up-regulated and are not found in the active chromatin state in healthy individuals.</p>
            <p> 
                <italic>Supplemental Datasets</italic>
            </p>
            <p> Dataset 1: Some of the line breaks seem to be missing.</p>
            <p> Dataset 2: The column breaks seem to be missing.</p>
            <p> Dataset 8: The column breaks seem to be missing for the top 100 differentially expressed genes.</p>
            <p> Supplementary File 1: The first word in the first figure caption seems like it should be &#x201c;Illustration.&#x201d;</p>
            <p> Supplementary File 1: It would be great if the authors could clarify what they mean by &#x201c;x2.&#x201d;</p>
            <p> Supplementary File 1: It would be great if the authors could explain why they are using the HMEC and NHEK cell lines.</p>
            <p> 
                <bold>Is the work clearly and accurately presented and does it cite the current literature?</bold>
            </p>
            <p> The authors seem to clearly describe what they do and cite most of the relevant literature. However, as I mentioned in the fifth major comment, I found parts of the Methods section difficult to follow.</p>
            <p> 
                <bold>Is the study design appropriate and is the work technically sound?</bold>
            </p>
            <p> As I mentioned in the first major comment, I do not think that the authors can test their hypothesis with their study design because they combine the up-regulated and down-regulated genes.</p>
            <p> 
                <bold>Are sufficient details of methods and analysis provided to allow replication by others?</bold>
            </p>
            <p> The authors provide publicly available workflows for almost everything they did. However, as I mentioned in my fifth major comment, the lack of clarity in parts of the Methods section might make reproducing some of the results difficult.</p>
            <p> 
                <bold>If applicable, is the statistical analysis and its interpretation appropriate?</bold>
            </p>
            <p> Most of the statistical analysis seems appropriate, but most of the interpretation does not make sense because the up-regulated and down-regulated genes were combined.</p>
            <p> 
                <bold>Are all the source data underlying the results available to ensure full reproducibility?</bold>
            </p>
            <p> Yes.</p>
            <p> 
                <bold>Are the conclusions drawn adequately supported by the results?</bold>
            </p>
            <p> As I mentioned in my second major comment, I think that many of the conclusions are not supported by the results.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Partly</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Partly</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Neurobiology, epigenetics, computational biology</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to state that we do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report27321">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.10459.r27321</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Queralt-Rosinach</surname>
                        <given-names>N&#x00fa;ria</given-names>
                    </name>
                    <xref ref-type="aff" rid="r27321a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0169-8159</uri>
                </contrib>
                <aff id="r27321a1">
                    <label>1</label>Department of Integrative Structural and Computational Biology, Scripps Research Institute, La Jolla, CA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>24</day>
                <month>11</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Queralt-Rosinach N</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport27321" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.9703.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Summary: This is a computational study devoted to investigating the hypothesis that epigenetic mechanisms dysregulate transcription at a genome-wide scale in Huntington's disease (HD). The authors designed their experiments to evaluate two regulatory mechanisms: 1) change of the chromatin state, and 2) CpG island methylation. By means of statistical analysis of experimental data, they evaluated the association of dysregulated genes in HD with these two processes, and they provided results and a literature-based semantic analysis for functional interpretation. Furthermore, by applying a semantic analysis, they provided a list of prioritized proteins based on their newly predicted association with HTT and epigenetics. Lastly, they evaluated their semantic analysis for gene prioritization. The main conclusions are that their findings do not support the hypothesis that massive transcriptional dysregulation in HD is linked to large-scale relocation of gene activity; the authors therefore speculate that epigenetic effects might be more closely related to dysregulation of individual genes. Finally, the authors claim that their methodology for hypothesis generation can be of great value for the scientific community as it helps in narrowing down the key associations and the evidence underlying them.</p>
            <p> Reviewer notes: 
                <list list-type="bullet">
                    <list-item>
                        <p>This is an interesting work on the possible epigenetic mechanisms that contribute to transcriptional dysregulation in Huntington's disease (HD). It is very well written, in a clear and accurate manner. The authors base their hypothesis on well-cited current literature. I detected only one typo in the whole manuscript: limma vs LIMMA; the authors should be consistent in the format.</p>
                    </list-item>
                    <list-item>
                        <p>Introduction: I would like to highlight that this is the first time such a study has been done, so it is novel and relevant. Good literature citation.</p>
                    </list-item>
                    <list-item>
                        <p>Methods: The experiments are properly designed, and the methods used are well established and based on state-of-the-art approaches. Everything is very FAIR: the data, workflows and web services used. The study is based on public and open resources (data and software). The methodology is well described, but I still missed some information: the R version is not stated, and the specific parameters used in the statistical analyses, which other scientists would need to replicate the experiments, are not given. Why did the authors choose these statistical tests? Regarding CPA: co-occurrence can come from evidence refuting an association. Is this taken into account in the concept profile? If so, is it reflected in the evidence graph by down-weighting some edges? Please state the version of the CPA database of relations used.</p>
                    </list-item>
                    <list-item>
                        <p>Design: They tested two regulatory mechanisms: 1) via changes in the chromatin state, and 2) via methylation in CpG island content. Both use the gene promoter region: 1) to overlap with the chromatin state, and 2) to overlap with the CpG content of deregulated genes. They assessed the association via the KS test; why this test? As the authors said, there are more regulatory regions in the DNA that could be targets of epigenetic regulation; could cumulative dysregulation across all these regions add up to a large-scale effect?</p>
                    </list-item>
                    <list-item>
                        <p>Results: All the resulting data are available for reproducibility checks. I would highlight that they reproduce previously published results. I am wondering if they had issues accessing or pre-processing the data to adapt it for their analysis workflows. An explanation of these issues, and of whether their workflows help the community overcome them in a systematic, reproducible and traceable manner, would be important.</p>
                    </list-item>
                    <list-item>
                        <p>Importantly, their analyses integrate experimental data with knowledge and evidence from the literature. Regarding the noisy, literature-biased text-mined knowledge that may come from their CPA approach, the inclusion of ontologies could be beneficial by leveraging their intrinsic knowledge through automated logical reasoning. Do the authors have any plans in this regard?</p>
                    </list-item>
                    <list-item>
                        <p>Discussion: I agree with the authors that combining experimental data analyses with literature-based functional interpretation of the results (CPA) is relevant and adds value to the results. In the second paragraph, can the authors suggest next steps to try to explain the unexpected results?</p>
                    </list-item>
                    <list-item>
                        <p>Conclusions: The conclusions are supported by their results. In the conclusions section, I missed a conclusion about what the title says the paper is investigating, namely the plausibility of genome-wide epigenetic dysregulation of transcription in HD, although this is stated in the abstract and discussion.</p>
                    </list-item>
                </list> </p>
            <p> I would emphasize the relevance of their approach of performing synergistic research between computational and wet lab researchers. This interdisciplinary approach seems to me the way to go for innovative and efficient big-data-driven research.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>I cannot comment. A qualified statistician is required.</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Data science</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
</article>
