<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.139476.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Epigenetic germline variants predict cancer prognosis and risk and distribute uniquely in topologically associating domains</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 2 approved, 2 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Goudarzi</surname>
                        <given-names>Shervin</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Pagadala</surname>
                        <given-names>Meghana</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Klie</surname>
                        <given-names>Adam</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Talwar</surname>
                        <given-names>James V</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Carter</surname>
                        <given-names>Hannah</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-1729-2463</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                    <xref ref-type="aff" rid="a4">4</xref>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Canyon Crest Academy, San Diego, California, 92130, USA</aff>
                <aff id="a2">
                    <label>2</label>Biomedical Sciences Program, University of California San Diego, La Jolla, California, 92093, USA</aff>
                <aff id="a3">
                    <label>3</label>Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, California, 92093, USA</aff>
                <aff id="a4">
                    <label>4</label>Medicine, University of California San Diego, La Jolla, California, 92093, USA</aff>
                <aff id="a5">
                    <label>5</label>Moores Cancer Center, La Jolla, California, 92093, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:hkcarter@ucsd.edu">hkcarter@ucsd.edu</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>24</day>
                <month>7</month>
                <year>2025</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2023</year>
            </pub-date>
            <volume>12</volume>
            <elocation-id>1083</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>22</day>
                    <month>7</month>
                    <year>2025</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2025 Goudarzi S et al.</copyright-statement>
                <copyright-year>2025</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/12-1083/pdf"/>
            <abstract>
                <sec>
                    <title>Background</title>
                    <p>Methylation quantitative trait loci (meQTLs) are associated with differences in local DNA methylation levels in cancers. Here, we investigated whether the distribution of cancer meQTLs reflects the functional organization of the genome into chromatin topologically associating domains (TADs) and evaluated whether cancer meQTLs near known driver genes have the potential to influence cancer risk or progression.</p>
                </sec>
                <sec>
                    <title>Methods</title>
                    <p>Published cancer meQTLs were analyzed according to their location in transcriptionally active or inactive TADs and TAD boundary regions. Cancer meQTLs near known cancer genes were analyzed for association with cancer risk in the UKBioBank and with prognosis in The Cancer Genome Atlas (TCGA).</p>
                </sec>
                <sec>
                    <title>Results</title>
                    <p>In TAD boundary regions, the density of cancer meQTLs was higher near inactive TADs. Furthermore, we observed an enrichment of cancer meQTLs in active TADs near tumor suppressors, whereas there was a depletion of such meQTLs near oncogenes. Several meQTLs were associated with cancer risk in the UKBioBank, and we were able to reproduce breast cancer risk associations in the DRIVE cohort. Survival analysis in TCGA implicated a number of meQTLs in 13 tumor types. In 10 of these, polygenic cancer meQTL scores were associated with increased hazard in a CoxPH analysis. Risk and survival-associated meQTLs tended to affect cancer genes involved in DNA damage repair and cellular adhesion and reproduced cancer-specific associations reported in prior literature.</p>
                </sec>
                <sec>
                    <title>Conclusions</title>
                    <p>This study provides evidence that genetic variants that influence local DNA methylation are affected by chromatin structure and can impact tumor evolution.</p>
                </sec>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>meQTLs</kwd>
                <kwd>TAD</kwd>
                <kwd>Cancer</kwd>
                <kwd>Polygenic Risk Score</kwd>
                <kwd>XGBoost</kwd>
                <kwd>Machine learning</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>National Institutes of Health, National Cancer Institute</funding-source>
                    <award-id>R01CA269919</award-id>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/100000057">
                    <funding-source>National Institute of General Medical Sciences</funding-source>
                    <award-id>2P41GM103504</award-id>
                </award-group>
                <funding-statement>This work was supported by an NIH National Cancer Institute grant (R01CA269919) to HC and a National Institute of General Medical Sciences infrastructure grant (2P41GM103504-11). </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>This version of the manuscript includes clarifications requested by the reviewers. These resulted in updates to Figures 1, 2 and 5, the addition of 2 supplementary figures, and some added text and references.</p>
            </sec>
        </notes>
    </front>
    <body>
        <sec id="sec1" sec-type="intro">
            <title>Introduction</title>
            <p>Cancer is a heterogeneous disease and common treatments like chemotherapy have only a 55% response rate.
                <sup>
                    <xref ref-type="bibr" rid="ref1">1</xref>
                </sup> Precision medicine and biomarker analysis can tailor treatment options and optimize outcomes. Genetic factors, such as germline and somatic mutations, contribute to heterogeneous disease risk and progression. For example, germline variants in the 
                <italic toggle="yes">BRCA2</italic> gene can greatly increase the risk of developing breast and ovarian cancer.
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup> Epigenetic factors including DNA methylation, histone modification, and acetylation also play a key role in cancer progression. Recently, promising therapeutics have been developed that inhibit DNA methyltransferases (DNMTs), reducing tumor growth in breast cancer and highlighting the importance of DNA methylation and other epigenetic factors in carcinogenesis.
                <sup>
                    <xref ref-type="bibr" rid="ref2">2</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref3">3</xref>
                </sup> However, the interplay between epigenetics and genetics in cancer risk and progression remains mostly elusive.</p>
            <p>Methylation quantitative trait loci, or meQTLs, are single nucleotide polymorphisms (SNPs) that significantly correlate with DNA methylation at CpG sites. These SNPs provide a bridge between genetic variation and corresponding epigenetic effects shown to correlate with cancer risk.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup> Disruptions in DNA methylation are well-known in the context of cancer; DNA is frequently hypermethylated at promoter regions of tumor suppressor genes while hypomethylated at the promoters of oncogenes, and there is an inverse correlation with gene expression.
                <sup>
                    <xref ref-type="bibr" rid="ref5">5</xref>
                </sup> Promoter hyper- and hypo-methylation has been of specific interest due to its role in regulating the expression of cancer genes including suppression of tumor suppressor genes like BRCA
                <sup>
                    <xref ref-type="bibr" rid="ref6">6</xref>
                </sup> and the expression of oncogenes like LINE-1.
                <sup>
                    <xref ref-type="bibr" rid="ref7">7</xref>
                </sup> Subsequently, germline SNPs acting as meQTLs were shown to predict risk in several cancer types, including breast and lung cancer, and to regulate the expression and methylation of genes like FBXO-18.
                <sup>
                    <xref ref-type="bibr" rid="ref4">4</xref>
                </sup>
            </p>
            <p>The organization of the genome into 3D structures may further modify the potential of genetic variants to interact with epigenetic factors in a disease-specific manner.
                <sup>
                    <xref ref-type="bibr" rid="ref8">8</xref>
                </sup> Topologically associating domains (TADs) are isolated regions of highly interacting, folded chromatin separated by insulator proteins. TADs are important for maintaining controlled patterns of local gene regulation and provide a framework within which transcriptionally similar genes and SNPs can interact with one another.
                <sup>
                    <xref ref-type="bibr" rid="ref9">9</xref>
                </sup> Because TADs have been found to be highly stable across tissue types, they provide valuable context for understanding the genome&#x2019;s functional landscape, allowing genetic variation to be studied in the context of 3D chromatin structure.
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup> In cancer, the burden of somatic mutations has been shown to correlate with TADs.
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup> In addition, genes within TADs demonstrate correlated gene expression and histone modification,
                <sup>
                    <xref ref-type="bibr" rid="ref12">12</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref13">13</xref>
                </sup> allowing us to group similarly acting genes and SNPs and thereby narrow the search for potentially cancer-related SNPs.</p>
            <p>In this study, we integrate genetic correlates of DNA methylation across 23 cancer types (i.e. cancer meQTLs) with TADs to better understand how 3D chromatin structure might determine the potential of meQTLs to influence cancer risk and survival. We focus on meQTLs near TADs containing key cancer-related genes. Analyzing the location and distribution of such variants, we find that cancer meQTLs are not distributed uniformly across the genome and that the occurrence of TAD boundaries correlates with significant cancer meQTL presence. In addition, meQTLs closely related to cancer progression show a specific, nonrandom distribution across TADs. We then assess whether meQTLs near cancer genes can predict cancer survival and risk, and find significant predictive power for these meQTLs across multiple cancer types. Our study suggests that the potential of meQTLs to contribute to cancer risk and progression depends in part on local genome architecture and chromatin state.</p>
        </sec>
        <sec id="sec2" sec-type="results">
            <title>Results</title>
            <sec id="sec3">
                <title>Active TADs are associated with less DNA methylation at cancer meQTLs</title>
                <p>We identified 1100 TADs shared across 5 cell lines (GM12878, HMEC, HUVEC, IMR90, and NHEK) and categorized them into &#x201c;Mixed&#x201d;, &#x201c;Inactive-1&#x201d;, &#x201c;Inactive-2&#x201d;, &#x201c;Active-1&#x201d;, and &#x201c;Active-2&#x201d; groups using chromatin state information (
                    <xref ref-type="fig" rid="f1">
Figure 1A</xref>). Combining the active and inactive groups resulted in 222 active, 626 inactive and 252 mixed TADs. DNA methylation is linked with TAD activity via nucleosome positioning and chromatin condensation
                    <sup>
                        <xref ref-type="bibr" rid="ref14">14</xref>
                    </sup> as well as with the regulation of gene expression, where promoter CpG methylation is associated with gene silencing.
                    <sup>
                        <xref ref-type="bibr" rid="ref15">15</xref>
                    </sup> We compared our categorization of TAD activity with genome-wide DNA methylation in promoter regions defined by the ENCODE SCREEN pipeline. Promoters in active TADs showed lower overall methylation, whereas those in inactive TADs showed higher methylation (Kruskal-Wallis, p-value&lt;0.001) (
                    <xref ref-type="fig" rid="f1">
Figure 1B</xref>), supporting the view that promoter methylation-associated silencing aligns with our categorization of TADs into transcriptionally distinct &#x201c;active&#x201d; and &#x201c;inactive&#x201d; groups.</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>
Figure 1. </label>
                    <caption>
                        <title>Evaluating DNA methylation and meQTL burden in topologically associating domains (TADs).</title>
                        <p>(A) Chromatin state-based K-means clustering (5 clusters) of the TADs shared between 5 human cell lines (GM12878, HMEC, HUVEC, IMR90, and NHEK; n=1100). Shared TADs are on the y-axis and the 15 chromatin states used for grouping are on the x-axis. Purple indicates TADs classified as &#x201c;Mixed&#x201d;, gray as &#x201c;Inactive-1&#x201d;, light blue as &#x201c;Active-1&#x201d;, orange as &#x201c;Active-2&#x201d;, and red as &#x201c;Inactive-2&#x201d;. Combining the active and inactive categories yields 222 Active, 626 Inactive, and 252 Mixed TADs. (B) On average, inactive TADs have higher DNA methylation levels than active TADs (Kruskal-Wallis test, p-value&lt;0.001), consistent with previous literature on promoter methylation and transcriptional activity. (C) Number of meQTLs in inactive versus active TADs. meQTL counts per TAD were normalized by TAD length in base pairs. Inactive TADs show on average a larger normalized burden of meQTLs than active TADs (Student&#x2019;s t-test, p&lt;0.05).</p>
                    </caption>
                    <graphic id="gr1" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/185383/8128d842-6ae8-46b1-95f2-3f857541e4f7_figure1.gif"/>
                </fig>
            </sec>
            <sec id="sec4">
                <title>Cancer meQTLs are more abundant in inactive domains</title>
                <p>Next, we measured the overall burden of independent cancer meQTLs (i.e. meQTLs deemed to represent distinct haplotypes based on linkage disequilibrium; LD) across TAD categories, normalized by TAD length in base pairs. To obtain independent meQTLs, we clumped related meQTLs from Gong 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref46">16</xref>
                    </sup> based on LD using PLINK. Of the 1.2 million SNPs, 60,602 remained after LD clumping (
                    <xref ref-type="table" rid="T1">
Table 1</xref>). We observed a slightly higher length-normalized number of cancer meQTLs in inactive domains than in active domains (Student&#x2019;s t-test, p-value&lt;0.05; 
                    <xref ref-type="fig" rid="f1">
Figure 1C</xref>).</p>
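                <p>The length normalization described above can be illustrated with a minimal sketch. The TAD counts and lengths below are hypothetical placeholders rather than values from this study, and the helper names (<monospace>burden_per_mb</monospace>, <monospace>mean_burden_by_category</monospace>) are assumptions introduced only for illustration; the sketch shows the arithmetic, not the exact analysis code.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
from statistics import mean


def burden_per_mb(meqtl_count, tad_length_bp):
    """Return the number of meQTLs per megabase for a single TAD."""
    if tad_length_bp <= 0:
        raise ValueError("TAD length must be positive")
    return meqtl_count / (tad_length_bp / 1_000_000)


def mean_burden_by_category(tads):
    """Average the length-normalized burden over the TADs of each category."""
    by_category = {}
    for category, count, length_bp in tads:
        by_category.setdefault(category, []).append(burden_per_mb(count, length_bp))
    return {category: mean(values) for category, values in by_category.items()}


# Hypothetical TADs: (category, clumped meQTL count, TAD length in bp).
tads = [
    ("active", 4, 800_000),
    ("active", 6, 1_500_000),
    ("inactive", 9, 1_200_000),
    ("inactive", 5, 500_000),
]

result = mean_burden_by_category(tads)
```
]]></code>
                <p>Normalizing each TAD before averaging keeps long TADs from dominating the comparison simply because they span more base pairs.</p>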
                <table-wrap id="T1" orientation="portrait" position="float">
                    <label>
Table 1. </label>
                    <caption>
                        <title>General Information on meQTL number across TADs and multiple analyses.</title>
                        <p>Each row shows the number of meQTLs remaining in each region type after each filtering step: all meQTLs without filtering, meQTLs retained after LD clumping with PLINK (p&lt;1&#x00d7;10
                            <sup>-5</sup>), and clumped meQTLs associated with a CpG probe in a cancer driver gene promoter region. &#x201c;Other&#x201d; indicates meQTLs that lie in inter-TAD regions but outside the boundary regions as defined.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">
meQTL filtration methods</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Active TAD meQTLs</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Inactive TAD meQTLs</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Boundary meQTLs</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Other</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Total</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">All meQTLs</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">30,210</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">70,101</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">56,304</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">1,079,527</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">1,236,142</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Clumped meQTLs</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">1,159</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">4,490</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">2,763</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">52,190</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">60,602</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Cancer gene-related clumped meQTLs</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">21</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">8</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">20</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">107</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">156</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>We also evaluated cancer meQTLs at TAD boundaries, considering four categories of boundary based on the category of the flanking TADs: &#x201c;Active-Boundary-Active&#x201d;, &#x201c;Inactive-Boundary-Inactive&#x201d;, &#x201c;Active-Boundary-Inactive&#x201d;, and &#x201c;Inactive-Boundary-Active&#x201d;. To allow aggregation across variable-length regions, we divided each boundary region into 40 equal genomic bins and counted the meQTLs in each bin. We then compared the observed density of meQTLs in each boundary category to that obtained by randomizing flanking TAD categories 100 times. The active-active (Student&#x2019;s t-test, p&lt;0.01), active-inactive (p&lt;0.01), and inactive-active (p&lt;0.01) boundaries all showed a difference in distribution from random, while inactive-inactive boundaries (p=0.089) did not (
                    <xref ref-type="fig" rid="f2">
Figure 2A-D</xref>). Compared with the randomly shuffled distribution, the observed distributions suggested an increase in the density of clumped meQTLs when transitioning from active to inactive regions and, conversely, a decrease when transitioning from inactive to active regions (Kruskal-Wallis ANOVA, p-value&lt;0.05), but no shift in density for the Active-Boundary-Active and Inactive-Boundary-Inactive categories (
                    <xref ref-type="fig" rid="f2">
Figure 2B-D</xref>).</p>
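                <p>As a rough illustration of the binning and label-shuffling steps described above, the following sketch divides a region into 40 equal-width bins and generates shuffled copies of the flanking-TAD category labels. It is a simplified sketch under stated assumptions: the function names, the half-open interval convention, and the fixed random seed are choices made here for illustration and are not taken from the original analysis.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import random

N_BINS = 40  # number of equal genomic bins per boundary region, as in the text


def bin_counts(positions, start, end, n_bins=N_BINS):
    """Count the positions falling into each of n_bins equal-width bins of [start, end)."""
    if end <= start:
        raise ValueError("end must be greater than start")
    width = (end - start) / n_bins
    counts = [0] * n_bins
    for pos in positions:
        if start <= pos < end:
            # min() guards against floating-point rounding at the upper edge.
            idx = min(int((pos - start) / width), n_bins - 1)
            counts[idx] += 1
    return counts


def shuffled_labels(labels, n_shuffles=100, seed=0):
    """Return n_shuffles random permutations of the TAD category labels."""
    rng = random.Random(seed)
    permutations = []
    for _ in range(n_shuffles):
        perm = list(labels)
        rng.shuffle(perm)
        permutations.append(perm)
    return permutations


# Hypothetical meQTL positions within a region spanning 0 to 400 bp.
example_counts = bin_counts([0, 5, 399, 400], start=0, end=400)
example_shuffles = shuffled_labels(["active", "inactive", "active"])
```
]]></code>
                <p>Binning by relative position lets regions of different lengths be averaged bin by bin, and recomputing the binned densities under each shuffled labeling gives a null distribution against which the observed profile can be compared.</p>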
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>
Figure 2. </label>
                    <caption>
                        <title>Normalized burden of meQTLs in adjacent TADs.</title>
                        <p>The binned average normalized meQTL burden is shown across boundaries between consecutive TADs, grouped by transition category: active to active, active to inactive, inactive to active, and inactive to inactive. The start/end of active and inactive TADs are shown in red and blue, respectively. Distributions are smoothed with a rolling average for visualization purposes. The graphs show that meQTL burden is distributed non-uniformly across consecutive TADs rather than evenly spread. The dotted brown line represents the distribution for randomly shuffled TADs, which serves as a control. (A) Active-active (p=3.51&#x00d7;10
                            <sup>-10</sup>), (B) active-inactive (p=3.45&#x00d7;10
                            <sup>-46</sup>), and (C) inactive-active (p=1.65&#x00d7;10
                            <sup>-25</sup>) boundaries all showed a clear difference in distribution from random, while (D) inactive-inactive (p=0.089) did not.</p>
                    </caption>
                    <graphic id="gr2" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/185383/8128d842-6ae8-46b1-95f2-3f857541e4f7_figure2.gif"/>
                </fig>
            </sec>
            <sec id="sec5">
                <title>Oncogene and tumor suppressor gene-related cancer meQTLs cluster differentially in TADs</title>
                <p>Clumped cancer meQTLs were further narrowed to those whose corresponding affected CpG probes were within the promoter regions of cancer driver genes including oncogenes and tumor suppressor genes (TSGs) from the COSMIC database.
                    <sup>
                        <xref ref-type="bibr" rid="ref16">17</xref>
                    </sup> In total, 103 oncogenes and 223 TSGs were considered in this analysis, of which only 67 (49 TSGs and 18 oncogenes) contained meQTL-associated CpG probes in their promoter regions. Of the 60,602 clumped meQTLs, 156 significantly affected CpG probes located in promoter regions of cancer driver genes (driver meQTLs; 
                    <xref ref-type="table" rid="T1">
Table 1</xref>). Overall, driver meQTLs showed a strong bias toward active regions, followed by boundary regions and then inactive regions (
                    <xref ref-type="fig" rid="f3">
Figure 3A</xref>). To understand whether the observed distribution of driver meQTLs was expected, we selected equivalent numbers of meQTLs at random and evaluated their distribution across region types. We did this separately for meQTLs associated with oncogenes versus TSGs, as meQTLs might have different implications in the context of selection for gain versus loss of function. In the oncogene case, meQTLs were depleted relative to random in active TADs, and enriched relative to random in inactive TADs, with no difference in boundary regions. Conversely, for TSGs, there was a significant enrichment of cancer-related meQTLs in active TADs and boundary regions, but a depletion in inactive TADs (
                    <xref ref-type="fig" rid="f3">
Figure 3B-C</xref>). These opposing trends could suggest that genes with the potential to be oncogenes or tumor suppressors (i.e. growth promoting versus limiting) are under different constraints with respect to the propensity for methylation to accumulate in their promoter regions.</p>
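                <p>The random-expectation comparison described above can be sketched as repeated sampling without replacement from a background set of meQTLs, each annotated with its region type. This is a hypothetical illustration: the background composition, the number drawn, the function name, and the seed are assumptions chosen for the example, and the sketch reports raw mean counts rather than the per-Mb normalization used in the figures.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
import random
from statistics import mean


def expected_region_counts(background_regions, n_draw, n_iter=1000, seed=0):
    """Estimate the expected number of sampled meQTLs per region type.

    background_regions holds one region label per background meQTL;
    n_draw labels are sampled without replacement in each iteration.
    """
    rng = random.Random(seed)
    region_types = sorted(set(background_regions))
    per_iteration = {region: [] for region in region_types}
    for _ in range(n_iter):
        sample = rng.sample(background_regions, n_draw)
        for region in region_types:
            per_iteration[region].append(sample.count(region))
    return {region: mean(counts) for region, counts in per_iteration.items()}


# Hypothetical background: 10 meQTLs in active TADs and 30 in inactive TADs.
background = ["active"] * 10 + ["inactive"] * 30
expected = expected_region_counts(background, n_draw=8)
```
]]></code>
                <p>An observed count that falls well above or below the sampled expectation for a region type would then indicate enrichment or depletion relative to random.</p>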
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>
Figure 3. </label>
                    <caption>
                        <title>Expected versus observed occurrence of Driver meQTLs for oncogenes and TSGs by region type.</title>
                        <p>(A) The number of driver meQTLs per Mb is plotted according to the category of TAD in which they are located. Counts were normalized by the total region size in each category. (B-C) A randomization analysis of the burden of non-cancer meQTLs, normalized by the number of base pairs in each region, was conducted to obtain the expected number of cancer meQTLs per Mb. To model random expectation, (B) 54 non-cancer meQTLs (i.e. the number of oncogene-proximal meQTLs) and (C) 102 non-cancer meQTLs (i.e. the number of TSG-proximal meQTLs) were each sampled 1000 times. Bar graphs are drawn with standard errors. The observed cancer meQTL burden is shown as a red dot.</p>
                    </caption>
                    <graphic id="gr3" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/185383/8128d842-6ae8-46b1-95f2-3f857541e4f7_figure3.gif"/>
                </fig>
            </sec>
            <sec id="sec6">
                <title>Assessment of driver meQTL association with cancer risk and overall survival across tumor types</title>
                <p>We next evaluated the potential for driver meQTLs to have clinical relevance. A principal component analysis (PCA) was first conducted on the 156 driver meQTLs across individuals in the TCGA (Extended Data Figure 1).
                    <sup>
                        <xref ref-type="bibr" rid="ref61">61</xref>
                    </sup> The principal components (PCs) that explained more than 1% of the variance were assessed for association with clinical covariates by linear regression. We noted some association of PCs with tumor type, age at diagnosis and tumor stage at diagnosis, suggesting that cancer meQTLs could have tumor-type specific implications for risk and prognosis. Interestingly, further examining the 10 meQTLs with the strongest loadings in PCs correlated with tumor type, we found that the meQTLs disproportionately affected oncogenes, suggesting that tumor types differ more in oncogene effects than in tumor suppressor effects of DNA methylation.</p>
                <p>We first evaluated the driver meQTLs for cancer risk associations using the UKBioBank. Of the 155 driver meQTLs available in UKBioBank (1 SNP was not in the UKBioBank registry), 86 showed a nominal association with one or more cancer ICD10 codes in the initial PheWAS analysis (p-value&lt;0.05), with 5 SNPs passing a Benjamini-Hochberg FDR threshold of 0.05 (
                    <xref ref-type="table" rid="T2">
Table 2</xref>). In total, meQTLs were associated with risk of 15 different cancer types as described by ICD10 codes (
                    <xref ref-type="table" rid="T3">
Table 3</xref>). We focused on C50-C50 (malignant neoplasm of the breast) as this tumor type had a large sample size in UKBioBank (n=11,188) and other large cohorts exist to support validation studies.</p>
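                <p>For readers unfamiliar with the Benjamini-Hochberg procedure referred to above, the adjustment can be sketched in a few lines. The function name <monospace>benjamini_hochberg</monospace> and the example p-values are hypothetical and are not taken from the PheWAS results.</p>
                <preformat><![CDATA[
```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical nominal p-values for four SNP-phenotype tests.
nominal = [0.01, 0.04, 0.03, 0.005]
q_values = benjamini_hochberg(nominal)
passing = [i for i, q in enumerate(q_values) if q < 0.05]
print(q_values, passing)
```
]]></preformat>
                <p>A test passes an FDR threshold of 0.05 when its adjusted value falls below 0.05.</p>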
                <table-wrap id="T2" orientation="portrait" position="float">
                    <label>
Table 2. </label>
                    <caption>
                        <title>List of meQTLs significantly affecting risk and survival in a pan-cancer model (Benjamini-Hochberg FDR&lt;0.05).</title>
                        <p>The beta value is the correlation coefficient of the meQTLs with DNA methylation at the promoter region of the probe gene. The TAD type in which each meQTL resides is also shown.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">rsid</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">SNP</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">p-value
</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">TAD type</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Probe gene</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Risk/Survival</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs6500442</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">16:89828862:T:C</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Active</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">FANCA</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs36083956</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">16:74679883:C:T</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Boundary-Active
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">RFWD3</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs1163248</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">10:104896563:A:G</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Neither</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">NT5C2</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs1006548</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">16:89844043:T:C</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Active</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">FANCA</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs8047581</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">16:89884502:C:T</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Active</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">FANCA</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs17581498</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">17:73794047:G:T</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Inter-TAD
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">H3F3B</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs62051918</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">16:74613781:T:C</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Active</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">RFWD3</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs3935784</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">16:74604841:G:A</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Active</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">RFWD3</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs8046036</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">16:74552127:C:T</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.00000000252</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Active</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">RFWD3</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs36030784</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">2:178119204:A:C</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Inter-TAD
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">NFE2L2</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs1407920</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">9:10389328:C:G</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.00000628</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Inter-TAD
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">PTPRD</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs1725213</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">7:5584599:A:G</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.00000952</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Active</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">RAC1</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs11859725</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">16:74384296:C:T</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Inter-TAD
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">RFWD3</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs4265826</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">16:74723707:A:G</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.0000000519</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Neither</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">RFWD3</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs12441344</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">15:67447895:A:G</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.000000629</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Inter-TAD
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">SMAD3</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs200282</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">16:74222799:C:G</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Inter-TAD
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">RFWD3</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs6679323</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">1:15914135:A:G</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Active</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">CASP9</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs3743861</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">16:89818340:G:C</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.0000000349</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Active</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">FANCA</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Survival</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs10999617</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">10:72723176:G:A</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.00000886</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Inactive</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">PRF1</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Risk</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs12597188</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">16:68814826:G:A</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Inter-TAD
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">CDH1</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Risk</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs7554885</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">1:18247811:G:T</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.000000215</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Inter-TAD
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">SDHB</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Risk</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs10845664</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">12:13043119:C:T</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.000000288</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Inter-TAD
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">CDKN1B</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Risk</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">rs741482</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">3:185903412:C:G</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">0.0000052</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Inter-TAD
</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">MAP3K13</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Risk</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <table-wrap id="T3" orientation="portrait" position="float">
                    <label>
Table 3. </label>
                    <caption>
                        <title>ICD10 codes and their definitions.</title>
                        <p>The ICD10 codes used by UKBioBank for the risk analysis are shown alongside their definitions.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">ICD 10 Codes</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">
Definitions</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C00-C14</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neoplasms of lip, oral cavity and pharynx</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C15-C26</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neoplasms of digestive organs</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C30-C39</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neoplasms of respiratory and intrathoracic organs</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C40-C41</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neoplasms of bone and articular cartilage</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C43-C44</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Melanoma and other malignant neoplasms of skin</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C45-C49</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neoplasms of mesothelial and soft tissue</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C50-C50</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neoplasms of breast</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C51-C58</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neoplasms of female genital organs</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C60-C63</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neoplasms of male genital organs</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C64-C68</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neoplasms of urinary tract</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C69-C72</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neoplasms of eye, brain and other parts of central nervous system</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C73-C75</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neoplasms of thyroid and other endocrine glands</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C76-C80</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neoplasms of ill-defined, other secondary and unspecified sites</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C7A-C7A</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Malignant neuroendocrine tumors</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="middle">C7B-C7B</td>
                                <td align="left" colspan="1" rowspan="1" valign="middle">Secondary neuroendocrine tumors</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>To further assess the relevance of driver meQTLs to cancer risk, we used them to predict breast cancer status alongside clinical covariates using the approach described by Elgart 
                    <italic toggle="yes">et al</italic>.
                    <sup>
                        <xref ref-type="bibr" rid="ref17">18</xref>
                    </sup> We first performed feature selection by LASSO on nominally significant driver meQTLs and available clinical factors (age, ancestry as represented by the top 10 genotype-derived PCs); LASSO regularization removed ancestry and some meQTLs. Selected features were then used to train an XGBoost classifier on 189,022 examples derived from UKBioBank breast cancer cases and non-cancer controls (
                    <xref ref-type="sec" rid="sec8">Methods</xref>). The score produced by the trained XGBoost model was used as the PRS. We applied the trained model to predict breast cancer status for individuals in the DRIVE dataset, comprising 26,374 breast cancer cases and 32,428 controls (ROC AUC: 0.5534, 95% CI [0.5505, 0.5563]). As expected, PRS values were significantly higher in cases than in controls (Mann-Whitney U, p-value&lt;0.001) (
                    <xref ref-type="fig" rid="f4">
Figure 4A</xref>). In both the UKBioBank and DRIVE datasets, the incidence of breast cancer was significantly higher among individuals in the top 20% of PRS values than among those in the bottom 20% (Fisher&#x2019;s exact test, UKBioBank: p=4.25&#x00d7;10
                    <sup>-7</sup>, DRIVE: p=1.47&#x00d7;10
                    <sup>-13</sup>), suggesting that a higher burden of meQTLs impacts breast cancer risk (
                    <xref ref-type="fig" rid="f4">
Figure 4B-C</xref>).</p>
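                <p>The ROC AUC and the Mann-Whitney U statistic reported above are closely related: the AUC equals U divided by the number of case-control pairs, i.e. the probability that a randomly chosen case scores higher than a randomly chosen control. A minimal sketch of this relationship follows; the function name <monospace>roc_auc</monospace> and the scores are hypothetical, and the pairwise loop is written for clarity rather than for cohorts of this size.</p>
                <preformat><![CDATA[
```python
def roc_auc(case_scores, control_scores):
    """ROC AUC computed as the normalized Mann-Whitney U statistic."""
    u_statistic = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                u_statistic += 1.0  # case ranked above control
            elif case == control:
                u_statistic += 0.5  # ties count as half
    return u_statistic / (len(case_scores) * len(control_scores))


# Hypothetical PRS values for three cases and three controls.
cases = [0.9, 0.6, 0.4]
controls = [0.5, 0.3, 0.4]
auc = roc_auc(cases, controls)
print(round(auc, 4))
```
]]></preformat>
                <p>An AUC of 0.5 corresponds to no discrimination between cases and controls, so values only slightly above 0.5 indicate a modest shift in the score distribution.</p>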
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>
Figure 4. </label>
                    <caption>
                        <title>XGBoost validation of breast cancer risk in DRIVE dataset.</title>
                        <p>(A) An XGBoost classifier trained to predict incidence of breast cancer in the UKBioBank was applied to predict cancer risk in the DRIVE cohort. PRS values provided by the model were higher for individuals diagnosed with breast cancer (Mann-Whitney U p=2.4&#x00d7;10
                            <sup>-19</sup>). (B-C) Plots showing the odds ratio of a breast cancer diagnosis across 10% quantiles of the XGBoost-predicted PRS in the UKBioBank and DRIVE cohorts, respectively. The odds ratio increased from ~0.8 to ~1.1 between the 0th and 90th PRS percentiles, supporting that cancer meQTLs impact breast cancer risk. C50-C50: ICD10 code for malignant neoplasms of the breast.</p>
                    </caption>
                    <graphic id="gr4" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/185383/8128d842-6ae8-46b1-95f2-3f857541e4f7_figure4.gif"/>
                </fig>
                <p>We extracted feature importances from the UKBioBank-trained PRS to better understand the driver meQTLs underlying breast cancer risk (
                    <xref ref-type="fig" rid="f5">
Figure 5A</xref>). Overall, cancer meQTLs near 29 cancer genes were included in the model. The most predictive driver meQTL was associated with MSH2, a gene linked to Lynch syndrome and increased risk of breast cancer.
                    <sup>
                        <xref ref-type="bibr" rid="ref18">19</xref>
                    </sup> Polymorphic variation affecting the expression of EZH2, the second most informative feature, has also been linked to breast cancer risk.
                    <sup>
                        <xref ref-type="bibr" rid="ref19">20</xref>
                    </sup> ASXL2 may be required for estrogen receptor alpha (ER&#x03b1;) activation in ER&#x03b1;-positive breast cancers.
                    <sup>
                        <xref ref-type="bibr" rid="ref20">21</xref>
                    </sup> Notably, EZH2 overexpression has been linked more strongly to triple-negative breast cancer,
                    <sup>
                        <xref ref-type="bibr" rid="ref21">22</xref>
                    </sup> suggesting that the model includes features predictive of multiple subtypes. More direct mechanistic insight might be gained by studying expression, genotype and methylation in healthy and pre-cancerous breast tissues and cell types. The average expression of MSH2, EZH2 and ASXL2 in TCGA patients stratified by meQTL PRS suggested a potential decrease in expression of ASXL2 and EZH2 in the highest PRS quantile relative to the lowest, while MSH2 showed little difference (
                    <xref ref-type="fig" rid="f5">Figure 5B</xref>). However, this difference needs to be studied further with more specific tumor sub-type stratification and cell type-specific expression. Indeed, classic polygenic risk scores for breast cancer have shown bias for predicting certain subtypes.
                    <sup>
                        <xref ref-type="bibr" rid="ref58">23</xref>
                    </sup> Lakeman 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref59">24</xref>
                    </sup> demonstrated that women in the highest 1% of risk showed a 4.37-fold increased risk for ER-positive disease but only a 2.78-fold increased risk for ER-negative disease compared to the middle quintile.</p>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>
Figure 5. </label>
                    <caption>
                        <title>Feature importances for breast cancer risk classifier.</title>
                        <p>(A) Features are ranked according to their contribution to classifier predictive performance. Total importances sum to 1. (B) Average expression of ASXL2, EZH2 and MSH2 in TCGA breast cancer samples, stratified by PRS quantile.</p>
                    </caption>
                    <graphic id="gr5" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/185383/8128d842-6ae8-46b1-95f2-3f857541e4f7_figure5.gif"/>
                </fig>
                <p>Finally, we evaluated the implications of driver meQTLs for prognosis. We first removed one meQTL, 2:209220238:C:G, that had a minor allele frequency &lt;1% across TCGA samples, then conducted a Kaplan-Meier analysis of the remaining meQTLs separately for each tumor type with at least 100 samples. Of the 155 SNPs, 21 passed a Benjamini-Hochberg FDR threshold of 0.05 (
                    <xref ref-type="table" rid="T2">
Table 2</xref>). To assess overall contribution of driver meQTLs to survival, we built polygenic survival scores (PSS) using XGBoost and incorporated them into Cox proportional hazards (PH) models alongside relevant covariates. Here we only evaluated tumor types that had at least 5 SNPs implicated as nominally significant by Kaplan-Meier analysis (n=23 tumor types). Nominally significant driver meQTLs for each tumor type were subjected to selection by LASSO and used to train XGBoost models to predict binary survival outcome (binarized based on median time to an event) separately for each tumor type. Out of the 23 tumor types, 13 had a higher XGBoost classification AUC value when both SNPs and clinical covariates were combined as compared with using only clinical covariates. These included BLCA, BRCA, PAAD, PRAD, UCEC, OV, STAD, SKCM, PCPG, LUSC, KIRC, HNSC and ESCA. This suggests that for these cases, meQTLs contributed survival-relevant information beyond the covariates (
                    <italic toggle="yes">i.e.</italic> age, sex, tumor stage in some cases). For these tumor types, we trained XGBoost models using only meQTLs to obtain tumor-type specific polygenic survival scores (PSS) that were then included alongside covariates (tumor stage, age at diagnosis and sex) in Cox PH models to predict overall survival time in months (
                    <xref ref-type="sec" rid="sec8">Methods</xref>).</p>
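                <p>For readers unfamiliar with the multiple-testing step used above, the following is a minimal Python sketch of the Benjamini-Hochberg adjustment applied to a list of per-SNP p-values. The function name <monospace>benjamini_hochberg</monospace> and the example values are illustrative assumptions and are not taken from the analysis code.</p>
                <code language="python"><![CDATA[
```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def passing_fdr(pvalues, threshold=0.05):
    """Indices of tests whose adjusted p-value is below the threshold."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q < threshold]
```
]]></code>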
                <p>PSS values made a significant contribution to predicting overall survival time for all cancer types except BRCA and SKCM (
                    <xref ref-type="fig" rid="f6">
Figure 6</xref>). PSS had the highest hazard ratios compared to other covariates for most cancer types, including: ESCA, BLCA, KIRC, LUSC, OV, PAAD, PCPG, PRAD, STAD, UCEC. PSS was also predictive of disease free interval in KIRC, PCPG, LUSC, HNSC and UCEC (Extended Data Figure 2).
                    <sup>
                        <xref ref-type="bibr" rid="ref62">62</xref>
                    </sup> Most covariates behaved as expected in the analysis, with tumor stage having one of the highest hazard ratios. However, it is difficult to assess the generalizability of the estimated effect sizes in the absence of independent validation cohorts with both genotype and survival measured in the same cancer types. Nonetheless, to further investigate the prognostic implications of driver meQTLs, we analyzed their feature importances in their respective XGBoost models (
                    <xref ref-type="fig" rid="f7">
Figure 7</xref>). The number of meQTLs contributing to tumor type specific PSS ranged from 2 to 12, often with 1 or 2 meQTLs dominating the model.</p>
                <fig fig-type="figure" id="f6" orientation="portrait" position="float">
                    <label>
Figure 6. </label>
                    <caption>
                        <title>CoxPH Hazard Ratios and 95% confidence interval of PSS and covariates in TCGA overall survival.</title>
                        <p>The hazard ratios and 95% confidence intervals associated with various covariates are shown across 13 cancer types: BLCA, BRCA, PAAD, PRAD, UCEC, OV, STAD, SKCM, PCPG, LUSC, KIRC, HNSC, ESCA. Due to limitations in availability of data some tumor types lacked covariates like tumor stage. Sex was excluded for tumors that only occur in males or females. ER: Estrogen receptor, PR: Progesterone Receptor.</p>
                    </caption>
                    <graphic id="gr6" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/185383/8128d842-6ae8-46b1-95f2-3f857541e4f7_figure6.gif"/>
                </fig>
                <fig fig-type="figure" id="f7" orientation="portrait" position="float">
                    <label>
Figure 7. </label>
                    <caption>
                        <title>Feature importance of SNPs in XGBoost polygenic survival scores.</title>
                        <p>A heatmap of the feature importances of SNPs for the cancer type specific XGBoost survival classifiers is shown. For each model across the 13 tumor types, the feature importances sum to 1, with red indicating greater importance of a SNP and blue indicating lesser importance.</p>
                    </caption>
                    <graphic id="gr7" orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/185383/8128d842-6ae8-46b1-95f2-3f857541e4f7_figure7.gif"/>
                </fig>
                <p>Focusing on the most informative tumor type-associated meQTLs, we investigated the relevance of the associated oncogenes to cancer progression. In many cases, the identified genes were supported by previous studies. For example, PTPRD loss in melanoma was shown to cause disruption of desmosomes, resulting in increased invasive potential.
                    <sup>
                        <xref ref-type="bibr" rid="ref22">25</xref>
                    </sup> Polymorphisms in the DNA repair helicase ERCC2 have also been found to modify melanoma prognosis
                    <sup>
                        <xref ref-type="bibr" rid="ref23">26</xref>
                    </sup> and have been linked to prostate cancer progression as well.
                    <sup>
                        <xref ref-type="bibr" rid="ref24">27</xref>
                    </sup> In pancreatic cancer, RFWD3 expression quantitative trait loci (eQTLs) are associated with survival.
                    <sup>
                        <xref ref-type="bibr" rid="ref25">28</xref>
                    </sup> RFWD3 is an E3 ubiquitin ligase important for the DNA damage response and has been shown to stabilize TP53 following DNA damage.
                    <sup>
                        <xref ref-type="bibr" rid="ref26">29</xref>
                    </sup> We note that RFWD3 meQTLs were among the informative features for many other tumor types as well (
                    <xref ref-type="fig" rid="f7">
Figure 7</xref>). RAC1 has previously been shown to determine the metastatic potential of renal cell carcinoma (KIRC).
                    <sup>
                        <xref ref-type="bibr" rid="ref27">30</xref>
                    </sup> Reduced expression of CDKN1B is a known risk factor for PCPG and is common in this disease but usually cannot be explained by somatic alterations, though cases of allelic imbalance have been noted.
                    <sup>
                        <xref ref-type="bibr" rid="ref28">31</xref>
                    </sup> CASP9 promoter polymorphisms confer increased risk of breast cancer
                    <sup>
                        <xref ref-type="bibr" rid="ref29">32</xref>
                    </sup> and higher expression of CASP9 was associated with better survival.
                    <sup>
                        <xref ref-type="bibr" rid="ref30">33</xref>
                    </sup> Downregulation of ERCC5 is associated with longer progression free survival in ovarian cancer treated with platinum therapy
                    <sup>
                        <xref ref-type="bibr" rid="ref31">34</xref>
                    </sup> as is the case for OV in TCGA. In head and neck cancer, the most informative driver meQTL was associated with ETNK1, a cancer gene more commonly associated with myeloid neoplasms
                    <sup>
                        <xref ref-type="bibr" rid="ref32">35</xref>
                    </sup> though there is increasing evidence that it may contribute to dysregulation of phospholipid metabolism in multiple tumor types.
                    <sup>
                        <xref ref-type="bibr" rid="ref33">36</xref>
                    </sup>
                </p>
            </sec>
        </sec>
        <sec id="sec7" sec-type="discussion">
            <title>Discussion</title>
            <p>There is an increasing appreciation that both genome structure
                <sup>
                    <xref ref-type="bibr" rid="ref34">37</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref38">41</xref>
                </sup> and common genetic variants
                <sup>
                    <xref ref-type="bibr" rid="ref46">16</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref39">42</xref>
                </sup>
                <sup>&#x2013;</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref45">48</xref>
                </sup> modify the potential for carcinogenesis. However, the interplay between these factors is not well understood. To start to understand this, we investigated the relationship between the cancer meQTLs recently reported by Gong 
                <italic toggle="yes">et al.</italic>, and 3D genome structure in the form of TADs. To determine the relevance to cancer, we further investigated cancer meQTLs near driver genes for potential to modify cancer risk and progression. We took advantage of a recently introduced modeling strategy that first performs feature selection on a set of nominally associated SNPs, then trains a non-linear XGBoost model based on those features.
                <sup>
                    <xref ref-type="bibr" rid="ref17">18</xref>
                </sup> Feature importances can be extracted from the trained model to gain insight as to which features were most influential, suggesting biological hypotheses that can be further investigated.</p>
            <p>We observed higher levels of promoter methylation in inactive versus active TADs, slightly more meQTLs in active TADs and higher densities of meQTLs in boundary regions proximal to inactive versus active TADs. Furthermore, analyzing meQTL distribution across TAD boundaries revealed a non-uniform pattern, suggesting that TAD boundaries affected distributional burden of meQTLs. It is of note that TAD boundaries conserved across cell types are reportedly highly enriched for evolutionary constraint and complex trait heritability.
                <sup>
                    <xref ref-type="bibr" rid="ref10">10</xref>
                </sup> Our data suggest that variability in gene expression due to meQTLs is also evolutionarily more constrained in and around active TADs and their boundaries, consistent with these TAD boundaries playing a critical role in development.
                <sup>
                    <xref ref-type="bibr" rid="ref47">49</xref>
                </sup> These results may suggest that TAD boundaries play a role in making the recruitment of regulatory machinery more specific, particularly as it pertains to DNA methylation.</p>
            <p>Interestingly, we found that meQTLs associated with driver genes showed patterns of enrichment or depletion in a manner dependent on the activity state of the TAD in which the meQTLs occurred. Analysis of cancer meQTLs, which are polymorphic sites associated with differences in the level of DNA methylation found in tumors, showed depletion of germline meQTLs affecting oncogenes but enrichment of such meQTLs affecting tumor suppressor genes in active TADs. This could suggest that the potential to modulate tumor suppressor gene expression through methylation is evolutionarily advantageous whereas modulating oncogene expression by promoter methylation may be less so. These trends point to evolutionary constraints on the distribution of meQTLs imposed by 3D genome architecture, constraints that could set the stage for genomic vulnerabilities to later malignancy.</p>
            <p>Focusing on meQTLs near known driver genes, we evaluated the potential of meQTLs to modify cancer risk or progression. We found a number of meQTLs associated with cancer risk in the UKBioBank and were able to validate a polygenic score constructed from these meQTLs in the independent DRIVE cohort. The inclusion of genes linked to distinct breast cancer subtypes among the features that most contributed to classifier performance suggests that cancer meQTLs may differentially affect risk of developing different forms of breast cancer and raises the possibility that subtype-specific meQTL-based risk classifiers may outperform a generic model. The meQTLs most strongly predictive of prognosis tended to occur near cancer genes that were also associated with risk or prognosis in the same tumor type. However, we saw cases such as ETNK1 in head and neck cancer, where meQTLs implicated a gene that has not been considered a factor promoting progression. This could point to a new therapeutic opportunity in this disease. Further studies are merited to determine whether the observed associations result from meQTLs being in linkage with eQTLs or coding variants that contribute to risk or progression, or whether meQTLs themselves make it easier or more difficult for genes to be modulated through DNA methylation. Interestingly, we noted that multiple independent meQTLs for the same cancer gene were informative in predictive models. This suggests that, at least in some cases, the cumulative burden of meQTLs near driver genes could further alter gene function to exacerbate risk or progression. While we focused on cancer genes, other studies have more broadly implicated meQTLs in cancer survival, supporting expanded analyses in the future.</p>
            <p>This study has a few limitations. First, the meQTLs used here are derived from a study of tumors
                <sup>
                    <xref ref-type="bibr" rid="ref46">16</xref>
                </sup> which could be biased toward detecting meQTLs associated with DNA methylation events that are positively selected in tumors. For risk prediction, we focused on meQTLs and their corresponding CpG probes that overlap the promoter regions of known cancer genes; however, we cannot be sure that these meQTLs are not also affecting other genes in the region, for example through effects on enhancer activity. Second, when focusing on specific tumor types, the number of samples available to predict prognosis is relatively small, and some samples were missing tumor stage or age at diagnosis data, which are key clinical features for survival prediction. In addition, we lacked independent cohorts to validate the generalizability of polygenic survival scores based on meQTLs, which could lead to overfitting in some of our results, as suggested by the large hazard ratios observed in CoxPH analysis. This validation should be a priority as suitable data sets become available. We also made a few assumptions. We only considered TADs common across multiple human cell lines, which could have removed some important cell-type specific TADs, though our methodology follows what other studies
                <sup>
                    <xref ref-type="bibr" rid="ref11">11</xref>
                </sup>
                <sup>,</sup>
                <sup>
                    <xref ref-type="bibr" rid="ref47">49</xref>
                </sup> have done. For predicting prognosis, we assumed that TADs from healthy human cell lines would also apply to cancer patients and thus did not account for events where TAD structure could change. This assumption is supported by previous studies showing that TADs are overwhelmingly similar across cancer and noncancer patients.
                <sup>
                    <xref ref-type="bibr" rid="ref47">49</xref>
                </sup> In future studies, it would be of interest to study meQTL trends in normal tissue samples to see if enrichment patterns associated with cancer genes are driven by selection in tumors, or highlight evolutionary constraints more broadly associated with human health that coincidentally are advantageous for tumor development.</p>
            <p>This study investigated the relationship between epigenetic factors like chromatin structure and DNA methylation and genetic variation in the context of cancer, and established the potential for cancer gene associated meQTLs to uncover cancer-specific modifiers of risk and progression. There are also a number of non-genetic risk factors that act by modifying DNA methylation levels and which could interact with genetic regulation. These include aging, exercise, stress, diet and obesity, and a broad variety of environmental exposures. In our analysis, age had the highest impact on DNA methylation modulation; however, as age and sex were the only clinical factors for the majority of our study, future analysis of other non-genetic factors in relation to genetic regulators of DNA methylation is merited. Future efforts could integrate dynamic methylation changes due to these non-genetic factors with static polygenic scores such as we describe here to provide a more accurate estimate of risk. This type of approach could benefit in particular from non-invasive biomarkers, such as cell free DNA methylation from blood, though studies will be needed to establish the cumulative effect of dynamic exposures and the extent to which they can be accurately evaluated from cell free DNA.
                <sup>
                    <xref ref-type="bibr" rid="ref60">50</xref>
                </sup>
            </p>
        </sec>
        <sec id="sec8" sec-type="methods">
            <title>Methods</title>
            <sec id="sec9">
                <title>TCGA and promoter data</title>
                <p>TCGA meQTL data were obtained from Gong 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref46">16</xref>
                    </sup> TCGA outcome and survival data alongside RNA-seq expression data were obtained from the pan-can atlas, Liu 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref48">51</xref>
                    </sup> Illumina 450k DNA methylation data were also obtained from the TCGA pan-cancer atlas.
                    <sup>
                        <xref ref-type="bibr" rid="ref48">51</xref>
                    </sup> Promoter data were obtained from the ENCODE SCREEN pipeline.
                    <sup>
                        <xref ref-type="bibr" rid="ref49">52</xref>
                    </sup>
                    <sup>,</sup>
                    <sup>
                        <xref ref-type="bibr" rid="ref50">53</xref>
                    </sup>
                </p>
            </sec>
            <sec id="sec10">
                <title>UKBioBank data</title>
                <p>Genotypes and ICD-10 codes were obtained for 394,034 samples across 40 ICD-10 codes from the UK BioBank.
                    <sup>
                        <xref ref-type="bibr" rid="ref51">54</xref>
                    </sup> For the C50-C50 analysis, only exclusive cases and controls were considered: patients diagnosed only with a breast neoplasm were compared with controls not diagnosed with any neoplasm. This reduced the sample size to 189,022 for the breast cancer risk analysis.</p>
            </sec>
            <sec id="sec11">
                <title>DRIVE breast cancer data</title>
                <p>Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) (dbGaP Study Accession: phs001265.v1.p1)
                    <sup>
                        <xref ref-type="bibr" rid="ref52">55</xref>
                    </sup> was used to validate the risk outcome analysis of our XGBoost model. There were 60,231 breast cancer cases and controls with genotype data alongside outcome, age, and ancestry principal components.</p>
            </sec>
            <sec id="sec12">
                <title>TAD identification and clustering based on chromHMM and DNA methylation</title>
                <p>Topologically associating domain (TAD) regions from the GM12878, HMEC, HUVEC, IMR90, and NHEK cell lines were downloaded from Rao 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref12">12</xref>
                    </sup> and only TADs common to all 5 cell lines, identified using a previously described 20% overlap algorithm, were considered for the rest of the analysis. TADs were characterized into 5 clusters: &#x201c;Active-1&#x201d;, &#x201c;Active-2&#x201d;, &#x201c;Inactive-1&#x201d;, &#x201c;Inactive-2&#x201d;, and &#x201c;Mixed&#x201d; through K-means clustering and use of a 15-chromatin state model derived from the Roadmap Epigenomics Project.
                    <sup>
                        <xref ref-type="bibr" rid="ref53">56</xref>
                    </sup> For most of the analysis, the two active and two inactive groups were combined for simpler visualization and mixed regions were ignored due to their biological ambiguity. The boundary of each TAD was considered as the 50 kb region upstream and downstream of TAD endpoints (i.e. 100 kb long boundaries), with the exception of consecutive TADs separated by a region smaller than 100 kb. For those cases, the boundary was considered as the proximal half of the region for each of the two TADs. This TAD boundary definition (a 100 kb boundary spanning &#x00b1;50 kb around the start and end of a TAD) is supported by previous literature.
                    <sup>
                        <xref ref-type="bibr" rid="ref10">10</xref>
                    </sup>
                </p>
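                <p>As an illustration of the boundary definition above, the following Python sketch derives boundary intervals for a sorted list of TADs on one chromosome. The function name <monospace>tad_boundaries</monospace>, the tuple layout, and the choice to keep the 50 kb flank inside the TAD when a narrow gap is split in half are assumptions made for illustration; they are not taken from the analysis code.</p>
                <code language="python"><![CDATA[
```python
FLANK = 50_000  # 50 kb on each side of a TAD endpoint


def tad_boundaries(tads, flank=FLANK):
    """Return (left_boundary, right_boundary) intervals for each TAD.

    `tads` is a list of (start, end) tuples on one chromosome, sorted by
    start and non-overlapping (an assumption of this sketch).
    """
    result = []
    for i, (start, end) in enumerate(tads):
        left_outer = start - flank
        right_outer = end + flank
        if i > 0:
            gap = start - tads[i - 1][1]
            if gap < 2 * flank:
                # Narrow gap: this TAD only gets the proximal half of it.
                left_outer = start - gap // 2
        if i + 1 < len(tads):
            gap = tads[i + 1][0] - end
            if gap < 2 * flank:
                right_outer = end + gap // 2
        left = (max(left_outer, 0), start + flank)
        right = (end - flank, right_outer)
        result.append((left, right))
    return result
```
]]></code>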
                <p>DNA methylation levels were compared to TADs as follows. DNA methylation levels were summarized at promoters identified by ENCODE&#x2019;s SCREEN pipeline for human hg38. We compared the methylation beta values (i.e. the proportion of methylated region) using TCGA&#x2019;s DNA methylation data, and averaged these beta values for all promoter regions across Active 1, Active 2, Inactive 1, Inactive 2, and Mixed regions. The hypothesis that methylation levels in promoter regions of actively transcribed TADs would be lower than in inactive TADs was tested by a Kruskal-Wallis test.</p>
            </sec>
            <sec id="sec13">
                <title>meQTL distribution within TADs</title>
                <p>We retrieved 1,236,142 unique cis-meQTLs across 23 cancer types from the Pancan-meQTL database.
                    <sup>
                        <xref ref-type="bibr" rid="ref46">16</xref>
                    </sup> meQTLs were further clumped by linkage-disequilibrium (LD) to obtain independent associations using the PLINK
                    <sup>
                        <xref ref-type="bibr" rid="ref54">57</xref>
                    </sup> clumping function using association p-values derived from the Pancan-meQTL database as input and default parameters (p1=0.0001, p2=0.01, r
                    <sup>2</sup>=0.5, kb=250). These clumped, independent meQTLs were used for all subsequent analyses. First, the burden of clumped meQTLs across Active, Inactive, and Mixed TAD regions was measured. The burden was normalized by the length in base pairs of each region. To understand how meQTLs are distributed across the genome and whether TADs have an effect on the distribution of meQTLs, we analyzed the distributional burden of meQTLs within consecutive TADs. We compared the average meQTL density across different TAD transitions (i.e. Active-Boundary-Active, Active-Boundary-Inactive, Inactive-Boundary-Active and Inactive-Boundary-Inactive) by binning the genome between two TADs into 40 equal-sized bins and calculating the average burden of meQTLs within these bins, normalized by the bin size in base pairs. Resulting graphs were smoothed by a rolling average for visualization purposes. To evaluate whether the distribution reflected an association with transitions in TAD activity status, we shuffled the labels (i.e. &#x201c;Active&#x201d;, &#x201c;Inactive&#x201d;, etc.) of the TADs 100 times while preserving the number of transition categories (i.e. &#x201c;Active-Active&#x201d;, &#x201c;Inactive-Active&#x201d;, etc.) and reran the distribution analysis on these randomly shuffled TADs, averaging over all trials. Significance was assessed by comparing the observed difference in density between the TADs to the 100 average randomized trials using a Student&#x2019;s t-test.</p>
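                <p>The binning and smoothing steps described above can be sketched as follows. This is a simplified illustration for a single region. The function names, the default window of 3 bins, and the handling of edge bins are assumptions, since the smoothing window is not specified in the text.</p>
                <code language="python"><![CDATA[
```python
def binned_density(positions, region_start, region_end, n_bins=40):
    """Count meQTL positions per equal-sized bin, divided by bin width in bp."""
    width = (region_end - region_start) / n_bins
    counts = [0] * n_bins
    for pos in positions:
        if region_start <= pos < region_end:
            idx = int((pos - region_start) / width)
            # Guard against floating-point rounding at the right edge.
            counts[min(idx, n_bins - 1)] += 1
    return [c / width for c in counts]


def rolling_mean(values, window=3):
    """Centered rolling average; edge bins use the neighbours that exist."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```
]]></code>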
            </sec>
            <sec id="sec14">
                <title>Randomized distribution of cancer-gene-clumped meQTLs</title>
                <p>Clumped meQTLs were annotated according to LD with CpG probes located in the promoter regions of cancer driver genes including oncogenes and tumor suppressor genes (TSGs) from the COSMIC database.
                    <sup>
                        <xref ref-type="bibr" rid="ref16">17</xref>
                    </sup> A total of 231 oncogenes and TSGs were used for this analysis and promoter regions used were those identified by ENCODE&#x2019;s SCREEN pipeline.
                    <sup>
                        <xref ref-type="bibr" rid="ref55">58</xref>
                    </sup> To evaluate whether active/inactive TADs or boundary regions harboring cancer genes showed enrichment or depletion for meQTLs, we conducted a randomization analysis with 1000 trials. In each trial, we chose a random sample of meQTLs associated with non-cancer genes, matched in minor allele frequency (&#x00b1;5%) and in number to the set of cancer-gene-associated meQTLs. We then mapped genes with meQTLs to active or inactive TADs and TAD boundaries, summed the meQTLs in each and normalized by the size of the region. The standard error was plotted alongside the true burden to assess whether the burden across TADs differed significantly from random.</p>
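                <p>One trial of the frequency-matched sampling described above might be sketched as below. The sketch interprets the 5% tolerance as an absolute difference in minor allele frequency and samples without replacement; both are assumptions, as are the function name and the dictionary layout of the background set.</p>
                <code language="python"><![CDATA[
```python
import random


def maf_matched_sample(target_mafs, background, tol=0.05, rng=None):
    """Draw one background meQTL per target, within +/- tol of its MAF.

    `background` maps a (hypothetical) meQTL identifier to its MAF.
    Sampling is without replacement; a target with no remaining match
    is skipped, so the result can be shorter than `target_mafs`.
    """
    rng = rng or random.Random()
    available = dict(background)
    chosen = []
    for maf in target_mafs:
        candidates = [k for k, v in available.items() if abs(v - maf) <= tol]
        if not candidates:
            continue
        pick = rng.choice(candidates)
        chosen.append(pick)
        del available[pick]
    return chosen
```
]]></code>
                <p>Repeating such a draw 1000 times and recomputing the normalized burden each time yields the null distribution against which the observed burden is compared.</p>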
            </sec>
            <sec id="sec15">
                <title>Correlation of meQTL profiles with clinical characteristics in TCGA</title>
                <p>We conducted a principal component analysis of TCGA genotype at the 156 meQTLs in European ancestry samples (n=8217), evaluating association of meQTL genotype-based PCs with clinical covariates. meQTL SNPs were quantified by the number of minor alleles carried (0, 1, 2). PCs explaining more than 1% of the genotypic variance across individuals were regressed against clinical variables including sex, age at diagnosis, tumor stage, and tumor type.</p>
            </sec>
            <sec id="sec16">
                <title>Machine-learning for meQTL-based risk and survival prediction</title>
                <p>For both risk and survival analysis, we used a synthesis of LASSO regularization as a feature selector and XGBoost classifier as the machine learning predictor, described fully in Elgart 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref17">18</xref>
                    </sup> Specifically, after a preliminary association analysis, SNPs achieving a nominal p-value&lt;0.05 were further selected by LASSO, and the selected SNPs were used to train an XGBoost model on a predictive task (e.g. cancer versus no cancer for risk, or high survival or low survival at median overall survival time), using a set of training samples. The probabilities achieved from the XGBoost classifier were then used to create a polygenic risk score (PRS) or polygenic survival score (PSS). Predictive performance was evaluated using cross validation for survival analysis and using an independent cohort of matched tumor types for the risk analysis.</p>
            </sec>
            <sec id="sec17">
                <title>UKBioBank risk</title>
                <p>To determine the association of meQTLs with risk of developing cancer, we conducted a phenome-wide association study (PheWAS) for each meQTL using the PLATO
                    <sup>
                        <xref ref-type="bibr" rid="ref56">59</xref>
                    </sup> software. The genotype and phenotype data of 487,409 patients harboring the 156 cancer-related clumped meQTLs were retrieved from the UKBioBank
                    <sup>
                        <xref ref-type="bibr" rid="ref51">54</xref>
                    </sup> and genotype at each meQTL was evaluated for association with all cancer phenotypes while controlling for covariates including age and ancestry. Individuals with multiple cancer diagnoses were excluded from the analysis, leaving 189,022 patients for risk analysis.</p>
            </sec>
            <sec id="sec18">
                <title>UKBioBank PRS construction and breast cancer DRIVE validation</title>
                <p>Nominally significant SNPs (p-value&lt;0.05) were used for polygenic risk modeling with LASSO plus XGBoost. Out of the resulting tumor types where meQTLs were associated with risk we pursued breast (ICD-10: C50-C50) due to the abundance of validation data. Of the 189,022 UKBioBank individuals analyzed, 177,834 and 11,188 patients were non-cancer controls and breast cancer cases, respectively. An initial 10% quantile plot from the PheWAS analysis in UKBioBank was created using the PRS with the odds ratio for C50-C50 to compare the odds ratio of the 0th quantile PRS group to the 90th quantile PRS group.</p>
                <p>To create a polygenic risk score (PRS) we used the approach described above in the &#x201c;Machine-learning for meQTL-based risk and survival prediction&#x201d; section. Out of the tumor types that had nominally significant (p&lt;0.05) risk-related SNPs (i.e. C64-C68, C40-C41, C69-C72, C00-C14, C15-C26, C81-C96, C50-C50, C43-C44, C45-C49, C76-C80, C60-C63, C51-C58, C97-C97, C73-C75, C30-C39), we chose to validate this relationship for the C50-C50 (breast cancer) outcome in an external cohort, DRIVE, due to an abundance of validation data. Similar to the survival analysis, we considered SNPs nominally associated with cancer risk using the associations from the PheWAS (p&lt;0.05) for the rest of the analysis. We included other covariates including age and the first 10 principal components to represent population substructure in UKBioBank. Due to the class imbalance of the UKBioBank cohort (10,840 cases, 94,871 controls), we oversampled the cases to obtain a 1:1 case-control ratio, resulting in a dataset size of 189,742 rows. Furthermore, we only included samples without any neoplasm diagnosis as controls to minimize confounding by other tumor types.</p>
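                <p>The oversampling step can be illustrated with the short Python sketch below. It assumes random resampling of cases with replacement, which is one common way to reach a 1:1 ratio; the text does not state which oversampling scheme was used, and the function name is hypothetical.</p>
                <code language="python"><![CDATA[
```python
import random


def oversample_cases(cases, controls, rng=None):
    """Resample cases with replacement until they match the control count.

    Assumes at least one case. If cases already outnumber controls,
    the inputs are returned unchanged.
    """
    rng = rng or random.Random()
    n_extra = len(controls) - len(cases)
    extra = [rng.choice(cases) for _ in range(n_extra)]
    return cases + extra, controls
```
]]></code>
                <p>With 94,871 controls, balancing in this way gives 2 &#x00d7; 94,871 = 189,742 rows, matching the dataset size reported above.</p>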
                <p>We first trained our XGBoost classification model on the entirety of the UKBioBank dataset. First, the UKBioBank cohort (i.e. training cohort) was input into a LASSO regression model with 
                    <inline-formula>

                        <mml:math display="inline">
                            <mml:mi>&#x03b1;</mml:mi>
                        </mml:math>
</inline-formula>=0.001 (based on Ref. 
                    <xref ref-type="bibr" rid="ref17">18</xref>) to predict the intended phenotype. SNPs were further filtered to remove those that had a LASSO coefficient of 0. The modified cohort was used to train an XGBoost model on the filtered feature set using the entire UKBioBank cohort (n_estimators=500, learning_rate=0.1, max_depth=9). The probability of trees voting for either class (i.e. 0: no cancer, 1: cancer) was used as a polygenic risk score. We validated the breast cancer risk association of meQTLs alongside covariates using the Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE
                    <sup>
                        <xref ref-type="bibr" rid="ref52">55</xref>
                    </sup>) validation cohort. This validation cohort consists of 32,428 controls and 26,374 breast cancer cases for a total of 58,802 patients. Before validating, we mapped the MAF values of the SNPs in UKBioBank and DRIVE, and removed SNPs with MAF values of 2 standard deviations away from one another. PRS scores were predicted based on individual genotypes in DRIVE using the UKBioBank-trained XGBoost model (as described in Ref. 
                    <xref ref-type="bibr" rid="ref17">18</xref>). We compared score distributions across case and control in DRIVE using a Mann-Whitney U test. We also compared the incidence of breast cancer by partitioning the UKBioBank and DRIVE probabilities into 10% quantiles on PRS score. We plotted the 10% quantiles using the min-max normalized XGBoost-derived PRS scores.</p>
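                <p>The quantile comparison described above can be sketched in a few lines of Python. The sketch min-max normalizes the scores, ranks samples, splits them into 10 equal-sized groups and reports the fraction of cases per group. Function names are hypothetical, and ties are broken by input order, which is an assumption.</p>
                <code language="python"><![CDATA[
```python
def min_max(scores):
    """Rescale scores to the [0, 1] interval."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def incidence_by_quantile(scores, labels, n_groups=10):
    """Fraction of cases (label 1) in each score quantile, lowest first.

    Requires at least `n_groups` samples so that no group is empty.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    rates = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        rates.append(sum(labels[i] for i in members) / len(members))
    return rates
```
]]></code>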
            </sec>
            <sec id="sec19">
                <title>Prediction of survival time in TCGA tumor types</title>
                <p>Survival was modeled separately for each of 20 tumor types in TCGA (BLCA, CESC, KIRC, KIRP, PAAD, BRCA, HNSC, LGG, SKCM, PRAD, OV, UCEC, THCA, LUAD, LUSC, COAD, STAD, LIHC, SARC). Cancer meQTLs were included in predictive modeling if they had a minor allele frequency of at least 1% in the specific tumor type and were nominally significant in Kaplan-Meier analysis. Tumor types with fewer than 5 meQTLs nominally associated with overall survival, or with fewer than 100 patients in TCGA, were excluded from the analysis. For the remaining tumor types, we divided the analysis into three feature groups: a clinical group containing only clinical features (sex, age, and, in certain cancer types, tumor stage; stage was included as a covariate only for cancer types with &gt;100 patients with non-null tumor stage); a clinical+SNP group combining clinical features and SNPs; and a SNP-only group. For each group, SNPs were selected by LASSO, and an XGBoost model was then trained on the complete dataset, with 5-fold cross-validation used to estimate the generalization error and generate an AUC value. Specifically, for each individual we simplified the genotypes to a binary feature valued 1 if the patient carried the meQTL allele (heterozygous or homozygous) and 0 otherwise. Binarized genotypes were then z-score normalized and input into a LASSO regularization model (&#x03b1;=0.001). Features with a LASSO coefficient of 0 (i.e. non-informative features) were removed, and the LASSO-filtered SNP set was used to train an XGBoost classifier (n_estimators=500, learning_rate=0.1, max_depth=9) to predict overall survival (OS) binarized at the median (1=low survival, OS below the median; 0=high survival, OS above the median). Only cancer types with a higher AUC in the clinical+SNP group than in the clinical group were considered for the SNP-only analysis. A higher AUC in the combined group could suggest that SNPs contribute additional information. The output of the SNP-only XGBoost model was used as a non-linear polygenic survival score (PSS). Before input into the Cox model, the PSS was scaled using min-max normalization and outliers were removed using a 1.5*(interquartile range) threshold.</p>
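                <p>The deterministic pre- and post-processing steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: all function names are hypothetical, genotypes are assumed to be coded as allele counts (0, 1, 2), patients with OS exactly equal to the median are assigned to the high-survival class, min-max scaling is applied before outlier removal, and quartiles are computed with the inclusive method. The LASSO and XGBoost steps are omitted because they depend on external libraries.</p>
                <code language="python"><![CDATA[
```python
import statistics


def binarize_genotypes(allele_counts):
    """Carrier status: 1 if at least one meQTL allele is present, else 0."""
    return [1 if count > 0 else 0 for count in allele_counts]


def zscore(values):
    """Z-score normalize one feature column (population standard deviation)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def binarize_survival(os_times):
    """Label 1 = low survival (OS below the cohort median), 0 otherwise.

    Assumption: ties at the median are assigned to class 0.
    """
    median = statistics.median(os_times)
    return [1 if t < median else 0 for t in os_times]


def min_max_scale(scores):
    """Rescale scores linearly to the interval [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def drop_iqr_outliers(scores, k=1.5):
    """Keep scores within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in scores if low <= s <= high]


# Toy example: one SNP column, four patients.
carriers = binarize_genotypes([0, 1, 2, 0])
normalized = zscore(carriers)
labels = binarize_survival([10, 20, 30, 40])

# Toy polygenic survival scores (hypothetical values).
scaled_pss = min_max_scale([2.0, 4.0, 6.0])
filtered = drop_iqr_outliers([1, 2, 3, 4, 100])
```
]]></code>
                <p>In this sketch the normalized carrier columns would be passed to the feature-selection step, and the scaled, outlier-filtered scores would serve as the PSS covariate in the survival model.</p>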
            </sec>
            <sec id="sec20">
                <title>Cox proportional hazards model using PSS</title>
                <p>We used Cox proportional hazards models to evaluate the meQTL-based PSS as a predictor of survival across tumor types in TCGA. We combined the PSS with clinical features (sex, age at diagnosis, and tumor stage) in a multivariable Cox proportional hazards model of OS and evaluated the hazard ratio and 95% confidence interval for each covariate. We repeated this analysis for disease-free interval (DFI).</p>
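                <p>To make the quantities in this model concrete, the following is a minimal sketch of the Cox partial log-likelihood and of the conversion from a fitted coefficient to a hazard ratio with a Wald 95% confidence interval. It is illustrative only and is not the code used in this study: the function names are hypothetical, ties are handled with the Breslow convention, and coefficient estimation (maximizing the partial likelihood) is left to a dedicated survival-analysis library.</p>
                <code language="python"><![CDATA[
```python
import math


def cox_partial_loglik(beta, covariates, times, events):
    """Breslow partial log-likelihood of a Cox proportional hazards model.

    beta       -- list of coefficients, one per covariate
    covariates -- list of rows, one row of covariate values per patient
    times      -- follow-up time per patient
    events     -- 1 if the event was observed, 0 if censored
    """
    # Linear predictor (log relative hazard) for each patient.
    eta = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += eta[i] - math.log(risk)
    return loglik


def hazard_ratio_with_ci(coef, se, z=1.96):
    """Hazard ratio exp(coef) with a Wald confidence interval (95% for z=1.96)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Toy data: a single covariate (e.g. a scaled score); higher values die earlier.
toy_x = [[1.0], [0.5], [0.0]]
toy_times = [1, 2, 3]
toy_events = [1, 1, 1]

ll_null = cox_partial_loglik([0.0], toy_x, toy_times, toy_events)
ll_positive = cox_partial_loglik([1.0], toy_x, toy_times, toy_events)
```
]]></code>
                <p>On the toy data, a positive coefficient yields a higher partial log-likelihood than the null model, which corresponds to a hazard ratio above 1 for the covariate.</p>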
            </sec>
        </sec>
        <sec id="sec22">
            <title>Author contributions</title>
            <p>Original concept by SG and MP. HC supervised the project. SG performed computational data processing and analysis. MP, AK and JT provided support with dataset preparation and contributed to computer code. SG and HC wrote the manuscript.</p>
        </sec>
    </body>
    <back>
        <sec id="sec25" sec-type="data-availability">
            <title>Data availability</title>
            <sec id="sec26">
                <title>Source data</title>
                <p>Data were obtained from public sources including The Cancer Genome Atlas (TCGA; dbGaP: 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000178.v11.p8">phs000178.v11.p8</ext-link>) and Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE; dbGaP: 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001265.v1.p1">phs001265.v1.p1</ext-link>). dbGaP requires an application to access data; applicants will need to create an eRA Commons account and begin a project request. Senior Investigators and NIH Investigators are eligible to apply for access.</p>
                <p>We used data from the 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ukbiobank.ac.uk/">UK Biobank</ext-link> resource under application number 37671 for this work. All bona fide researchers can apply to use the UK Biobank resource for health-related research that is in the public interest. Further information on the application process is available from the UK Biobank website.</p>
                <p>meQTLs were obtained from Gong 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref46">16</xref>
                    </sup> (
                    <ext-link ext-link-type="uri" xlink:href="http://bioinfo.life.hust.edu.cn/Pancan-meQTL/">http://bioinfo.life.hust.edu.cn/Pancan-meQTL/</ext-link>). TADs were obtained from Rao 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref12">12</xref>
                    </sup> (
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.cell.2014.11.021">https://doi.org/10.1016/j.cell.2014.11.021</ext-link>).</p>
            </sec>
        </sec>
        <sec id="sec21">
            <title>Software availability</title>
            <p>Source code available from: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/cartercompbio/meQTLs">https://github.com/cartercompbio/meQTLs</ext-link>.</p>
            <p>Archived source code at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.8168488">https://doi.org/10.5281/zenodo.8168488</ext-link>.
                <sup>
                    <xref ref-type="bibr" rid="ref57">60</xref>
                </sup>
            </p>
            <p>License: 
                <ext-link ext-link-type="uri" xlink:href="https://opensource.org/licenses/MIT">MIT</ext-link>.</p>
            <sec id="sec1.1">
                <title>Extended data</title>
                <p>Extended data Figure 1 can be found in Figshare at 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.29610992.v1">https://doi.org/10.6084/m9.figshare.29610992.v1</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref61">61</xref>
                    </sup>
                </p>
                <p>Extended data Figure 2 can be found in Figshare at 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.29610998.v1">https://doi.org/10.6084/m9.figshare.29610998.v1</ext-link>.
                    <sup>
                        <xref ref-type="bibr" rid="ref62">62</xref>
                    </sup>
                </p>
                <p>Data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/deed.en">Creative Commons Attribution 4.0 International License</ext-link> (CC BY 4.0).</p>
            </sec>
        </sec>
        <ack>
            <title>Acknowledgements</title>
            <p>We would like to acknowledge Rany M Salem for providing access to UK Biobank data and TJ Sears for helpful scientific discussion. This research has been conducted using the UK Biobank Resource under Application Number 37671. The results shown here are also based upon data generated by the TCGA Research Network: 
                <ext-link ext-link-type="uri" xlink:href="https://www.cancer.gov/tcga">https://www.cancer.gov/tcga</ext-link>. OncoArray genotyping and phenotype data harmonization for the Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) breast-cancer case control samples was supported by X01 HG007491 and U19 CA148065 and by Cancer Research UK (C1287/A16563). Genotyping was conducted by the Center for Inherited Disease Research (CIDR), Centre for Cancer Genetic Epidemiology, University of Cambridge, and the National Cancer Institute. The following studies contributed germline DNA from breast cancer cases and controls: the Two Sister Study (2SISTER), Breast Oncology Galicia Network (BREOGAN), Copenhagen General Population Study (CGPS), Cancer Prevention Study 2 (CPSII), The European Prospective Investigation into Cancer and Nutrition (EPIC), Melbourne Collaborative Cohort Study (MCCS), Multiethnic Cohort (MEC), Nashville Breast Health Study (NBHS), Nurses Health Study (NHS), Nurses Health Study 2 (NHS2), Polish Breast Cancer Study (PBCS), Prostate Lung Colorectal and Ovarian Cancer Screening Trial (PLCO), Studies of Epidemiology and Risk Factors in Cancer Heredity (SEARCH), The Sister Study (SISTER), Swedish Mammography Cohort (SMC), Women of African Ancestry Breast Cancer Study (WAABCS), Women&#x2019;s Health Initiative (WHI).</p>
        </ack>
        <ref-list>
            <title>References</title>
            <ref id="ref1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Iyer</surname>
                            <given-names>JG</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Response rates and durability of chemotherapy among 62 patients with metastatic Merkel cell carcinoma.</article-title>
                    <source>

                        <italic toggle="yes">Cancer Med.</italic>
</source>
                    <year>2016</year>;<volume>5</volume>:<fpage>2294</fpage>&#x2013;<lpage>2301</lpage>.
                    <pub-id pub-id-type="pmid">27431483</pub-id>
                    <pub-id pub-id-type="doi">10.1002/cam4.815</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5055152</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gayther</surname>
                            <given-names>SA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Variation of risks of breast and ovarian cancer associated with different germline mutations of the BRCA2 gene.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Genet.</italic>
</source>
                    <year>1997</year>;<volume>15</volume>:<fpage>103</fpage>&#x2013;<lpage>105</lpage>.
                    <pub-id pub-id-type="pmid">8988179</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ng0197-103</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chequin</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Antitumoral activity of liraglutide, a new DNMT inhibitor in breast cancer cells in vitro and in vivo.</article-title>
                    <source>

                        <italic toggle="yes">Chem. Biol. Interact.</italic>
</source>
                    <year>2021</year>;<volume>349</volume>:<fpage>109641</fpage>.
                    <pub-id pub-id-type="pmid">34534549</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cbi.2021.109641</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Heyn</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Linkage of DNA methylation quantitative trait loci to human cancer risk.</article-title>
                    <source>

                        <italic toggle="yes">Cell Rep.</italic>
</source>
                    <year>2014</year>;<volume>7</volume>:<fpage>331</fpage>&#x2013;<lpage>338</lpage>.
                    <pub-id pub-id-type="pmid">24703846</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.celrep.2014.03.016</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Irizarry</surname>
                            <given-names>RA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Genet.</italic>
</source>
                    <year>2009</year>;<volume>41</volume>:<fpage>178</fpage>&#x2013;<lpage>186</lpage>.
                    <pub-id pub-id-type="pmid">19151715</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ng.298</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2729128</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Esteller</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Promoter hypermethylation and BRCA1 inactivation in sporadic breast and ovarian tumors.</article-title>
                    <source>

                        <italic toggle="yes">J. Natl. Cancer Inst.</italic>
</source>
                    <year>2000</year>;<volume>92</volume>:<fpage>564</fpage>&#x2013;<lpage>569</lpage>.
                    <pub-id pub-id-type="pmid">10749912</pub-id>
                    <pub-id pub-id-type="doi">10.1093/jnci/92.7.564</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wolff</surname>
                            <given-names>EM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Hypomethylation of a LINE-1 promoter activates an alternate transcript of the MET oncogene in bladders with cancer.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Genet.</italic>
</source>
                    <year>2010</year>;<volume>6</volume>:<fpage>e1000917</fpage>.
                    <pub-id pub-id-type="pmid">20421991</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pgen.1000917</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2858672</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Jablonski</surname>
                            <given-names>KP</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Contribution of 3D genome topological domains to genetic risk of cancers: a genome-wide computational study.</article-title>
                    <source>

                        <italic toggle="yes">Hum. Genomics.</italic>
</source>
                    <year>2022</year>;<volume>16</volume>:<fpage>2</fpage>.
                    <pub-id pub-id-type="pmid">35016721</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s40246-022-00375-2</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8753905</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dixon</surname>
                            <given-names>JR</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Topological domains in mammalian genomes identified by analysis of chromatin interactions.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2012</year>;<volume>485</volume>:<fpage>376</fpage>&#x2013;<lpage>380</lpage>.
                    <pub-id pub-id-type="pmid">22495300</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature11082</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3356448</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>McArthur</surname>
                            <given-names>E</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Capra</surname>
                            <given-names>JA</given-names>
                        </name>
</person-group>:
                    <article-title>Topologically associating domain boundaries that are stable across diverse cell types are evolutionarily constrained and enriched for heritability.</article-title>
                    <source>

                        <italic toggle="yes">Am. J. Hum. Genet.</italic>
</source>
                    <year>2021</year>;<volume>108</volume>:<fpage>269</fpage>&#x2013;<lpage>283</lpage>.
                    <pub-id pub-id-type="pmid">33545030</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ajhg.2021.01.001</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7895846</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Akdemir</surname>
                            <given-names>KC</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Somatic mutation distributions in cancer genomes vary with three-dimensional chromatin structure.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Genet.</italic>
</source>
                    <year>2020</year>;<volume>52</volume>:<fpage>1178</fpage>&#x2013;<lpage>1188</lpage>.
                    <pub-id pub-id-type="pmid">33020667</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41588-020-0708-0</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8350746</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rao</surname>
                            <given-names>SSP</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2014</year>;<volume>159</volume>:<fpage>1665</fpage>&#x2013;<lpage>1680</lpage>.
                    <pub-id pub-id-type="pmid">25497547</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2014.11.021</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5635824</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Nora</surname>
                            <given-names>EP</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Spatial partitioning of the regulatory landscape of the X-inactivation centre.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2012</year>;<volume>485</volume>:<fpage>381</fpage>&#x2013;<lpage>385</lpage>.
                    <pub-id pub-id-type="pmid">22495304</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature11049</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3555144</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Peng</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Panchenko</surname>
                            <given-names>AR</given-names>
                        </name>
</person-group>:
                    <article-title>DNA methylation: Precise modulation of chromatin structure and dynamics.</article-title>
                    <source>

                        <italic toggle="yes">Curr. Opin. Struct. Biol.</italic>
</source>
                    <year>2022</year>;<volume>75</volume>:<fpage>102430</fpage>.
                    <pub-id pub-id-type="pmid">35914496</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.sbi.2022.102430</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Curradi</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Izzo</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Badaracco</surname>
                            <given-names>G</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Molecular mechanisms of gene silencing mediated by DNA methylation.</article-title>
                    <source>

                        <italic toggle="yes">Mol. Cell. Biol.</italic>
</source>
                    <year>2002</year>;<volume>22</volume>:<fpage>3157</fpage>&#x2013;<lpage>3173</lpage>.
                    <pub-id pub-id-type="pmid">11940673</pub-id>
                    <pub-id pub-id-type="doi">10.1128/MCB.22.9.3157-3173.2002</pub-id>
                    <pub-id pub-id-type="pmcid">PMC133775</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref46">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gong</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Pancan-meQTL: a database to systematically evaluate the effects of genetic variants on methylation in human cancer.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2019</year>;<volume>47</volume>:<fpage>D1066</fpage>&#x2013;<lpage>D1072</lpage>.
                    <pub-id pub-id-type="pmid">30203047</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gky814</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6323988</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref16">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tate</surname>
                            <given-names>JG</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>COSMIC: the Catalogue Of Somatic Mutations In Cancer.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2019</year>;<volume>47</volume>:<fpage>D941</fpage>&#x2013;<lpage>D947</lpage>.
                    <pub-id pub-id-type="pmid">30371878</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gky1015</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6323903</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref17">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Elgart</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations.</article-title>
                    <source>

                        <italic toggle="yes">Commun. Biol.</italic>
</source>
                    <year>2022</year>;<volume>5</volume>:<fpage>856</fpage>.
                    <pub-id pub-id-type="pmid">35995843</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s42003-022-03812-z</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9395509</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref18">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sheehan</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Investigating the Link between Lynch Syndrome and Breast Cancer.</article-title>
                    <source>

                        <italic toggle="yes">Eur. J. Breast Health.</italic>
</source>
                    <year>2020</year>;<volume>16</volume>:<fpage>106</fpage>&#x2013;<lpage>109</lpage>.
                    <pub-id pub-id-type="pmid">32285031</pub-id>
                    <pub-id pub-id-type="doi">10.5152/ejbh.2020.5198</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7138356</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref19">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ma</surname>
                            <given-names>S-J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>Y-M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>Y-L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Correlations of <italic toggle="yes">EZH2</italic> and <italic toggle="yes">SMYD3</italic> gene polymorphisms with breast cancer susceptibility and prognosis.</article-title>
                    <source>

                        <italic toggle="yes">Biosci. Rep.</italic>
</source>
                    <year>2018</year>;<volume>38</volume>.
                    <pub-id pub-id-type="pmid">29089464</pub-id>
                    <pub-id pub-id-type="doi">10.1042/BSR20170656</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5794497</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref20">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Park</surname>
                            <given-names>U-H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ASXL2 promotes proliferation of breast cancer cells by linking ER&#x03b1; to histone methylation.</article-title>
                    <source>

                        <italic toggle="yes">Oncogene.</italic>
</source>
                    <year>2016</year>;<volume>35</volume>:<fpage>3742</fpage>&#x2013;<lpage>3752</lpage>.
                    <pub-id pub-id-type="pmid">26640146</pub-id>
                    <pub-id pub-id-type="doi">10.1038/onc.2015.443</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref21">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Clinical and prognostic relevance of EZH2 in breast cancer: A meta-analysis.</article-title>
                    <source>

                        <italic toggle="yes">Biomed. Pharmacother.</italic>
</source>
                    <year>2015</year>;<volume>75</volume>:<fpage>218</fpage>&#x2013;<lpage>225</lpage>.
                    <pub-id pub-id-type="pmid">26271144</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.biopha.2015.07.038</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref58">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Mavaddat</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Michailidou</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dennis</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Polygenic risk scores for prediction of breast cancer and breast cancer subtypes.</article-title>
                    <source>

                        <italic toggle="yes">Am. J. Hum. Genet.</italic>
</source>
                    <year>2019</year>;<volume>104</volume>(<issue>1</issue>):<fpage>21</fpage>&#x2013;<lpage>34</lpage>.
                    <pub-id pub-id-type="pmid">30554720</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ajhg.2018.11.002</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6323553</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref59">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lakeman</surname>
                            <given-names>IMM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rodr&#x00ed;guez-Girondo</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lee</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Validation of the BOADICEA model and a 313-variant polygenic risk score for breast cancer risk prediction in a Dutch prospective cohort.</article-title>
                    <source>

                        <italic toggle="yes">Genet. Med.</italic>
</source>
                    <year>2020</year>;<volume>22</volume>:<fpage>1803</fpage>&#x2013;<lpage>1811</lpage>.
                    <pub-id pub-id-type="pmid">32624571</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41436-020-0884-4</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7605432</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref22">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Walia</surname>
                            <given-names>V</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Mutational and functional analysis of the tumor-suppressor PTPRD in human melanoma.</article-title>
                    <source>

                        <italic toggle="yes">Hum. Mutat.</italic>
</source>
                    <year>2014</year>;<volume>35</volume>:<fpage>1301</fpage>&#x2013;<lpage>1310</lpage>.
                    <pub-id pub-id-type="pmid">25113440</pub-id>
                    <pub-id pub-id-type="doi">10.1002/humu.22630</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref23">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schrama</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ERCC5 p.Asp1104His and ERCC2 p.Lys751Gln polymorphisms are independent prognostic factors for the clinical course of melanoma.</article-title>
                    <source>

                        <italic toggle="yes">J. Invest. Dermatol.</italic>
</source>
                    <year>2011</year>;<volume>131</volume>:<fpage>1280</fpage>&#x2013;<lpage>1290</lpage>.
                    <pub-id pub-id-type="pmid">21390047</pub-id>
                    <pub-id pub-id-type="doi">10.1038/jid.2011.35</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref24">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Henr&#x00ed;quez-Hern&#x00e1;ndez</surname>
                            <given-names>LA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Single nucleotide polymorphisms in DNA repair genes as risk factors associated to prostate cancer progression.</article-title>
                    <source>

                        <italic toggle="yes">BMC Med. Genet.</italic>
</source>
                    <year>2014</year>;<volume>15</volume>:<fpage>143</fpage>.
                    <pub-id pub-id-type="pmid">25540025</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12881-014-0143-0</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4316399</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref25">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhu</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Systematic analysis on expression quantitative trait loci identifies a novel regulatory variant in ring finger and WD repeat domain 3 associated with prognosis of pancreatic cancer.</article-title>
                    <source>

                        <italic toggle="yes">Chin. Med. J.</italic>
</source>
                    <year>2022</year>;<volume>135</volume>:<fpage>1348</fpage>&#x2013;<lpage>1357</lpage>.
                    <pub-id pub-id-type="pmid">35830250</pub-id>
                    <pub-id pub-id-type="doi">10.1097/CM9.0000000000002180</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9433068</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref26">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fu</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>RFWD3-Mdm2 ubiquitin ligase complex positively regulates p53 stability in response to DNA damage.</article-title>
                    <source>

                        <italic toggle="yes">Proc. Natl. Acad. Sci. U. S. A.</italic>
</source>
                    <year>2010</year>;<volume>107</volume>:<fpage>4579</fpage>&#x2013;<lpage>4584</lpage>.
                    <pub-id pub-id-type="pmid">20173098</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.0912094107</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2842028</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref27">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dasgupta</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>LncRNA CDKN2B-AS1/miR-141/cyclin D network regulates tumor progression and metastasis of renal cell carcinoma.</article-title>
                    <source>

                        <italic toggle="yes">Cell Death Dis.</italic>
</source>
                    <year>2020</year>;<volume>11</volume>:<fpage>660</fpage>.
                    <pub-id pub-id-type="pmid">32814766</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41419-020-02877-0</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7438482</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref28">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pellegata</surname>
                            <given-names>NS</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Human pheochromocytomas show reduced p27Kip1 expression that is not associated with somatic gene mutations and rarely with deletions.</article-title>
                    <source>

                        <italic toggle="yes">Virchows Arch.</italic>
</source>
                    <year>2007</year>;<volume>451</volume>:<fpage>37</fpage>&#x2013;<lpage>46</lpage>.
                    <pub-id pub-id-type="doi">10.1007/s00428-007-0431-6</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref29">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Theodoropoulos</surname>
                            <given-names>GE</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Caspase 9 promoter polymorphisms confer increased susceptibility to breast cancer.</article-title>
                    <source>

                        <italic toggle="yes">Cancer Genet.</italic>
</source>
                    <year>2012</year>;<volume>205</volume>:<fpage>508</fpage>&#x2013;<lpage>512</lpage>.
                    <pub-id pub-id-type="pmid">22981751</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cancergen.2012.08.001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref30">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rodriguez-Ruiz</surname>
                            <given-names>ME</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Apoptotic caspases inhibit abscopal responses to radiation and identify a new prognostic biomarker for breast cancer patients.</article-title>
                    <source>

                        <italic toggle="yes">Oncoimmunology.</italic>
</source>
                    <year>2019</year>;<volume>8</volume>:<fpage>e1655964</fpage>.
                    <pub-id pub-id-type="pmid">31646105</pub-id>
                    <pub-id pub-id-type="doi">10.1080/2162402X.2019.1655964</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6791460</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref31">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Walsh</surname>
                            <given-names>CS</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ERCC5 is a novel biomarker of ovarian cancer prognosis.</article-title>
                    <source>

                        <italic toggle="yes">J. Clin. Oncol.</italic>
</source>
                    <year>2008</year>;<volume>26</volume>:<fpage>2952</fpage>&#x2013;<lpage>2958</lpage>.
                    <pub-id pub-id-type="doi">10.1200/JCO.2007.13.5806</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref32">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shuai</surname>
                            <given-names>W</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ETNK1 mutation occurs in a wide spectrum of myeloid neoplasms and is not specific for atypical chronic myeloid leukemia.</article-title>
                    <source>

                        <italic toggle="yes">Cancer.</italic>
</source>
                    <year>2023</year>;<volume>129</volume>:<fpage>878</fpage>&#x2013;<lpage>889</lpage>.
                    <pub-id pub-id-type="pmid">36583229</pub-id>
                    <pub-id pub-id-type="doi">10.1002/cncr.34616</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref33">
                <label>36</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Stoica</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ferreira</surname>
                            <given-names>AK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hannan</surname>
                            <given-names>K</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Bilayer Forming Phospholipids as Targets for Cancer Therapy.</article-title>
                    <source>

                        <italic toggle="yes">Int. J. Mol. Sci.</italic>
</source>
                    <year>2022</year>;<volume>23</volume>.
                    <pub-id pub-id-type="pmid">35563655</pub-id>
                    <pub-id pub-id-type="doi">10.3390/ijms23095266</pub-id>
                    <pub-id pub-id-type="pmcid">PMC9100777</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref34">
                <label>37</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ahmed</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>CRISPRi screens reveal a DNA methylation-mediated 3D genome dependent causal mechanism in prostate cancer.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2021</year>;<volume>12</volume>:<fpage>1781</fpage>.
                    <pub-id pub-id-type="pmid">33741908</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-021-21867-0</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7979745</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref35">
                <label>38</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Xia</surname>
                            <given-names>J-H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wei</surname>
                            <given-names>G-H</given-names>
                        </name>
</person-group>:
                    <article-title>Enhancer Dysfunction in 3D Genome and Disease.</article-title>
                    <source>

                        <italic toggle="yes">Cells.</italic>
</source>
                    <year>2019</year>;<volume>8</volume>.
                    <pub-id pub-id-type="pmid">31635067</pub-id>
                    <pub-id pub-id-type="doi">10.3390/cells8101281</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6830074</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref36">
                <label>39</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fudenberg</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pollard</surname>
                            <given-names>KS</given-names>
                        </name>
</person-group>:
                    <article-title>Chromatin features constrain structural variation across evolutionary timescales.</article-title>
                    <source>

                        <italic toggle="yes">Proc. Natl. Acad. Sci. U. S. A.</italic>
</source>
                    <year>2019</year>;<volume>116</volume>:<fpage>2175</fpage>&#x2013;<lpage>2180</lpage>.
                    <pub-id pub-id-type="pmid">30659153</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.1808631116</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6369792</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref37">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Rovirosa</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ramos-Morales</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Javierre</surname>
                            <given-names>BM</given-names>
                        </name>
</person-group>:
                    <article-title>The Genome in a Three-Dimensional Context: Deciphering the Contribution of Noncoding Mutations at Enhancers to Blood Cancer.</article-title>
                    <source>

                        <italic toggle="yes">Front. Immunol.</italic>
</source>
                    <year>2020</year>;<volume>11</volume>:<fpage>592087</fpage>.
                    <pub-id pub-id-type="pmid">33117405</pub-id>
                    <pub-id pub-id-type="doi">10.3389/fimmu.2020.592087</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7575776</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref38">
                <label>41</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Valton</surname>
                            <given-names>A-L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Dekker</surname>
                            <given-names>J</given-names>
                        </name>
</person-group>:
                    <article-title>TAD disruption as oncogenic driver.</article-title>
                    <source>

                        <italic toggle="yes">Curr. Opin. Genet. Dev.</italic>
</source>
                    <year>2016</year>;<volume>36</volume>:<fpage>34</fpage>&#x2013;<lpage>40</lpage>.
                    <pub-id pub-id-type="pmid">27111891</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.gde.2016.03.008</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4880504</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref39">
                <label>42</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pagadala</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Germline modifiers of the tumor immune microenvironment implicate drivers of cancer risk and immunotherapy response.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2023</year>;<volume>14</volume>:<fpage>2744</fpage>.
                    <pub-id pub-id-type="pmid">37173324</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-023-38271-5</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10182072</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref40">
                <label>43</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Germline and Somatic Genetic Variants in the p53 Pathway Interact to Affect Cancer Risk, Progression, and Drug Response.</article-title>
                    <source>

                        <italic toggle="yes">Cancer Res.</italic>
</source>
                    <year>2021</year>;<volume>81</volume>:<fpage>1667</fpage>&#x2013;<lpage>1680</lpage>.
                    <pub-id pub-id-type="pmid">33558336</pub-id>
                    <pub-id pub-id-type="doi">10.1158/0008-5472.CAN-20-0177</pub-id>
                    <pub-id pub-id-type="pmcid">PMC10266546</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref41">
                <label>44</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Sayaman</surname>
                            <given-names>RW</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Germline genetic contribution to the immune landscape of cancer.</article-title>
                    <source>

                        <italic toggle="yes">Immunity.</italic>
</source>
                    <year>2021</year>;<volume>54</volume>:<fpage>367</fpage>&#x2013;<lpage>386.e8</lpage>.
                    <pub-id pub-id-type="pmid">33567262</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.immuni.2021.01.011</pub-id>
                    <pub-id pub-id-type="pmcid">PMC8414660</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref42">
                <label>45</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Carter</surname>
                            <given-names>H</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Interaction Landscape of Inherited Polymorphisms with Somatic Events in Cancer.</article-title>
                    <source>

                        <italic toggle="yes">Cancer Discov.</italic>
</source>
                    <year>2017</year>;<volume>7</volume>:<fpage>410</fpage>&#x2013;<lpage>423</lpage>.
                    <pub-id pub-id-type="pmid">28188128</pub-id>
                    <pub-id pub-id-type="doi">10.1158/2159-8290.CD-16-1045</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5460679</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref43">
                <label>46</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dworkin</surname>
                            <given-names>AM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Germline variation controls the architecture of somatic alterations in tumors.</article-title>
                    <source>

                        <italic toggle="yes">PLoS Genet.</italic>
</source>
                    <year>2010</year>;<volume>6</volume>:<fpage>e1001136</fpage>.
                    <pub-id pub-id-type="pmid">20885788</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pgen.1001136</pub-id>
                    <pub-id pub-id-type="pmcid">PMC2944791</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref44">
                <label>47</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>Q</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Expression QTL-based analyses reveal candidate causal genes and loci across five tumor types.</article-title>
                    <source>

                        <italic toggle="yes">Hum. Mol. Genet.</italic>
</source>
                    <year>2014</year>;<volume>23</volume>:<fpage>5294</fpage>&#x2013;<lpage>5302</lpage>.
                    <pub-id pub-id-type="pmid">24907074</pub-id>
                    <pub-id pub-id-type="doi">10.1093/hmg/ddu228</pub-id>
                    <pub-id pub-id-type="pmcid">PMC4215106</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref45">
                <label>48</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>W</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>
                        <italic toggle="yes">Cis</italic>- and 
                        <italic toggle="yes">Trans</italic>-Acting Expression Quantitative Trait Loci of Long Non-Coding RNA in 2,549 Cancers With Potential Clinical and Therapeutic Implications.</article-title>
                    <source>

                        <italic toggle="yes">Front. Oncol.</italic>
</source>
                    <year>2020</year>;<volume>10</volume>:<fpage>602104</fpage>.
                    <pub-id pub-id-type="pmid">33194770</pub-id>
                    <pub-id pub-id-type="doi">10.3389/fonc.2020.602104</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7604522</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref47">
                <label>49</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Akdemir</surname>
                            <given-names>KC</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Disruption of chromatin folding domains by somatic genomic rearrangements in human cancer.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Genet.</italic>
</source>
                    <year>2020</year>;<volume>52</volume>:<fpage>294</fpage>&#x2013;<lpage>305</lpage>.
                    <pub-id pub-id-type="pmid">32024999</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41588-019-0564-y</pub-id>
                    <pub-id pub-id-type="pmcid">PMC7058537</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref60">
                <label>50</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yousefi</surname>
                            <given-names>PD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Suderman</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Langdon</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>DNA methylation-based predictors of health: applications and statistical considerations.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Rev. Genet.</italic>
</source>
                    <year>2022</year>;<volume>23</volume>:<fpage>369</fpage>&#x2013;<lpage>383</lpage>.
                    <pub-id pub-id-type="pmid">35304597</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41576-022-00465-w</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref48">
                <label>51</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2018</year>;<volume>173</volume>:<fpage>400</fpage>&#x2013;<lpage>416.e11</lpage>.
                    <pub-id pub-id-type="pmid">29625055</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2018.02.052</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6066282</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref49">
                <label>52</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kazachenka</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Identification, Characterization, and Heritability of Murine Metastable Epialleles: Implications for Non-genetic Inheritance.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2018</year>;<volume>175</volume>:<fpage>1717</fpage>.
                    <pub-id pub-id-type="pmid">30500541</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2018.11.017</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6269165</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref50">
                <label>53</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Inoue</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity.</article-title>
                    <source>

                        <italic toggle="yes">Genome Res.</italic>
</source>
                    <year>2017</year>;<volume>27</volume>:<fpage>38</fpage>&#x2013;<lpage>52</lpage>.
                    <pub-id pub-id-type="pmid">27831498</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.212092.116</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5204343</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref51">
                <label>54</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bycroft</surname>
                            <given-names>C</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The UK Biobank resource with deep phenotyping and genomic data.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2018</year>;<volume>562</volume>:<fpage>203</fpage>&#x2013;<lpage>209</lpage>.
                    <pub-id pub-id-type="pmid">30305743</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41586-018-0579-z</pub-id>
                    <pub-id pub-id-type="pmcid">PMC6786975</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref52">
                <label>55</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Amos</surname>
                            <given-names>CI</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The OncoArray Consortium: A Network for Understanding the Genetic Architecture of Common Cancers.</article-title>
                    <source>

                        <italic toggle="yes">Cancer Epidemiol. Biomark. Prev.</italic>
</source>
                    <year>2017</year>;<volume>26</volume>:<fpage>126</fpage>&#x2013;<lpage>135</lpage>.
                    <pub-id pub-id-type="pmid">27697780</pub-id>
                    <pub-id pub-id-type="doi">10.1158/1055-9965.EPI-16-0106</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5224974</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref53">
                <label>56</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <collab>Roadmap Epigenomics Consortium</collab>

                        <etal/>
</person-group>:
                    <article-title>Integrative analysis of 111 reference human epigenomes.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2015</year>;<volume>518</volume>:<fpage>317</fpage>&#x2013;<lpage>330</lpage>.</mixed-citation>
            </ref>
            <ref id="ref54">
                <label>57</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Purcell</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>PLINK: a tool set for whole-genome association and population-based linkage analyses.</article-title>
                    <source>

                        <italic toggle="yes">Am. J. Hum. Genet.</italic>
</source>
                    <year>2007</year>;<volume>81</volume>:<fpage>559</fpage>&#x2013;<lpage>575</lpage>.
                    <pub-id pub-id-type="pmid">17701901</pub-id>
                    <pub-id pub-id-type="doi">10.1086/519795</pub-id>
                    <pub-id pub-id-type="pmcid">PMC1950838</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref55">
                <label>58</label>
                <mixed-citation publication-type="journal">
                    <collab>ENCODE Project Consortium</collab>:
                    <article-title>An integrated encyclopedia of DNA elements in the human genome.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2012</year>;<volume>489</volume>:<fpage>57</fpage>&#x2013;<lpage>74</lpage>.
                    <pub-id pub-id-type="pmid">22955616</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature11247</pub-id>
                    <pub-id pub-id-type="pmcid">PMC3439153</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref56">
                <label>59</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hall</surname>
                            <given-names>MA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>PLATO software provides analytic framework for investigating complexity beyond genome-wide association studies.</article-title>
                    <source>

                        <italic toggle="yes">Nat. Commun.</italic>
</source>
                    <year>2017</year>;<volume>8</volume>:<fpage>1167</fpage>.
                    <pub-id pub-id-type="pmid">29079728</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-017-00802-2</pub-id>
                    <pub-id pub-id-type="pmcid">PMC5660079</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref57">
                <label>60</label>
                <mixed-citation publication-type="other">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Goudarzi</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hcarter</surname>
                        </name>
</person-group>:
                    <article-title>cartercompbio/meQTLs: Initial release (v1.0.0).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2023</year>.
                    <pub-id pub-id-type="doi">10.5281/zenodo.8168488</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref61">
                <label>61</label>
                <mixed-citation publication-type="data">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Carter</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <data-title>Extended_Data_Figure_1.pdf. figshare.</data-title>
                    <source>

                        <italic toggle="yes">Figure.</italic>
</source>
                    <year>2025</year>.
                    <pub-id pub-id-type="doi">10.6084/m9.figshare.29610992.v1</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref62">
                <label>62</label>
                <mixed-citation publication-type="data">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Carter</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <data-title>Extended Data Figure 2. figshare.</data-title>
                    <source>

                        <italic toggle="yes">Figure.</italic>
</source>
                    <year>2025</year>.
                    <pub-id pub-id-type="doi">10.6084/m9.figshare.29610998.v1</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report446700">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.185383.r446700</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Yamaguchi</surname>
                        <given-names>Kosuke</given-names>
                    </name>
                    <xref ref-type="aff" rid="r446700a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2926-9444</uri>
                </contrib>
                <aff id="r446700a1">
                    <label>1</label>Molecular Cell Engineering Laboratory, National Institute of Genetics, Mishima, Shizuoka, Japan</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>16</day>
                <month>1</month>
                <year>2026</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2026 Yamaguchi K</copyright-statement>
                <copyright-year>2026</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport446700" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.139476.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Methylation quantitative trait loci (meQTLs) are single nucleotide polymorphisms (SNPs) that are statistically associated with variation in DNA methylation levels at specific CpG sites. Such genetic-epigenetic interactions have attracted considerable interest in cancer research, as altered DNA methylation patterns can influence the expression of tumor suppressor genes and oncogenes. In addition, higher-order genome organization, including topologically associating domains (TADs), may further modulate the functional impact of genetic variants on epigenetic regulation in a context- and disease-specific manner.</p>
            <p> In this study, the authors integrate cancer-associated meQTLs with TAD organization. They report that transcriptionally active TADs tend to exhibit relative DNA hypomethylation, whereas inactive TADs are associated with DNA hypermethylation (Fig. 1). Furthermore, the authors observe that meQTLs are enriched at TAD boundaries, particularly at active-boundary-active, inactive-boundary-active, and active-boundary-inactive configurations (Fig. 2). The authors also report that oncogene-associated meQTLs are preferentially located within inactive TADs, whereas tumor suppressor gene-associated meQTLs are more frequently found in active TADs, based on randomization analyses (Fig. 3).</p>
            <p>The authors further identify 156 cancer gene-related clustered meQTLs (Table 1), among which 23 are reported to be significantly associated with cancer risk and survival across multiple cancer types in a pan-cancer model (Table 2). Breast cancer is highlighted as a representative example, supported by relatively large sample sizes from the UK Biobank and additional cohorts (Table 2, C50-C50). Using an XGBoost-trained model, the authors calculate a polygenic risk score (PRS) and suggest that this score may be associated with cancer risk (Fig. 4). The authors additionally attempt to isolate key contributing factors (MSH2, EZH2, and ASXL2) for cancer risk prediction; however, the expression levels of these genes do not show a significant correlation with PRS values (Fig. 5). Finally, the study proposes polygenic survival scores (PSS) derived from these meQTLs, suggesting that these scores may stratify cancer risk and survival beyond conventional clinical parameters (Figs. 6 and 7).</p>
            <p> Overall, this study provides conceptually interesting insights into the integration of genetic variation, DNA methylation, and three-dimensional genome organization in cancer risk assessment. However, several conclusions are not fully supported by the presented data, and important controls or additional analyses appear to be lacking. This reviewer therefore suggests that additional experiments and/or analyses would substantially strengthen the manuscript.</p>
            <p> Major Comments</p>
            <p> 1. Figure and table labeling issues:</p>
            <p> Several figures and tables are not correctly labeled or lack essential information. The authors should carefully review all figures and tables to ensure clarity and completeness. Examples identified by this reviewer include: 
                <list list-type="bullet">
                    <list-item>
                        <p>Figure 1A: The meaning of the X-axis label is unclear and should be explicitly defined.</p>
                    </list-item>
                    <list-item>
                        <p>Table 2: No false discovery rate (FDR) or q-value information is provided; only p-values associated with ICD-10 codes are shown. If these p-values are intended to represent FDR-adjusted values, they should be clearly labeled as such.</p>
                    </list-item>
                    <list-item>
                        <p>Table 2: The table legend describes beta values as correlation coefficients between meQTLs and promoter DNA methylation; however, it is not sufficiently clear from the table how these beta values should be interpreted in relation to the reported cancer risk and survival associations. Additional clarification would improve the readability of the table.</p>
                    </list-item>
                    <list-item>
                        <p>Figure 5B: All Y-axis labels are shown as "MSH2 Expression (Z-score)," which appears to be incorrect.</p>
                    </list-item>
                </list> </p>
            <p> 2. DNA methylation source in Fig. 1B:</p>
            <p> In Fig. 1B, the authors analyze DNA methylation levels across 1,100 TADs shared among five cell lines (GM12878, HMEC, HUVEC, IMR90, and NHEK). As all these cell lines are derived from normal tissues, it is critical to clarify whether normal tissue DNA methylation data were used in this analysis. This reviewer requests explicit confirmation that normal (non-tumor) DNA methylation data from the TCGA database were used, rather than cancer-derived samples.</p>
            <p> 3. Interpretation of Table 2 and biological linkage:</p>
            <p> Table 2 presents a list of meQTLs reported to significantly affect cancer risk and survival in a pan-cancer model. This reviewer assumes that the authors aim to establish a mechanistic link between meQTLs, TAD organization, promoter DNA methylation of cancer-related genes, gene expression changes, and cancer risk or survival. To strengthen this interpretation, this reviewer suggests: 
                <list list-type="bullet">
                    <list-item>
                        <p>Adding gene expression comparisons between cancer and normal tissues for the genes listed in Table 2.</p>
                    </list-item>
                    <list-item>
                        <p>Explicitly annotating whether each gene in Table 2 is classified as a tumor suppressor gene or an oncogene.</p>
                    </list-item>
                </list> </p>
            <p> 4. Evaluation of PRS specificity in Fig. 4:</p>
            <p> In Fig. 4, the authors show that a PRS derived from 23 TAD-associated meQTLs predicts breast cancer risk. However, meQTLs themselves have been reported as cancer-associated variants independent of TAD context. To specifically demonstrate the added value of TAD information, this reviewer recommends performing parallel PRS analyses using either of the following broader meQTL sets and directly comparing their prediction performance with that of the TAD-associated meQTLs: 
                <list list-type="bullet">
                    <list-item>
                        <p>Pan-cancer gene-related meQTLs, or</p>
                    </list-item>
                    <list-item>
                        <p>All identified meQTLs.</p>
                    </list-item>
                </list> </p>
            <p> 5. PSS comparison in Fig. 5:</p>
            <p> Similarly, for Fig. 5, the authors should calculate PSS values using pan-cancer gene-related meQTLs and compare their predictive performance with TAD-associated meQTL-based PSS. This comparison is necessary to demonstrate that incorporating TAD information provides added predictive value.</p>
            <p> 6. Discussion: relevance of DNA methylation-CTCF interactions:</p>
            <p> Recent studies have reported that DNA methylation can directly affect CTCF binding, a key regulator of 3D genome organization. Incorporating this literature would strengthen the conceptual framework of the manuscript. The following references are suggested for discussion: 
                <list list-type="bullet">
                    <list-item>
                        <p>PMID: 10839546</p>
                    </list-item>
                    <list-item>
                        <p>PMID: 10839547</p>
                    </list-item>
                    <list-item>
                        <p>PMID: 12461525</p>
                    </list-item>
                    <list-item>
                        <p>PMID: 26257180</p>
                    </list-item>
                    <list-item>
                        <p>PMID: 30948436</p>
                    </list-item>
                    <list-item>
                        <p>PMID: 39180406</p>
                    </list-item>
                </list> </p>
            <p> Minor Comments</p>
            <p> Introduction, second paragraph:</p>
            <p> "L1NE1" appears to be a typographical error and should be corrected to "LINE1."</p>
            <p> Figures 3B and 3C:</p>
            <p> The distinction between observed and random values is difficult to interpret. Using bar plots for both values with a clear legend would improve readability.</p>
            <p> Figure 4:</p>
            <p> The term polygenic risk score (PRS) is not defined prior to the use of the abbreviation. For readers outside the field, the full term should be introduced before abbreviation.</p>
            <p> Figure 4A:</p>
            <p> The X-axis labels "0" and "1" are not explained. It would be clearer to label these as "control" and "cancer," respectively.</p>
            <p> Text related to Fig. 5:</p>
            <p> Brief functional descriptions of MSH2 and EZH2 would improve clarity. MSH2 is a key component of DNA mismatch repair, and EZH2 is a core subunit of the PRC2 complex responsible for H3K27 methylation, both of which are closely linked to cancer biology.</p>
            <p> Page 9, Table 2 inconsistency:</p>
            <p> The manuscript states: "Out of the 155 SNPs, 21 passed the Benjamini-Hochberg adjusted FDR of less than 0.05 (Table 2)." However, Table 2 lists 23 SNPs. This discrepancy should be resolved. If an additional table exists, it should be included in the manuscript.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Partly</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Partly</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Epigenetics, cell biology, molecular cell biology.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report432966">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.185383.r432966</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Unal</surname>
                        <given-names>Busra</given-names>
                    </name>
                    <xref ref-type="aff" rid="r432966a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r432966a1">
                    <label>1</label>Umraniye Training and Research Hospital, Istanbul, Turkey</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>7</day>
                <month>1</month>
                <year>2026</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2026 Unal B</copyright-statement>
                <copyright-year>2026</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport432966" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.139476.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors present an interesting and carefully conducted study. The manuscript is overall well written and provides valuable insights. However, several issues would benefit from clarification and further consideration.</p>
            <p> 1. Across the manuscript, the terms &#x201c;epigenetic germline variants&#x201d; and &#x201c;cancer meQTLs&#x201d; appear to be used somewhat interchangeably. It is not always clear whether the authors are referring to germline variants with stable methylation effects across tissues and disease states, or to tumor-context&#x2013;dependent meQTLs that may emerge within the altered epigenetic landscape of cancer. More consistent terminology would help clarify which findings reflect underlying germline regulatory structure and which are tumor-specific. This clarification is important because the manuscript&#x2019;s conclusions regarding cancer risk inference may have clinical implications.</p>
            <p> 2. The primary datasets used in this study (TCGA, UK Biobank, and DRIVE) are all substantially enriched for individuals of European ancestry. However, the manuscript does not evaluate whether the reported findings generalize beyond these populations. Additionally, the manuscript does not fully discuss how ancestry-related differences in linkage disequilibrium structure, baseline methylation landscapes, and chromatin organization may influence both meQTL discovery and model performance. Considering that epigenetic regulation and variant&#x2013;methylation coupling can differ across populations, the absence of ancestry-stratified analyses or a conceptual consideration of these issues limits the generalisability of the work. I would recommend that the authors either incorporate ancestry-aware analyses or provide a discussion of how population structure may affect meQTL architecture and the broader applicability of their conclusions.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>I cannot comment. A qualified statistician is required.</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>My area of research is hereditary cancer predisposition syndromes and disparities in genomic medicine application.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report400300">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.185383.r400300</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Mehta</surname>
                        <given-names>Charu</given-names>
                    </name>
                    <xref ref-type="aff" rid="r400300a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8644-3333</uri>
                </contrib>
                <aff id="r400300a1">
                    <label>1</label>Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>12</day>
                <month>8</month>
                <year>2025</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2025 Mehta C</copyright-statement>
                <copyright-year>2025</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport400300" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.139476.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Thanks for your email. I have attached just a couple of minor comments.</p>
            <p> In Fig. 5A, the meQTL associated with or near MSH2 has the highest &#x201c;relative importance&#x201d;. However, as the authors mention, MSH2 expression levels are not correlated with cancer risk in TCGA patients. Although the authors mention this in the discussion, it would be useful to mention the possibility of long-range interactions between the meQTL and its target gene (e.g. through enhancer activity) here as well.</p>
            <p> Missing word on page 8: &#x201c;predictive dirver meQTL was associated with MSH2&#x2026;&#x201d;</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>I cannot comment. A qualified statistician is required.</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Partly</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report233315">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.152752.r233315</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Herzog</surname>
                        <given-names>Chiara</given-names>
                    </name>
                    <xref ref-type="aff" rid="r233315a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-1572-498X</uri>
                </contrib>
                <aff id="r233315a1">
                    <label>1</label>Universitat Innsbruck, Innsbruck, Tyrol, Austria</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>C.H. is the author of a patent on epigenetics-based breast cancer risk prediction using meQTLs, the WID&#x2122;-qtBC test</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>8</day>
                <month>2</month>
                <year>2024</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2024 Herzog C</copyright-statement>
                <copyright-year>2024</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport233315" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.139476.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The study by Goudarzi et al. provides interesting and new insights into the association of meQTLs and TAD regions of the genome, and investigates the capacity of meQTLs to predict cancer status and survival.</p>
            <p> </p>
            <p> Overall the study is well done and clearly presented but I have a few comments and suggestions for improvement.</p>
            <p> </p>
            <p> Major:</p>
            <p> - In the introduction and discussion the authors state &#x201c;This study investigated the relationship between epigenetic factors like chromatin structure and DNA methylation and genetic variation in the context of cancer&#x201d;. While the authors indeed investigate methylation and TADs in the first part of the manuscript, the majority of the predictors focus on genetic loci and not epigenetics or their interaction. Arguably, one of the most interesting aspects of meQTLs in their capacity for risk prediction is their modulated methylation levels and potential to reflect the integration of genetic and dynamic nonheritable factors (such as aging or lifestyle factors), but this was not looked at in detail. Could the authors comment on how meQTLs might be modulated by nonheritable factors as well as genetic factors, and e.g. look into methylation at these sites in cancers or samples preceding cancer?</p>
            <p> - Along these lines, it might in the future be interesting to develop dynamic cancer risk predictors as opposed to static tools (such as the PRSs), which might be enabled by nongenetic &#x2018;omics&#x2019;. Could the authors discuss the potential of these and how their findings might contribute to this (i.e. how meQTLs might contribute to dynamic risk monitoring)?</p>
            <p> </p>
            <p> Minor:</p>
            <p> </p>
            <p> - The authors describe a PCA but do not show any figures or supporting data. Could the authors either add a statement that no data are shown in the text, or (preferred) provide these data in the supplementary information?</p>
            <p> - Previous breast cancer PRSs (not based on meQTL) such as the PRS313 have already shown that they may be biased towards certain subtypes - it might be worth mentioning these prior models (e.g. Mavaddat et al 2019) when discussing the current study&#x2019;s findings in context.</p>
            <p> - What was the ROC AUC (and 95% CI) of the cancer risk score (Figure 4A)?</p>
            <p> - Can the authors explain the discrepancy of the more &#x2018;linear&#x2019; increase in risk in the UKBB compared to DRIVE (Figure 4b versus 4c)?</p>
            <p> - Figure 6/TCGA survival: It might also be interesting to look at recurrence-free survival in addition to overall survival.</p>
            <p> - In the section Oncogenes and tumor suppressor gene-related (&#x2026;): capitalise 
                <bold>L</bold> in &#x2018;Clumped cancer mQT
                <bold>l</bold>s&#x2019;</p>
            <p> - Figures:</p>
            <p> &#x00a0;&#x00a0;&#x00a0; - Figure 1b: Could the authors also indicate in the Figure legend that this is a Kruskal-Wallis p value.</p>
            <p> &#x00a0;&#x00a0;&#x00a0; - Figure 1c - For interpretability, it might be helpful to add at least y axis grid lines behind the box plots. The effects may be significant and are visualised with the violin density plot, but are difficult to see using the box plots.</p>
            <p> &#x00a0;&#x00a0;&#x00a0; - Figure 2 - takes some time to understand upon first reading. It might be helpful to label the blue bars in B and C with a legend &#x2018;expected&#x2019; and the red dot as &#x2018;observed&#x2019; to make it easier to grasp quickly.</p>
            <p> &#x00a0;&#x00a0;&#x00a0; - Figure 4: The text in the caption for A refers to a &#x2018;Figure 8&#x2019; that does not exist. Please check this.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Partly</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>epigenetics and cancer risk prediction</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment14252-233315">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Carter</surname>
                            <given-names>Hannah</given-names>
                        </name>
                        <aff>University of California, San Diego, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>20</day>
                    <month>7</month>
                    <year>2025</year>
                </pub-date>
            </front-stub>
            <body>
                <p>The study by Goudarzi et al. provides interesting and new insights into the association of meQTLs and TAD regions of the genome, and investigates the capacity of meQTLs to predict cancer status and survival.</p>
                <p> Overall the study is well done and clearly presented but I have a few comments and suggestions for improvement.</p>
                <p> </p>
                <p> Major:</p>
                <p> - In the introduction and discussion the authors state &#x201c;This study investigated the relationship between epigenetic factors like chromatin structure and DNA methylation and genetic variation in the context of cancer&#x201d;. While the authors indeed investigate methylation and TADs in the first part of the manuscript, the majority of the predictors focus on genetic loci and not epigenetics or their interaction. Arguably, one of the most interesting aspects of meQTLs in their capacity for risk prediction is their modulated methylation levels and potential to reflect the integration of genetic and dynamic non-heritable factors (such as aging or lifestyle factors), but this was not looked at in detail. Could the authors comment on how meQTLs might be modulated by non-heritable factors as well as genetic factors, and e.g. look into methylation at these sites in cancers or samples preceding cancer?</p>
                <p> </p>
                <p> 
                    <italic>We agree with the reviewer that it is important to consider potential for interaction with non-heritable factors. We have modified our discussion to acknowledge the need to evaluate non-heritable factors as follows:</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> &#x201c;There are also a number of non-genetic risk factors that act by modifying DNA methylation levels and which could interact with genetic regulation. These include aging, exercise, stress, diet and obesity, and a broad variety of environmental exposures. In our analysis, age had the highest impact on DNA methylation modulation, however, as age and sex were the only clinical factors for the majority of our study, future analysis of other non-genetic factors in relation to genetic regulators of DNA methylation are merited.&#x201d;</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> - Along these lines, it might in the future be interesting to develop dynamic cancer risk predictors as opposed to static tools (such as the PRSs), which might be enabled by nongenetic &#x2018;omics&#x2019;. Could the authors discuss the potential of these and how their findings might contribute to this (i.e. how meQTLs might contribute to dynamic risk monitoring)?</p>
                <p> </p>
                <p> 
                    <italic>This is a nice suggestion. We have added the following to the discussion:&#x00a0;</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> &#x201c;Future efforts could integrate dynamic methylation changes due to these non-genetic factors with static polygenic scores such as we describe here to provide a more accurate estimate of risk. This type of approach could benefit in particular from non-invasive biomarkers, such as cell free DNA methylation from blood, though studies will be needed to establish the cumulative effect of dynamic exposures and the extent to which they can be accurately evaluated from cell free DNA.&#x201d;</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> Ref:</italic>
                </p>
                <p>
                    <italic> Yousefi, P.D., Suderman, M., Langdon, R. et al. DNA methylation-based predictors of health: applications and statistical considerations. Nat Rev Genet 23, 369&#x2013;383 (2022). https://doi.org/10.1038/s41576-022-00465-w</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> Minor:</p>
                <p> - The authors describe a PCA but do not show any figures or supporting data. Could the authors either add a statement that no data are shown in the text, or (preferred) provide these data in the supplementary information?</p>
                <p> </p>
                <p> 
                    <italic>We have added the PCA figure as Supplementary Figure 1.&#x00a0;</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> - Previous breast cancer PRSs (not based on meQTL) such as the PRS313 have already shown that they may be biased towards certain subtypes - it might be worth mentioning these prior models (e.g. Mavaddat et al 2019) when discussing the current study&#x2019;s findings in context.</p>
                <p> </p>
                <p> 
                    <italic>We have now included the following text:</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> &#x201c;Indeed, classic polygenic risk scores for breast cancer have shown bias for predicting certain subtypes (Mavaddat et al). Lakeman et al 2020 demonstrated that women in the highest 1% of risk showed a 4.37-fold increased risk for ER-positive disease but only a 2.78-fold increased risk for ER-negative disease compared to the middle quintile showing bias in certain subtypes.&#x201d;</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> Refs</italic>
                </p>
                <p>
                    <italic> Mavaddat N, Michailidou K, Dennis J, Lush M, Fachal L, Lee A, Tyrer JP, Chen TH, Wang Q, Bolla MK, Yang X. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. The American Journal of Human Genetics. 2019 Jan 3;104(1):21-34.</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> Lakeman, I.M.M., Rodr&#x00ed;guez-Girondo, M., Lee, A. et al. Validation of the BOADICEA model and a 313-variant polygenic risk score for breast cancer risk prediction in a Dutch prospective cohort. Genet Med 22, 1803&#x2013;1811 (2020). https://doi.org/10.1038/s41436-020-0884-4</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> - What was the ROC AUC (and 95% CI) of the cancer risk score (Figure 4A)?&#x00a0;</p>
                <p> </p>
                <p> 
                    <italic>The DRIVE AUC was 0.5534, with a 95% CI of [0.5505, 0.5563]. This has now been added to the manuscript. We note that this PRS is based solely upon meQTLs near driver genes and is not expected to be a strong predictor of risk relative to more comprehensive breast cancer polygenic scores. Rather, we sought to reproduce effects on risk attributable solely to driver meQTLs in an independent cohort.&#x00a0;</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> - Can the authors explain the discrepancy of the more &#x2018;linear&#x2019; increase in risk in the UKBB compared to DRIVE (Figure 4b versus 4c)?</p>
                <p> </p>
                <p> 
                    <italic>This discrepancy could be due to inherent differences in the composition of the two datasets. UKBB comes predominantly from volunteers in the UK and is more representative of disease incidence in the general population. The DRIVE study was designed specifically to analyze breast cancer risk, and therefore individuals were included to either represent breast cancer or serve as non-breast cancer controls. Breast cancer status in the UKBB was defined based on ICD-10 codes, and we excluded individuals from the controls if they had any ICD-10 code associated with neoplasms, but the number of cases relative to controls was unbalanced for UKBB whereas it was balanced for DRIVE. The difference in linearity between the cohorts could be due to differences in genotypic and phenotypic diversity in the UKBB cohort compared to the DRIVE cohort. There could also be discrepancies in environmental risk factors between these cohorts.&#x00a0;</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> - Figure 6/TCGA survival: It might also be interesting to look at recurrence-free survival in addition to overall survival.</p>
                <p> </p>
                <p> 
                    <italic>We now include an analysis of disease free intervals as Figure SX. The methods have been updated accordingly.</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> - In the section Oncogenes and tumor suppressor gene-related (&#x2026;): capitalise L in &#x2018;Clumped cancer mQTls&#x2019;</p>
                <p> </p>
                <p> 
                    <italic>We have now corrected this.</italic>&#x00a0;</p>
                <p> </p>
                <p> </p>
                <p> - Figures:</p>
                <p> </p>
                <p> - Figure 1b: Could the authors also indicate in the Figure legend that this is a Kruskal-Wallis p value.</p>
                <p> </p>
                <p> 
                    <italic>We have updated the figure legend to reflect the test used.</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> - Figure 1c - For interpretability, it might be helpful to add at least y axis grid lines behind box plots. The effects may be significant and are visualised with the violin density plot, but are difficult to see using box plots.</p>
                <p> </p>
                <p> 
                    <italic>We added gridlines as suggested.</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> - Figure 2 - takes some time to understand upon first reading. It might be helpful to label the blue bars in B and C with a legend &#x2018;expected&#x2019; and the red dot as &#x2018;observed&#x2019; to make it easier to grasp quickly.</p>
                <p> </p>
                <p> 
                    <italic>We have updated the legends to read observed and expected.&#x00a0;</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> - Figure 4: The text in the caption for A refers to a &#x2018;Figure 8&#x2019; that does not exist. Please check this.</p>
                <p> </p>
                <p> 
                    <italic>Thank you for catching this. The erroneous figure reference has been removed from the caption.</italic>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report233320">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.152752.r233320</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Mehta</surname>
                        <given-names>Charu</given-names>
                    </name>
                    <xref ref-type="aff" rid="r233320a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8644-3333</uri>
                </contrib>
                <aff id="r233320a1">
                    <label>1</label>Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, Washington, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>17</day>
                <month>1</month>
                <year>2024</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2024 Mehta C</copyright-statement>
                <copyright-year>2024</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport233320" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.139476.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>"To obtain independent meQTLs, we clumped related meQTLs based on linkage disequilibrium using PLINK. Out of the 1.2 million SNPs, 60,602 remained after LD pruning (Table 1)."</p>
            <p> </p>
            <p> 
                <underline>Comment: Cite the database used as a source of meQTLs?</underline>&#x00a0;In the discussion, authors cited a database (46) but it is unclear if this is the same database they used to identify meQTLs.</p>
            <p> </p>
            <p> </p>
            <p> "(A) 5 state-based K-Means clustering of common TAD domains (n=1100) between 5 human cell lines (GM12878, HMEC, HUVEC, IMR90, and NHEK). Purple indicates TADs classified as a &#x201c;Mixed&#x201d;, Gray as &#x201c;Inactive-1&#x201d;, Light Blue as &#x201c;Active-1&#x201d;, Orange as &#x201c;Active-2&#x201d;, and Red as &#x201c;Inactive-2&#x201d;. Combining active and inactive categories leads to 222 Active, 626 Inactive, and 252 Mixed TADs."</p>
            <p> </p>
            <p> </p>
            <p> Comment: what are the x- and y-axes labels in 1A? Where are the five cell types indicated?</p>
            <p> </p>
            <p> </p>
            <p> Comment: What does &#x2018;other&#x2019; mean in Table 1?</p>
            <p> </p>
            <p> </p>
            <p> "Distributions suggested an increase in density of clumped meQTLs when transitioning from active to inactive regions, and conversely, a decrease from inactive to active regions (Kruskal-Wallis ANOVA, p-value&lt;0.05) when compared to the randomly shuffled distribution, but no shift in density for Active-Boundary-Active and Inactive-Boundary-Inactive categories (Figure 2B-D)."</p>
            <p> </p>
            <p> </p>
            <p> Comment: So what does any of this suggest??? expand?</p>
            <p> </p>
            <p> </p>
            <p> "In total, 103 oncogenes and 223 TSGs were used for this analysis, where only 67 of them contained meQTL-affecting CpG probes in their promoter regions (i.e. 49 TSGs and 18 oncogenes)."</p>
            <p> </p>
            <p> </p>
            <p> Comment: This suggests CpGs affect meQTLs but it&#x2019;s the other way round.</p>
            <p> </p>
            <p> </p>
            <p> "Clumped cancer meQTls were further narrowed to those associated with the methylation status of CpG probes located within the promoter regions of cancer driver genes including oncogenes and tumor suppressor genes (TSGs) from the COSMIC database."</p>
            <p> Comment: Please also clarify whether the correlation between CpG methylation status and meQTLs is also observed in normal tissues or only in cancer tissues.</p>
            <p> </p>
            <p> </p>
            <p> "Overall, cancer meQTLs near 29 cancer genes were included in the model. The most predictive driver meQTL was associated MSH2, a gene associated with Lynch syndrome and increased risk of breast cancer. Polymorphic variation affecting the expression of EZH2, the second most informative feature, has also been linked to breast cancer risk. ASXL2 may be required for estrogen receptor alpha (ERa) activation in ERa positive breast cancers. Notably, EZH2 overexpression has been linked more strongly to triple negative breast suggesting that the model includes features predictive of multiple subtypes."</p>
            <p> </p>
            <p> </p>
            <p> Comment: 1. Since some of these meQTLs lie close to genes involved in epigenetic modifications, have you checked whether they fall within enhancers or other defined regulatory domains?</p>
            <p> 2. Are these genes (MSH2, EZH2, ASXL2) known to be upregulated or downregulated in these risk cases? Does that agree with the prediction according to meQTLs?</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>I cannot comment. A qualified statistician is required.</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Partly</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>My area of research is gene regulation. I am able to assess significant parts of this manuscript, however, I am not a statistician or computational biologist, so I cannot speak to the soundness of their methods.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment14253-233320">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Carter</surname>
                            <given-names>Hannah</given-names>
                        </name>
                        <aff>University of California, San Diego, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>20</day>
                    <month>7</month>
                    <year>2025</year>
                </pub-date>
            </front-stub>
            <body>
                <p>"To obtain independent meQTLs, we clumped related meQTLs based on linkage disequilibrium using PLINK. Out of the 1.2 million SNPs, 60,602 remained after LD pruning (Table 1)."</p>
                <p> </p>
                <p> Comment: Cite the database used as a source of meQTLs? In the discussion, authors cited a database (46) but it is unclear if this is the same database they used to identify meQTLs.</p>
                <p> </p>
                <p> 
                    <italic>The data used as a source of the meQTLs are the Pancan-meQTLs from the cited number 46 [Gong J, et al.: Pancan-meQTL: a database to systematically evaluate the effects of genetic variants on methylation in human cancer. Nucleic Acids Res. 2019; 47: D1066&#x2013;D1072]</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> We have modified the text to clarify this as follows:&#x00a0;</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> &#x201c;To obtain independent meQTLs, we clumped related meQTLs from Gong et al (46) based on linkage disequilibrium using PLINK. Out of the 1.2 million SNPs, 60,602 remained after LD pruning (Table 1).&#x201d;</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> "(A) 5 state-based K-Means clustering of common TAD domains (n=1100) between 5 human cell lines (GM12878, HMEC, HUVEC, IMR90, and NHEK). Purple indicates TADs classified as a &#x201c;Mixed&#x201d;, Gray as &#x201c;Inactive-1&#x201d;, Light Blue as &#x201c;Active-1&#x201d;, Orange as &#x201c;Active-2&#x201d;, and Red as &#x201c;Inactive-2&#x201d;. Combining active and inactive categories leads to 222 Active, 626 Inactive, and 252 Mixed TADs."</p>
                <p> </p>
                <p> Comment: what are the x- and y-axes labels in 1A? Where are the five cell types indicated?</p>
                <p> </p>
                <p> The x-axis shows the 15 chromatin states that are used to cluster the TADs. The y-axis shows all 1100 TADs that were used for the analysis, and the 5 cell types are indicated through the legend shown on the left. The different colors represent the &#x201c;Mixed&#x201d;, &#x201c;Inactive-1&#x201d;, &#x201c;Active-1&#x201d;, &#x201c;Active-2&#x201d;, and &#x201c;Inactive-2&#x201d; clusters.</p>
                <p> </p>
                <p> 
                    <italic>We have updated the figure legend as follows:</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> &#x201c;(A) 5 state-based K-Means clustering of common TAD domains shared across 5 human cell lines (GM12878, HMEC, HUVEC, IMR90, and NHEK). Shared TAD domains are on the y-axis (n=1100) and are grouped according to 15 chromatin states (x-axis). K-means clusters are shown as a side bar along the y-axis. Purple indicates TADs classified as a &#x201c;Mixed&#x201d;, Gray as &#x201c;Inactive-1&#x201d;, Light Blue as &#x201c;Active-1&#x201d;, Orange as &#x201c;Active-2&#x201d;, and Red as &#x201c;Inactive-2&#x201d;. Combining active and inactive categories leads to 222 Active, 626 Inactive, and 252 Mixed TADs.&#x201d;</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> Comment: What does &#x2018;other&#x2019; mean in Table 1?</p>
                <p> </p>
                <p> 
                    <italic>&#x201c;Other&#x201d; represents all meQTLs that lie in the inter-TAD region but outside the boundary region, which we defined as +/- 50kb around the TAD boundaries. We have added a note to the table legend to clarify this.</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> &#x201c;Other indicates meQTLs that are in the inter-TAD region but do not fall within the boundary region as defined.&#x201d;</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> "Distributions suggested an increase in density of clumped meQTLs when transitioning from active to inactive regions, and conversely, a decrease from inactive to active regions (Kruskal-Wallis ANOVA, p-value&lt;0.05) when compared to the randomly shuffled distribution, but no shift in density for Active-Boundary-Active and Inactive-Boundary-Inactive categories (Figure 2B-D)."</p>
                <p> </p>
                <p> Comment: So what does any of this suggest??? expand?</p>
                <p> </p>
                <p> 
                    <italic>We have expanded our interpretation of this observation in the discussion as follows:</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> &#x201c;It is of note that TAD boundaries conserved across cell types are reportedly highly enriched for evolutionary constraint and complex trait heritability.10 Our data suggest that variability in gene expression due to meQTLs is also evolutionarily more constrained in and around active TADs and their boundaries, consistent with these TAD boundaries playing a critical role in development (47). These results may suggest that TAD boundaries play a role in making the recruitment of regulatory machinery more specific, particularly as it pertains to DNA methylation.&#x201d;</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> "In total, 103 oncogenes and 223 TSGs were used for this analysis, where only 67 of them contained meQTL-affecting CpG probes in their promoter regions (i.e. 49 TSGs and 18 oncogenes)."</p>
                <p> </p>
                <p> Comment: This suggests CpGs affect meQTLs but it&#x2019;s the other way round.</p>
                <p> </p>
                <p> 
                    <italic>We changed &#x201c;meQTL-affecting&#x201d; to &#x201c;meQTL-associated&#x201d;.</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> "Clumped cancer meQTLs were further narrowed to those associated with the methylation status of CpG probes located within the promoter regions of cancer driver genes including oncogenes and tumor suppressor genes (TSGs) from the COSMIC database."</p>
                <p> </p>
                <p> Comment: Also clarify if the correlation between methylation status of CpG vs meQTLs is also observed in normal tissues or only cancer tissues?</p>
                <p> </p>
                <p> 
                    <italic>The meQTLs generated by Gong et al. were based on methylation status at CpG markers as measured in bulk tumor tissues, which generally include a mixture of tumor and normal cells (stroma and immune infiltrates). Nonetheless, it is not clear whether the effects detected here are biased toward effects on methylation that are positively selected in tumors and might not be reflected in normal tissues. This limitation is described in the discussion:</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> &#x201c;First, the meQTLs utilized for this study are derived from a study of tumors 46 which could be biased toward detecting meQTLs associated with DNA methylation events that are positively selected in tumors.&#x201d;</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> and&#x00a0;</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> &#x201c;In future studies, it would be of interest to study meQTL trends in normal tissue samples to see if enrichment patterns associated with cancer genes are driven by selection in tumors, or highlight evolutionary constraints more broadly associated with human health that coincidentally are advantageous for tumor development.&#x201d;</italic>
                </p>
                <p> </p>
                <p> 
                    <italic>We also believe that our original phrasing was confusing here. We have rephrased as follows:</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> &#x201c;Clumped cancer meQTLs were further narrowed to those whose corresponding affected CpG probes were within the promoter regions of cancer driver genes including oncogenes and tumor suppressor genes (TSGs) from the COSMIC database.&#x201d;</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> "Overall, cancer meQTLs near 29 cancer genes were included in the model. The most predictive driver meQTL was associated with MSH2, a gene associated with Lynch syndrome and increased risk of breast cancer.</p>
                <p> </p>
                <p> Polymorphic variation affecting the expression of EZH2, the second most informative feature, has also been linked to breast cancer risk. ASXL2 may be required for estrogen receptor alpha (ERa) activation in ERa positive breast cancers. Notably, EZH2 overexpression has been linked more strongly to triple negative breast cancer, suggesting that the model includes features predictive of multiple subtypes.&#x201d;</p>
                <p> </p>
                <p> Comment:</p>
                <p> 1. Since some of these meQTLs lie close to genes involved in epigenetic modifications --- have you looked if these are in the enhancer or otherwise defined regulatory domains?</p>
                <p> </p>
                <p> 
                    <italic>The challenge here is that we do not know whether the tag meQTL SNP is actually the causal SNP. We focused on meQTLs that affect CpG probes within the promoter regions of cancer genes, but that does not preclude the possibility that they are affecting an enhancer. We have added this as a limitation of our study:</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> &#x201c;For risk prediction, we focused on meQTLs and their corresponding CpG probes that are overlapping the promoter regions of known cancer genes, however we cannot be sure that these meQTLs are not also affecting other genes in the region, for example through effects on enhancer activity.&#x201d;</italic>
                </p>
                <p> </p>
                <p> </p>
                <p> 2. Are these genes (MSH2, EZH2, ASXL2) known to be upregulated or downregulated in these risk cases?</p>
                <p> Does that agree with the prediction according to meQTLs?</p>
                <p> </p>
                <p> 
                    <italic>Establishing the mechanism by which meQTLs drive risk would require tissue-relevant gene expression, methylation measurements, and genotypes in a group of individuals before they develop breast cancer. In established breast cancers in TCGA, the relationship of gene expression with methylation is confounded by somatic alterations, copy number events, and general dysregulation of gene expression networks, making it difficult to determine what proportion of gene expression is attributable to meQTLs. We have added a statement about the need to further investigate meQTL effects on oncogene and tumor suppressor gene expression in healthy breast tissue to establish how these effects relate to cancer risk.</italic>
                </p>
                <p> </p>
                <p> 
                    <italic>&#x201c;More direct mechanistic insight might be gained by studying expression, genotype and methylation in healthy and pre-cancerous breast tissues and cell types. Studying the average expression of MSH2, EZH2, and ASXL2 within TCGA patients stratified by meQTL risk PRS suggested a potential decrease in expression of ASXL2 and EZH2 in the highest PRS quantile relative to the lowest, while MSH2 did not show much difference (Figure 5B). However, this difference needs to be studied further with more specific tumor sub-type stratification and cell type-specific expression.&#x201d;</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> Figure 5 and caption have been updated as follows:</italic>
                </p>
                <p> </p>
                <p> 
                    <italic>&#x201c;A) Features are ranked according to their contribution to classifier predictive performance. Total importances sum to 1. B) Average expression of ASXL2, EZH2 and MSH2 in TCGA breast cancer samples, stratified by PRS quantile.&#x201d;</italic>
                </p>
            </body>
        </sub-article>
    </sub-article>
</article>
