<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.13522.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Note</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Purine-rich low complexity regions are potential RNA binding hubs in the human genome</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 3 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Antonov</surname>
                        <given-names>Ivan</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8330-6907</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Medvedeva</surname>
                        <given-names>Yulia A.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-7587-1666</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Sciences, Moscow, Russian Federation</aff>
                <aff id="a2">
                    <label>2</label>Department of Biological and Medical Physics, Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russian Federation</aff>
                <aff id="a3">
                    <label>3</label>Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russian Federation</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:ju.medvedeva@gmail.com">ju.medvedeva@gmail.com</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>9</day>
                <month>5</month>
                <year>2019</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2018</year>
            </pub-date>
            <volume>7</volume>
            <elocation-id>76</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>30</day>
                    <month>4</month>
                    <year>2019</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Antonov I and Medvedeva YA</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/7-76/pdf"/>
            <abstract>
                <p>Many long noncoding RNAs are bound to chromatin, and some of these interactions are mediated by triple helices. It is usually assumed that a transcript can form triplexes with a distinct set of genomic loci, also known as triplex target sites (TTSs). Here we performed computational analyses of the TTSs that have been experimentally identified for particular RNAs. To assess the ability of these TTSs to bind other transcripts, we developed a method to estimate the statistical significance of the predicted number of triplexes for a given RNA-DNA pair. We demonstrated that each DNA set included a subset of sequences that have the potential to form a statistically significant (adjusted p-value &lt; 0.01) number of triplexes with the majority (&gt; 90%) of the analyzed transcripts. Because these DNA sequences are predicted to interact with a wide range of different RNAs, we called them &#x201c;universal TTSs&#x201d;. While the universal TTSs were quite rare in the human genome (around 0.5%), they were more frequent (&gt; 15%) among the MEG3 binding sites (ChOP-seq peaks) and especially among the shared Capture-seq peaks (40%). The universal TTSs were enriched in purine-rich low complexity regions. The role of chromatin-bound RNAs in the formation of 3D chromatin structure is currently under active discussion. We speculate that such universal TTSs may contribute to establishing long-distance chromosomal contacts and may facilitate distal enhancer-promoter interactions. All the scripts and the data files related to this study are available at 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/vanya-antonov/universal_tts">https://github.com/vanya-antonov/universal_tts</ext-link>
                </p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>MEG3 lncRNA</kwd>
                <kwd>triple helix</kwd>
                <kwd>triplex target sites (TTS)</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/501100006769">
                    <funding-source>Russian Science Foundation</funding-source>
                    <award-id>14-15-30002</award-id>
                </award-group>
                <funding-statement>This work was supported by the Russian Science Foundation [grant 14-15-30002].</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>As suggested by the reviewers, we substantially increased the number of RNA sequences used for the analysis (from 6 to 306) and used the exact locations of all the peaks (instead of the 3 kb bins used originally). Additionally, we developed a probabilistic approach to estimate the statistical significance of each RNA-DNA interaction. Finally, we added an analysis of recently published data obtained in the Capture-seq experiment. The data have been expanded and uploaded to 
                    <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.2654800">Zenodo</ext-link> and 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/vanya-antonov/universal_tts/tree/v1.0.0">GitHub</ext-link>. We have updated the title of the article to: &#x2018;Purine-rich low complexity regions are potential RNA binding hubs in the human genome&#x2019;. Nevertheless, with all these changes our main conclusion remains the same: we propose that purine-rich low complexity genomic regions may be able to interact with many different RNAs. We also speculate about the possible biological role of these special genomic loci.</p>
            </sec>
        </notes>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Many human long noncoding RNAs are localized in the nucleus and can potentially participate in chromatin formation and remodeling
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>. Recently, technologies such as ChIRP
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup>, ChRIP
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>, ChOP
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>, CHART
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup>, RAP
                <sup>
                    <xref ref-type="bibr" rid="ref-6">6</xref>
                </sup>, MARGI
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup> and GRID
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup> have been developed to map the genomic interacting sites of various lncRNAs. RNA can interact with chromatin by associating with DNA binding proteins or nascent transcripts, or by binding single-stranded or double-stranded DNA to form R-loops or triple helices, respectively. A growing body of evidence shows that RNA-DNA triplex formation based on the Hoogsteen
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup> base pairing rules plays an important role in RNA-chromatin interactions. Several studies have provided 
                <italic toggle="yes">in vitro</italic> and 
                <italic toggle="yes">in vivo</italic> evidence for the existence and biological relevance of triplexes, including those formed by pRNA
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>, Fendrr
                <sup>
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup>, Khps1
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>, PARTICLE
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>
                </sup>, and MEG3
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>.</p>
            <p>Computational analyses have revealed that a large population of triplex-forming motifs is present across the human genome with the majority of annotated genes containing at least one triplex target site (TTS), preferentially in regulatory gene regions
                <sup>
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup>,
                <sup>
                    <xref ref-type="bibr" rid="ref-15">15</xref>
                </sup>. Considering the large number of purine-rich sequences in the genome, triplex-mediated targeting of lncRNAs and associated proteins to distinct genomic loci is very likely a commonly used mechanism of gene regulation. Still, there are only a few bioinformatic studies of triplex-based RNA-DNA interactions on the genome-wide scale
                <sup>
                    <xref ref-type="bibr" rid="ref-16">16</xref>
                </sup>.</p>
            <p>Here we analyze the genomic regions that are known to interact with the MEG3 lncRNA (the ChOP-seq peaks) or with three different short oligos corresponding to the DNA binding domains (DBDs) of the MEG3 and GATA6-AS lncRNAs (the Capture-seq peaks). The current literature usually assumes that triplex-based interactions have high sequence specificity and that each triplex-forming oligonucleotide (a transcript region) has a distinct set of genomic binding sites ("triplexome"). We investigate whether all the DNA sites capable of triplex formation are specific enough to be regulated by one particular RNA only, or whether different transcripts may share TTSs. Our computational analysis revealed a group of genomic regions that may have a very high propensity for triplex formation with a wide range of different RNAs. Therefore, we named such DNA sequences "universal TTSs". We also attempted to identify the features of these sequences that may be responsible for the observed phenomenon.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <p>The genomic coordinates of the 6837 MEG3 binding sites
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup> (ChOP-seq peaks) were mapped from the hg19 to the hg38 human genome version using liftOver
                <sup>
                    <xref ref-type="bibr" rid="ref-17">17</xref>
                </sup> (Nov 7, 2017 version). Next, from the 6800 successfully converted peaks we removed two that contained ambiguous bases (N), keeping 6798 ChOP-seq peaks for the analysis. Additionally, to simulate the genomic background, 6798 control regions with lengths matching the selected ChOP-seq peaks were randomly sampled from the human genome using bedtools
                <sup>
                    <xref ref-type="bibr" rid="ref-18">18</xref>
                </sup> (version 2.27.1, see 
                <xref ref-type="other" rid="SF1">Supplementary Figure 1</xref>).</p>
            <p>Triplex-based interactions were predicted by Triplexator
                <sup>
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup> (version 1.3.2) with the following parameters: 
                <monospace>-fr off -l 10 -e 10</monospace>. These values were optimized so that the tool could predict binding between all three RNA-DNA sequence pairs that have been validated 
                <italic toggle="yes">in vitro</italic> in the original study
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup> (
                <xref ref-type="other" rid="SF1">Supplementary Table 1</xref>). To detect statistically significant RNA-DNA interactions, we developed a method to estimate a p-value from the number of predicted triple helices. Since the MEG3 peaks have different lengths, the expected number of triplexes (i.e. the parameter of the Poisson distribution) is computed from the lengths of the input RNA and DNA sequences (see below).</p>
            <p>The MEG3 binding sites were identified in the triple-negative breast cancer cell line BT-549
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>. To identify all the genes expressed in this cell line, we used the RNA-seq data from the control knockdown experiment (ERR652847). The reads were aligned to the human genome (hg38) using HISAT2
                <sup>
                    <xref ref-type="bibr" rid="ref-19">19</xref>
                </sup> (version 2.1.0) and the number of reads corresponding to each GENCODE
                <sup>
                    <xref ref-type="bibr" rid="ref-20">20</xref>
                </sup> (version 28) transcript was calculated by the HTSeq-count tool
                <sup>
                    <xref ref-type="bibr" rid="ref-21">21</xref>
                </sup> (version 0.10.0). Next, the RPKM values were computed as RPKM = 
                <italic toggle="yes">C /</italic>(
                <italic toggle="yes">N &#x00d7; L</italic>), where 
                <italic toggle="yes">C</italic> is the number of reads aligned to all the transcript exons, 
                <italic toggle="yes">N</italic> is the total number of mapped reads (in millions) and 
                <italic toggle="yes">L</italic> is the transcript length (in kilobases). Only the most highly expressed isoform (in terms of RPKM) of each gene was considered. Next, 153 expressed transcripts with length and GC content similar to those of the MEG3 lncRNA (NR_002766.2) were selected using the RANN (version 2.6) R package (
                <xref ref-type="other" rid="SF1">Supplementary Figure 2</xref>). Additionally, 153 random RNA sequences were obtained by dinucleotide shuffling of the original MEG3 transcript using the uShuffle tool
                <sup>
                    <xref ref-type="bibr" rid="ref-22">22</xref>
                </sup>.</p>
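            <p>The RPKM formula and the isoform selection described above can be illustrated with a minimal sketch. It is not taken from the original scripts: the function names and the input layout (tuples of gene identifier, transcript identifier and RPKM value) are assumptions made for illustration only.</p>
            <code language="python"><![CDATA[
```python
def rpkm(read_count, total_mapped_reads, transcript_length_bp):
    """RPKM = C / (N * L), with N in millions of reads and L in kilobases."""
    n_millions = total_mapped_reads / 1e6
    length_kb = transcript_length_bp / 1e3
    return read_count / (n_millions * length_kb)


def top_isoform_per_gene(records):
    """Keep the isoform with the highest RPKM for each gene.

    `records` is an iterable of (gene_id, transcript_id, rpkm_value) tuples
    (a hypothetical layout, not the authors' file format).
    """
    best = {}
    for gene_id, transcript_id, value in records:
        if gene_id not in best or value > best[gene_id][1]:
            best[gene_id] = (transcript_id, value)
    return best
```
]]></code>
            <p>For instance, under these definitions a 2 kb transcript with 500 aligned reads in a library of 20 million mapped reads has an RPKM of 500 / (20 &#x00d7; 2) = 12.5.</p>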
            <p>All the heatmaps were generated using the ComplexHeatmap
                <sup>
                    <xref ref-type="bibr" rid="ref-23">23</xref>
                </sup> R package. The alignment of the RNA oligo sequences was obtained with MUSCLE
                <sup>
                    <xref ref-type="bibr" rid="ref-24">24</xref>
                </sup> (version 3.8). The locations of the RepeatMasker repeats in the human genome were downloaded from the UCSC Genome Browser
                <sup>
                    <xref ref-type="bibr" rid="ref-17">17</xref>
                </sup>.</p>
            <sec>
                <title>Calculation of the statistical significance of the predicted triplexes</title>
                <p>For a given pair of RNA and DNA sequences, Triplexator outputs all the possible triple helices that satisfy the user-defined thresholds. Notably, the number of the predicted triplexes increases with the lengths of input sequences (
                    <xref ref-type="other" rid="SF1">Supplementary Figure 3A</xref>). To account for this dependence, the normalized number of triplexes (i.e. the "triplex potential" or 
                    <italic toggle="yes">t
                        <sub>pot</sub>
                    </italic>) is also computed by Triplexator. Although this allows triplexes predicted for RNA-DNA pairs of different lengths to be compared, it does not provide information about the significance of these interactions.</p>
                <p>To estimate the probability of observing a particular number of triplexes by chance (e.g. between random sequences of the same lengths), we analyzed the average number of predicted triplexes between random sequences of various lengths. Namely, we considered four different RNA lengths (
                    <italic toggle="yes">L
                        <sub>RNA</sub>
                    </italic> = {500, 1000, 1500, 2000}) and ten different DNA lengths (
                    <italic toggle="yes">L
                        <sub>DNA</sub>
                    </italic> = {200, 400, ..., 1800, 2000}). For each of the 40 combinations of (
                    <italic toggle="yes">L
                        <sub>RNA</sub>
                    </italic>, 
                    <italic toggle="yes">L
                        <sub>DNA</sub>
                    </italic>), 100 random sequences with the length of 
                    <italic toggle="yes">L
                        <sub>RNA</sub>
                    </italic> and 100 random sequences with the length of 
                    <italic toggle="yes">L
                        <sub>DNA</sub>
                    </italic> were generated (with equal frequencies for all four nucleotides).</p>
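                <p>The simulation design described above can be sketched as follows. This is an illustrative reconstruction rather than the original script: the names <monospace>random_sequence</monospace> and <monospace>simulate_pairs</monospace>, the fixed seed and the use of the RNA alphabet ACGU are assumptions.</p>
                <code language="python"><![CDATA[
```python
import itertools
import random

# Length grids taken from the text: 4 RNA lengths x 10 DNA lengths = 40 combinations.
RNA_LENGTHS = [500, 1000, 1500, 2000]
DNA_LENGTHS = list(range(200, 2001, 200))


def random_sequence(length, alphabet, rng):
    """Draw a sequence with equal frequencies for all letters of `alphabet`."""
    return "".join(rng.choices(alphabet, k=length))


def simulate_pairs(n_per_length=100, seed=0):
    """Yield ((L_RNA, L_DNA), rna_list, dna_list) for every length combination."""
    rng = random.Random(seed)  # the seed is an assumption, used for reproducibility
    for l_rna, l_dna in itertools.product(RNA_LENGTHS, DNA_LENGTHS):
        rnas = [random_sequence(l_rna, "ACGU", rng) for _ in range(n_per_length)]
        dnas = [random_sequence(l_dna, "ACGT", rng) for _ in range(n_per_length)]
        yield (l_rna, l_dna), rnas, dnas
```
]]></code>
                <p>With 100 RNA and 100 DNA sequences per combination, pairing every RNA with every DNA gives 100 &#x00d7; 100 = 10000 RNA-DNA pairs for each of the 40 length combinations.</p>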
                <p>For each RNA-DNA pair, triple helices were predicted by Triplexator with the parameters optimized for the MEG3 lncRNA (
                    <monospace>-fr off -l 10 -e 10</monospace>). Next, for every combination of (
                    <italic toggle="yes">L
                        <sub>RNA</sub>
                    </italic>, 
                    <italic toggle="yes">L
                        <sub>DNA</sub>
                    </italic>) the average number of predicted triplexes (
                    <italic toggle="yes">&#x03bb;</italic>) was computed from all the 10000 predictions (
                    <xref ref-type="other" rid="SF1">Supplementary Table 2</xref>). Finally, a linear regression model for 
                    <italic toggle="yes">&#x03bb;</italic> was fitted to all the obtained values (adjusted 
                    <italic toggle="yes">R</italic>
                    <sup>2</sup> = 87%, see 
                    <xref ref-type="other" rid="SF1">Supplementary Figure 3B</xref>): 
                    <disp-formula id="e1">
                        <mml:math display="block" id="math1">
                            <mml:mrow>
                                <mml:mi>&#x03bb;</mml:mi>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:msub>
                                    <mml:mi>L</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>R</mml:mi>
                                        <mml:mi>N</mml:mi>
                                        <mml:mi>A</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mo>,</mml:mo>
                                <mml:msub>
                                    <mml:mi>L</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>D</mml:mi>
                                        <mml:mi>N</mml:mi>
                                        <mml:mi>A</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mo stretchy="false">)</mml:mo>
                                <mml:mo>=</mml:mo>
                                <mml:msub>
                                    <mml:mi>&#x03b8;</mml:mi>
                                    <mml:mn>0</mml:mn>
                                </mml:msub>
                                <mml:mo>+</mml:mo>
                                <mml:msub>
                                    <mml:mi>&#x03b8;</mml:mi>
                                    <mml:mn>1</mml:mn>
                                </mml:msub>
                                <mml:mo>&#x00d7;</mml:mo>
                                <mml:msub>
                                    <mml:mi>L</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>R</mml:mi>
                                        <mml:mi>N</mml:mi>
                                        <mml:mi>A</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mo>+</mml:mo>
                                <mml:msub>
                                    <mml:mi>&#x03b8;</mml:mi>
                                    <mml:mn>2</mml:mn>
                                </mml:msub>
                                <mml:mo>&#x00d7;</mml:mo>
                                <mml:msub>
                                    <mml:mi>L</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>D</mml:mi>
                                        <mml:mi>N</mml:mi>
                                        <mml:mi>A</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mspace width="12em"/>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mn>1</mml:mn>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                        </mml:math>
                    </disp-formula> where 
                    <italic toggle="yes">&#x03b8;</italic>
                    <sub>0</sub> = &#x2212;0.688, 
                    <italic toggle="yes">&#x03b8;</italic>
                    <sub>1</sub> = 5.37 
                    <italic toggle="yes">&#x00d7;</italic> 10
                    <sup>&#x2212;4</sup> and 
                    <italic toggle="yes">&#x03b8;</italic>
                    <sub>2</sub> = 6.03 
                    <italic toggle="yes">&#x00d7;</italic> 10
                    <sup>&#x2212;4</sup>.</p>
                <p>Thus, the statistical significance of the number of triple helices 
                    <italic toggle="yes">N
                        <sub>tpx</sub>
                    </italic>, predicted between RNA of length 
                    <italic toggle="yes">L
                        <sub>RNA</sub>
                    </italic> and DNA of length 
                    <italic toggle="yes">L
                        <sub>DNA</sub>
                    </italic>, can be estimated as follows. First, the expected average number of predicted triplexes (
                    <italic toggle="yes">&#x03bb;</italic>) is computed from the 
                    <xref ref-type="other" rid="e1">equation (1)</xref>. Next, the expected distribution of the number of predicted triplexes (
                    <italic toggle="yes">H</italic>
                    <sub>0</sub>) is simulated by the Poisson distribution with the obtained value of 
                    <italic toggle="yes">&#x03bb;</italic> (
                    <xref ref-type="other" rid="SF1">Supplementary Figure 3C</xref>). Finally, the p-value of the observed number of triple helices (
                    <italic toggle="yes">N
                        <sub>tpx</sub>
                    </italic>) can be estimated as follows: 
                    <disp-formula>
                        <mml:math display="block" id="math2">
                            <mml:mrow>
                                <mml:mtext>P-value(</mml:mtext>
                                <mml:msub>
                                    <mml:mi>N</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>t</mml:mi>
                                        <mml:mi>p</mml:mi>
                                        <mml:mi>x</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mtext>)</mml:mtext>
                                <mml:mo>=</mml:mo>
                                <mml:mi>P</mml:mi>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mi>X</mml:mi>
                                <mml:mo>&#x2265;</mml:mo>
                                <mml:msub>
                                    <mml:mi>N</mml:mi>
                                    <mml:mrow>
                                        <mml:mi>t</mml:mi>
                                        <mml:mi>p</mml:mi>
                                        <mml:mi>x</mml:mi>
                                    </mml:mrow>
                                </mml:msub>
                                <mml:mo>|</mml:mo>
                                <mml:mi>X</mml:mi>
                                <mml:mo>&#x223c;</mml:mo>
                                <mml:mtext>Pois(</mml:mtext>
                                <mml:mi>&#x03bb;</mml:mi>
                                <mml:mo stretchy="false">)</mml:mo>
                                <mml:mtext>)</mml:mtext>
                                <mml:mspace width="12em"/>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mn>2</mml:mn>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                        </mml:math> </disp-formula>
                </p>
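                <p>Equations (1) and (2) can be combined in a few lines of code. The sketch below uses the regression coefficients reported above; the function names are hypothetical. One detail is an assumption and not part of the described method: for short sequences (e.g. 
                    <italic toggle="yes">L
                        <sub>RNA</sub>
                    </italic> = 500 and 
                    <italic toggle="yes">L
                        <sub>DNA</sub>
                    </italic> = 200) equation (1) yields a negative value, so the sketch replaces non-positive values of 
                    <italic toggle="yes">&#x03bb;</italic> with a small positive floor.</p>
                <code language="python"><![CDATA[
```python
import math

# Regression coefficients of equation (1), as reported in the text.
THETA_0 = -0.688
THETA_1 = 5.37e-4
THETA_2 = 6.03e-4


def expected_triplexes(l_rna, l_dna, floor=1e-6):
    """Expected number of triplexes (lambda) from equation (1).

    Assumption (not from the paper): values below `floor` are replaced by
    `floor`, because a Poisson rate must be positive.
    """
    lam = THETA_0 + THETA_1 * l_rna + THETA_2 * l_dna
    return max(lam, floor)


def poisson_upper_tail(n_tpx, lam):
    """P(X >= n_tpx) for X ~ Pois(lam), summed directly over the upper tail."""
    if n_tpx <= 0:
        return 1.0
    # First tail term, computed in log space to avoid overflow for large counts.
    term = math.exp(-lam + n_tpx * math.log(lam) - math.lgamma(n_tpx + 1))
    total = 0.0
    k = n_tpx
    while True:
        total += term
        k += 1
        term *= lam / k
        # Past the mode the terms shrink; stop once they no longer matter.
        if k > lam and term <= total * 1e-16:
            break
    return min(total, 1.0)


def triplex_p_value(n_tpx, l_rna, l_dna):
    """P-value of observing n_tpx or more triplexes for the given lengths."""
    return poisson_upper_tail(n_tpx, expected_triplexes(l_rna, l_dna))
```
]]></code>
                <p>For example, with both sequence lengths equal to 2000 the sketch gives 
                    <italic toggle="yes">&#x03bb;</italic> &#x2248; 1.59. Summing the upper tail directly, rather than subtracting the cumulative distribution from 1, is a design choice that keeps very small p-values from being rounded to zero.</p>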
                <p>Importantly, the 
                    <italic toggle="yes">N
                        <sub>tpx</sub>
                    </italic> value is taken from the "Total (abs)" column of the 
                    <monospace>triplex_search.summary</monospace> file. The same value is used by Triplexator to compute its "triplex potential" (
                    <italic toggle="yes">t
                        <sub>pot</sub>
                    </italic>). The "Total (abs)" value is the total number of 
                    <italic toggle="yes">all possible</italic> triplexes that satisfy the user-defined thresholds (overlaps are allowed). Thus, for a single triplex longer than the minimal length (10 in our settings), the "Total (abs)" value may be greater than 1. For example, an 11 bp DNA fragment 
                    <monospace>5&#x2019;-GAGAGAGAGAG-3&#x2019;</monospace> and an 11 nt RNA oligo 
                    <monospace>5&#x2019;-GAGAGAGAGAG-3&#x2019;</monospace> can interact with each other, forming one long anti-parallel triplex without mismatches. However, with the minimal allowed triplex length set to 10, the 
                    <italic toggle="yes">N
                        <sub>tpx</sub>
                    </italic> is equal to 3. This includes the long triplex of length 11 as well as the two triplexes of length 10 that omit either the first or the last position of the long triplex. Therefore, a single long triplex is likely to produce a large 
                    <italic toggle="yes">N
                        <sub>tpx</sub>
                    </italic> value and, consequently, a small (statistically significant) p-value.</p>
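                <p>The counting in the example above can be reproduced with a short sketch. It assumes a single contiguous, mismatch-free triplex in which every sub-window of at least the minimal length is counted once; this is a simplification for illustration and not a description of how Triplexator enumerates triplexes in general. The function name is hypothetical.</p>
                <code language="python"><![CDATA[
```python
def count_subtriplexes(triplex_length, min_length=10):
    """Number of windows of length >= min_length inside one perfect triplex.

    A window of length k can start at (triplex_length - k + 1) positions.
    """
    if triplex_length < min_length:
        return 0
    return sum(
        triplex_length - k + 1
        for k in range(min_length, triplex_length + 1)
    )
```
]]></code>
                <p>Under these assumptions a perfect triplex of length 11 gives 2 + 1 = 3, in agreement with the example, and the count grows quadratically with the triplex length (a perfect triplex of length 20 would give 66).</p>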
            </sec>
            <sec>
                <title>Calculation of purine and poly-purine contents</title>
                <p>Due to the properties of the Watson-Crick base pairing model, the GC content of a sequence corresponding to the forward (+) DNA strand is equal to the GC content of the reverse (-) DNA strand. However, the GA content is more important for triplex-based interactions because RNA can only form triple helices with the purines in the DNA. In contrast to the GC content, the GA content can differ between the DNA strands. Moreover, if one strand is purine-rich, the other strand is automatically purine-poor. For example, for the DNA sequence 
                    <monospace>5&#x2019;-GGGGGAGA-3&#x2019;</monospace> the purine content of the forward strand is 100%, while the complementary strand (
                    <monospace>3&#x2019;-CCCCCTCT-5&#x2019;</monospace>) has no purines at all (i.e. its purine content is 0%).</p>
                <p>Thus, we define the purine content of a given DNA fragment as the maximum value between the two strands, i.e.: 
                    <disp-formula id="e3">
                        <mml:math display="block" id="math3">
                            <mml:mrow>
                                <mml:mtext>GA-content</mml:mtext>
                                <mml:mo>=</mml:mo>
                                <mml:mtext>&#x2009;</mml:mtext>
                                <mml:msubsup>
                                    <mml:mtext>&#x2009;</mml:mtext>
                                    <mml:mrow>
                                        <mml:mi mathsize="normal">s</mml:mi>
                                        <mml:mo mathsize="normal">=</mml:mo>
                                        <mml:mrow>
                                            <mml:mo mathsize="normal">{</mml:mo>
                                            <mml:mrow>
                                                <mml:mo mathsize="normal">+</mml:mo>
                                                <mml:mo mathsize="normal">,</mml:mo>
                                                <mml:mo mathsize="normal">-</mml:mo>
                                            </mml:mrow>
                                            <mml:mo mathsize="normal">}</mml:mo>
                                        </mml:mrow>
                                    </mml:mrow>
                                    <mml:mrow>
                                        <mml:mspace width="1em"/>
                                        <mml:mi mathsize="normal">max</mml:mi>
                                        <mml:mo>&#x2061;</mml:mo>
                                    </mml:mrow>
                                </mml:msubsup>
                                <mml:mfrac>
                                    <mml:mrow>
                                        <mml:mtext>NumPurines</mml:mtext>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:msup>
                                            <mml:mrow>
                                                <mml:mtext>DNA</mml:mtext>
                                            </mml:mrow>
                                            <mml:mi>s</mml:mi>
                                        </mml:msup>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                    <mml:mrow>
                                        <mml:mtext>Length</mml:mtext>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mtext>DNA</mml:mtext>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:mfrac>
                                <mml:mspace width="12em"/>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mn>3</mml:mn>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                        </mml:math> </disp-formula> where 
                    <italic toggle="yes">DNA</italic>
                    <sup>+</sup> (
                    <italic toggle="yes">DNA</italic>
                    <sup>-</sup>) denotes the sequence corresponding to the forward (reverse) DNA strand and 
                    <italic toggle="yes">NumPurines(DNA
                        <sup>s</sup>
                    </italic>) is the total number of G or A nucleotides present in the DNA strand 
                    <italic toggle="yes">s</italic>.</p>
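                <p>As an illustration, formula (3) could be computed along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, only the forward strand is passed in, and the purines of the reverse strand are counted as the pyrimidines (C or T) of the forward strand.</p>
                <preformat>
```python
def ga_content(forward_strand: str) -> float:
    """Purine (G or A) content of a DNA fragment, taken as the
    maximum over the forward and reverse strands (formula 3)."""
    seq = forward_strand.upper()
    if not seq:
        raise ValueError("empty DNA sequence")
    # Purines on the forward strand.
    purines_forward = sum(base in "GA" for base in seq)
    # Each C or T on the forward strand pairs with a G or A on the
    # reverse strand, so this equals the reverse-strand purine count.
    purines_reverse = sum(base in "CT" for base in seq)
    return max(purines_forward, purines_reverse) / len(seq)


print(ga_content("GGGGGAGA"))  # 1.0, the example from the text
```
                </preformat>
                <p>Ambiguous symbols such as N are counted on neither strand in this sketch, which is a design choice rather than something specified in the text.</p>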
                <p>It should be noted that the purine content computed by formula (3) is always 
                    <italic toggle="yes">&#x2265;</italic> 50%, because every base pair contributes exactly one purine to one of the two strands. To work with a measure that is defined from 0% to 100%, we introduce the 
                    <italic toggle="yes">poly-purine content</italic>. We define a poly-purine element 
                    <italic toggle="yes">P
                        <sup>s</sup>
                    </italic> as a continuous stretch of 10 or more purines located on the DNA strand 
                    <italic toggle="yes">s</italic> (where 
                    <italic toggle="yes">s</italic> = {+, &#x2212;}). For a given DNA sequence that has 
                    <italic toggle="yes">N</italic>
                    <sup>+</sup> poly-purine elements on the forward strand and 
                    <italic toggle="yes">N</italic>
                    <sup>&#x2212;</sup> poly-purine elements on the reverse strand, the poly-purine content is computed as follows: 
                    <disp-formula>
                        <mml:math display="block" id="math4">
                            <mml:mrow>
                                <mml:mtext>Poly-GA</mml:mtext>
                                <mml:mspace width="0.3em"/>
                                <mml:mtext>content</mml:mtext>
                                <mml:mo>=</mml:mo>
                                <mml:mtext>&#x2009;</mml:mtext>
                                <mml:msubsup>
                                    <mml:mtext>&#x2009;</mml:mtext>
                                    <mml:mrow>
                                        <mml:mi mathsize="normal">s</mml:mi>
                                        <mml:mo mathsize="normal">=</mml:mo>
                                        <mml:mrow>
                                            <mml:mo mathsize="normal">{</mml:mo>
                                            <mml:mrow>
                                                <mml:mo mathsize="normal">+</mml:mo>
                                                <mml:mo mathsize="normal">,</mml:mo>
                                                <mml:mo mathsize="normal">-</mml:mo>
                                            </mml:mrow>
                                            <mml:mo mathsize="normal">}</mml:mo>
                                        </mml:mrow>
                                    </mml:mrow>
                                    <mml:mrow>
                                        <mml:mspace width="1em"/>
                                        <mml:mi mathsize="normal">max</mml:mi>
                                        <mml:mo>&#x2061;</mml:mo>
                                    </mml:mrow>
                                </mml:msubsup>
                                <mml:mfrac>
                                    <mml:mrow>
                                        <mml:mstyle displaystyle="true">
                                            <mml:msubsup>
                                                <mml:mo>&#x2211;</mml:mo>
                                                <mml:mrow>
                                                    <mml:mi>i</mml:mi>
                                                    <mml:mo>=</mml:mo>
                                                    <mml:mn>1</mml:mn>
                                                </mml:mrow>
                                                <mml:mrow>
                                                    <mml:msup>
                                                        <mml:mi>N</mml:mi>
                                                        <mml:mi>s</mml:mi>
                                                    </mml:msup>
                                                </mml:mrow>
                                            </mml:msubsup>
                                            <mml:mrow>
                                                <mml:mtext>Length</mml:mtext>
                                                <mml:mo stretchy="false">(</mml:mo>
                                                <mml:msubsup>
                                                    <mml:mi>P</mml:mi>
                                                    <mml:mi>i</mml:mi>
                                                    <mml:mi>s</mml:mi>
                                                </mml:msubsup>
                                                <mml:mo stretchy="false">)</mml:mo>
                                            </mml:mrow>
                                        </mml:mstyle>
                                    </mml:mrow>
                                    <mml:mrow>
                                        <mml:mtext>Length</mml:mtext>
                                        <mml:mo stretchy="false">(</mml:mo>
                                        <mml:mtext>DNA</mml:mtext>
                                        <mml:mo stretchy="false">)</mml:mo>
                                    </mml:mrow>
                                </mml:mfrac>
                                <mml:mspace width="12em"/>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mn>4</mml:mn>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                        </mml:math> </disp-formula> where 
                    <italic toggle="yes">Length</italic>(
                    <inline-formula>
                        <mml:math display="inline" id="M5">
                            <mml:mrow>
                                <mml:msubsup>
                                    <mml:mi>P</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mi>s</mml:mi>
                                </mml:msubsup>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula>) is the length of the poly-purine element 
                    <italic toggle="yes">i</italic> on the DNA strand 
                    <italic toggle="yes">s</italic>, i.e. according to the above definition Length(
                    <inline-formula>
                        <mml:math display="inline" id="M6">
                            <mml:mrow>
                                <mml:msubsup>
                                    <mml:mi>P</mml:mi>
                                    <mml:mi>i</mml:mi>
                                    <mml:mi>s</mml:mi>
                                </mml:msubsup>
                            </mml:mrow>
                        </mml:math>
                    </inline-formula>) 
                    <italic toggle="yes">&#x2265;</italic> 10.</p>
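                <p>Formula (4) could be sketched in the same way. The function name and the regular-expression approach are assumptions made for illustration; poly-purine elements on the reverse strand are located as runs of C or T on the forward strand.</p>
                <preformat>
```python
import re


def poly_ga_content(forward_strand: str, min_run: int = 10) -> float:
    """Fraction of a DNA fragment covered by poly-purine elements
    (runs of min_run or more purines), maximised over both strands."""
    seq = forward_strand.upper()
    if not seq:
        raise ValueError("empty DNA sequence")

    def covered(alphabet: str) -> int:
        # Total length of all maximal runs of the given bases that
        # are at least min_run long, e.g. the pattern [GA]{10,}.
        pattern = "[" + alphabet + "]{" + str(min_run) + ",}"
        return sum(len(m.group()) for m in re.finditer(pattern, seq))

    # "GA" runs are forward-strand elements; "CT" runs on the forward
    # strand correspond to purine runs on the reverse strand.
    return max(covered("GA"), covered("CT")) / len(seq)


print(poly_ga_content("GGGGGAGA"))  # 0.0: the only run has 8 purines
```
                </preformat>
                <p>Unlike the purine content, this measure is 0 for a fragment without any sufficiently long purine run and reaches 1 only when one strand is a single poly-purine element.</p>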
            </sec>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <p>We used Triplexator to predict possible triple helices between the full-length MEG3 transcript and the 6798 experimentally identified ChOP-seq peaks as well as 6798 control DNA regions (see Methods). The genomic sites with a statistically significant number of triplexes were identified in each DNA set by our custom probabilistic approach (see Methods). As anticipated, more statistically significant (Bonferroni-adjusted p-value &lt; 0.01) interactions with the MEG3 lncRNA were predicted for the ChOP-seq peaks than for the control regions. Namely, the interactions with 3825 (56.3%) ChOP-seq peaks were classified as statistically significant (
                <xref ref-type="fig" rid="f1">Figure 1A</xref>, left) while there were only 617 (9.1%, odds ratio test p-value &lt; 2.2 
                <italic toggle="yes">&#x00d7;</italic> 10
                <sup>&#x2212;16</sup>) such cases among the control DNA regions (
                <xref ref-type="fig" rid="f1">Figure 1B</xref>, left). Since the ChOP-seq method detects RNA contacts with the chromatin (and not the naked DNA), the obtained binding sites can correspond to several different interaction mechanisms including direct RNA-DNA interactions via triple helices or R-loops, RNA-RNA hybridization with nascent transcripts as well as binding to nuclear proteins. This may be the reason that many MEG3 binding sites did not produce statistically significant predictions with the MEG3 lncRNA. Overall, these results supported the original conclusion that the MEG3 lncRNA is able to directly interact with the genomic DNA via triple helices.</p>
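            <p>The size of this difference can be checked from the reported counts. The sketch below only recomputes the percentages and the sample odds ratio; it does not reproduce the significance test itself, and the function name is hypothetical.</p>
            <preformat>
```python
def odds_ratio(hits_a: int, total_a: int, hits_b: int, total_b: int) -> float:
    """Sample odds ratio of a hit in set A relative to set B."""
    odds_a = hits_a / (total_a - hits_a)
    odds_b = hits_b / (total_b - hits_b)
    return odds_a / odds_b


# Counts reported in the text: significant sites among 6798 regions.
chop_hits, control_hits, n_regions = 3825, 617, 6798

print(round(100 * chop_hits / n_regions, 1))     # 56.3
print(round(100 * control_hits / n_regions, 1))  # 9.1
print(odds_ratio(chop_hits, n_regions, control_hits, n_regions))
```
            </preformat>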
            <p>To check the ability of other RNAs to form triplexes with MEG3 binding sites, we applied Triplexator to a set of 153 expressed transcripts (see Methods). Surprisingly, 65 of the analyzed RNAs showed results similar to those of the MEG3 lncRNA &#x2013; they were predicted to form statistically significant interactions with the majority (&gt; 50%) of the ChOP-seq peaks (
                <xref ref-type="fig" rid="f1">Figure 1A</xref>, middle). To further investigate possible interactions with the ChOP-seq peaks, 153 artificial sequences were generated by di-nucleotide shuffling of the MEG3 transcript (see Methods). Strikingly, these random "RNAs" produced a statistically significant number of triplexes with 39% and 9% of the ChOP-seq peaks and control DNA regions, respectively (
                <xref ref-type="fig" rid="f1">Figure 1A, B</xref>, right). These results indicated that the set of the ChOP-seq peaks was different from the randomly sampled genomic sites in that it contained a number of DNA sequences that may be able to interact not only with the MEG3 lncRNA, but with other RNAs as well.</p>
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>Figure 1. </label>
                <caption>
                    <p>(
                        <bold>A</bold>,
                        <bold>B</bold>) The number of the DNA sequences from (
                        <bold>A</bold>) the ChOP-seq or (
                        <bold>B</bold>) the control genomic set with a statistically significant number of predicted triplexes for different query RNAs (the black dots). (
                        <bold>C</bold>,
                        <bold>D</bold>,
                        <bold>E</bold>) The heat maps of the 
                        <italic toggle="yes">&#x2013;</italic> log
                        <sub>10</sub> (adjusted p-value) corresponding to the predicted triplexes between the 307 different query RNAs (columns) and (
                        <bold>C</bold>) all the ChOP-seq peaks, (
                        <bold>D</bold>) the control genomic sites or (
                        <bold>E</bold>) the Shared Capture-seq peaks (rows). The universal TTSs were identified based on their interactions with the 153 expressed transcripts (left part of each heat map) and visualized as a separate (top) cluster. The MEG3 column was intentionally drawn wider. The blue color corresponds to the RNA-DNA pairs with adjusted p-value = 1 (including cases where no triplexes were predicted). (
                        <bold>F</bold>) Repeat classes present in different sets of genomic regions.</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/21026/2ef9c0c1-4860-41bb-b85d-9861245e3dcf_figure1.gif"/>
            </fig>
            <p>Based on these observations we hypothesized that some of the MEG3-bound genomic sites may be &#x2019;universal&#x2019;, i.e. they may have the potential to form multiple triplexes with a number of different RNAs. Analysis of the Triplexator predictions obtained for the 153 expressed RNAs revealed 1107 (16.3%) ChOP-seq peaks that were predicted to form a statistically significant number of triplexes with more than 90% of the analyzed transcripts (
                <xref ref-type="other" rid="SF1">Supplementary Figure 4A</xref>). In contrast, the genomic background contained only 41 (0.6%) such sites (
                <xref ref-type="other" rid="SF1">Supplementary Figure 4B</xref>). Due to the predicted ability of these genomic regions to form triple helices with various RNAs, we called them "universal triplex target sites (TTSs)". Notably, the identified universal TTSs produced strong p-values for the MEG3 lncRNA as well as for the 153 shuffled MEG3 sequences (
                <xref ref-type="fig" rid="f1">Figure 1C, D</xref>). Thus, according to our predictions, some DNA sequences were more prone to forming triple helices with different long RNAs, and a number of such genomic regions were present among the experimentally identified MEG3 binding sites (ChOP-seq peaks).</p>
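            <p>The classification rule used here can be written down as a short sketch. It assumes that each genomic site comes with one adjusted p-value per analyzed transcript; the function name, the argument layout and the default thresholds (taken from the text) are illustrative only.</p>
            <preformat>
```python
from typing import Sequence


def is_universal_tts(adjusted_p_values: Sequence[float],
                     alpha: float = 0.01,
                     min_fraction: float = 0.9) -> bool:
    """Return True if a site has a significant number of triplexes
    (adjusted p-value below alpha) with more than min_fraction of
    the analyzed transcripts."""
    if not adjusted_p_values:
        return False
    significant = sum(1 for p in adjusted_p_values if alpha > p)
    return significant > min_fraction * len(adjusted_p_values)


# With 153 transcripts, "more than 90%" means at least 138 of them.
print(is_universal_tts([0.001] * 138 + [1.0] * 15))  # True
print(is_universal_tts([0.001] * 137 + [1.0] * 16))  # False
```
            </preformat>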
            <p>To further investigate the predicted ability of the universal TTSs to bind various RNAs we analyzed the results of the recent Capture-seq experiment
                <sup>
                    <xref ref-type="bibr" rid="ref-25">25</xref>
                </sup>. This 
                <italic toggle="yes">in vitro</italic> study has determined genomic binding sites of three different short RNA oligos that corresponded to the DNA-binding domains (DBDs) of the MEG3 and GATA6-AS lncRNAs. Namely, the 
                <italic toggle="yes">MEG3_13_41</italic>, 
                <italic toggle="yes">GATA6_AS_78_118</italic> and 
                <italic toggle="yes">MEG3_839_890</italic> oligos were 28, 40 and 48 nt long, respectively. Since the experiment has been performed on the RNA- and protein-free ("naked") genomic DNA, the majority of the identified interactions have been assumed to be direct and mediated by triple helices. Comparison of the genomic coordinates corresponding to the identified target DNA fragments demonstrated that most of the interactions were specific to one oligo only (
                <xref ref-type="other" rid="SF1">Supplementary Figure 5</xref>). This can be explained by the fact that the oligo sequences had limited similarity with each other (the identities between the 
                <italic toggle="yes">MEG3_13_41</italic>-
                <italic toggle="yes">GATA6_AS_78_118</italic>, 
                <italic toggle="yes">MEG3_839_890</italic>-
                <italic toggle="yes">GATA6_AS_78_118</italic> and 
                <italic toggle="yes">MEG3_13_41</italic>-
                <italic toggle="yes">MEG3_839_890</italic> oligo pairs were 33%, 40% and 25%, respectively &#x2013; 
                <xref ref-type="other" rid="SF1">Supplementary Figure 6</xref>). Still, 5379 genomic regions were captured by each of the three oligos (
                <xref ref-type="other" rid="SF1">Supplementary Figure 5</xref>). Thus, we expected that this set of &#x2019;shared Capture-seq peaks&#x2019; would be enriched with the potential universal TTSs. To check this we predicted their possible interactions with the 153 RNA sequences that were used in the analysis of the ChOP-seq peaks (see above). Indeed, 2151 (40%) shared Capture-seq peaks were classified as universal TTSs &#x2013; they had a statistically significant number of triplexes with most (&gt; 90%) of the analyzed transcripts (
                <xref ref-type="fig" rid="f1">Figure 1E</xref> and 
                <xref ref-type="other" rid="SF1">Supplementary Figure 4C</xref>). Therefore, the fact that the experimentally identified set of shared Capture-seq peaks contained such a high fraction of the universal TTSs indirectly confirmed the predicted property of these special genomic loci.</p>
            <p>Finally, we attempted to reveal the features of the universal TTSs that may allow them to interact with several different RNAs. For this purpose we compared the sequence composition of the universal and all the other (i.e. non-universal) genomic regions from each set. While the GC content of the universal and non-universal DNA sequences was similar, the universal TTSs had higher purine (G or A) and, especially, poly-purine content (see 
                <xref ref-type="other" rid="SF1">Supplementary Figure 7</xref> and Methods for the definitions). To find out the origin of these poly-purine elements we analyzed the classes of the overlapping genomic repeats. All three sets of the universal TTSs were enriched with the purine-rich low complexity regions, LCRs (
                <xref ref-type="fig" rid="f1">Figure 1F</xref> and 
                <xref ref-type="other" rid="SF1">Supplementary Figure 8</xref>). Additional analysis of several universal TTSs confirmed that these LCRs were predicted to form multiple triple helices with the majority of the analyzed transcripts (see 
                <xref ref-type="other" rid="SF1">Supplementary Figure 9</xref> for representative cases). Therefore, the presence of the purine-rich low complexity elements was the characteristic property of the universal TTSs that potentially allowed them to interact with a wide range of different RNAs. Altogether, our results suggested the existence of a special type of genomic loci that may function as RNA-binding hubs.</p>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <p>The importance of triplex-dependent gene regulation in the genomes of higher organisms is becoming a generally accepted concept. Here we performed a large-scale bioinformatic analysis of the genomic regions (ChOP-seq and Capture-seq peaks) that have been shown experimentally to interact with particular RNAs (MEG3 lncRNA or short oligos). To filter out non-significant Triplexator predictions, the statistical significance of every RNA-DNA interaction was estimated from the Poisson distribution. To our surprise, for some genomic regions (that we called "universal triplex target sites") Triplexator predicted a statistically significant (adjusted p-value &lt; 0.01) number of triplexes with the majority (&gt; 90%) of the analyzed transcripts. According to our analysis, universal TTSs are quite rare in the human genome &#x2013; there were only 0.6% of them among the 6798 randomly sampled regions. On the other hand, 16.3% of the experimentally identified MEG3 binding sites (ChOP-seq peaks) were classified as universal TTSs. Additionally, genomic regions that have been shown to form triplexes with three different oligos (shared Capture-seq peaks) contained 40% of the universal TTSs. All three sets of the identified universal TTSs were enriched with the purine-rich low complexity regions.</p>
            <p>The theoretical possibility of the universal TTS existence comes from the degeneracy of the Hoogsteen rules
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup>. In fact, the triplex-based interaction can be formed in both orientations (parallel and anti-parallel) and it involves only purines (G or A) in the DNA. Additionally, the DNA guanine and adenine can bind to RNA guanine and uracil, respectively, in both orientations while the A::A pairing occurs in the anti-parallel orientation only. This makes long poly-purine elements possible targets for a number of different RNA oligos.</p>
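            <p>The degeneracy described above can be made concrete with a deliberately simplified sketch. It encodes only the three pairings named in this paragraph, requires a perfect match over the full length, and assumes that the orientation is defined relative to the purine-rich DNA strand; all names are hypothetical, and the sketch omits other pairings and the mismatch tolerance that prediction tools may apply.</p>
            <preformat>
```python
# (DNA purine, RNA base) pairings named in the text.
BOTH_ORIENTATIONS = {("G", "G"), ("A", "U")}
ANTIPARALLEL_ONLY = {("A", "A")}


def can_pair(dna_purines: str, rna: str, orientation: str) -> bool:
    """Check whether an RNA oligo can pair with a purine tract of the
    same length in the given orientation, without mismatches."""
    if orientation not in ("parallel", "antiparallel"):
        raise ValueError("orientation must be 'parallel' or 'antiparallel'")
    if len(dna_purines) != len(rna):
        return False
    allowed = set(BOTH_ORIENTATIONS)
    if orientation == "antiparallel":
        allowed |= ANTIPARALLEL_ONLY
        rna = rna[::-1]  # read the RNA 3' to 5' along the DNA strand
    return all((d, r) in allowed
               for d, r in zip(dna_purines.upper(), rna.upper()))


tract = "GAGAGAGAGAG"
print(can_pair(tract, "GAGAGAGAGAG", "antiparallel"))  # True
print(can_pair(tract, "GAGAGAGAGAG", "parallel"))      # False
print(can_pair(tract, "GUGUGUGUGUG", "parallel"))      # True
```
            </preformat>
            <p>Even under these reduced rules the same purine tract accepts more than one distinct RNA sequence, which is the property the argument relies on.</p>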
            <p>One of the possible and actively discussed roles of the chromatin-bound RNAs (including lncRNAs) is to bring different chromosomal parts together to enable the remote DNA-DNA contacts
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup>. Moreover, it has recently been shown that RNAs originating from super-enhancers form triplexes at distant regions
                <sup>
                    <xref ref-type="bibr" rid="ref-26">26</xref>
                </sup>. Therefore, it is possible that universal TTSs may facilitate distal enhancer-promoter interactions via engagement with the same enhancer RNA. In line with this hypothesis, we observed a statistically significant enrichment of the universal Capture-seq peaks near (&lt; 1 kb) the transcription start sites (TSSs) of the annotated genes (
                <xref ref-type="other" rid="SF1">Supplementary Figure 10A</xref>). However, the computationally predicted universal ChOP-seq and background TTSs did not show such a trend (
                <xref ref-type="other" rid="SF1">Supplementary Figure 10B,C</xref>). Thus, the experimentally identified shared Capture-seq peaks may be more suitable for subsequent functional validation of the universal TTSs in living cells.</p>
            <p>Importantly, the current computational analysis has a number of limitations. Namely, the triplex-based interactions of the full-length transcripts were predicted without taking their secondary structure into account. We are not aware of any bioinformatics tools that would be able to produce such predictions. Moreover, cellular localization of the 153 selected expressed transcripts as well as DNA binding proteins and chromatin compaction were not considered. Therefore, our simulations are more similar to the 
                <italic toggle="yes">in vitro</italic> Capture-seq experiments with short oligos than to the interactions of long transcripts with the chromatin inside the nucleus. Comprehensive identification of all the RNA-DNA interactions obtained by high throughput experimental methods may clarify the predicted functionality of the universal TTSs in the cell. Although a few methods for this task have recently been developed, the length of the sequencing reads (e.g. about 40 bp of DNA in the case of GRID-seq) does not allow reliable detection of interactions with the long low complexity regions (including universal TTSs). We look forward to new high-quality experimental data to gain further insight into the triplex-based RNA-chromatin interactions 
                <italic toggle="yes">in vivo</italic>.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <sec>
                <title>Underlying data</title>
                <p>Zenodo: vanya-antonov/universal_tts: The initial release of the code, data files and images related to universal TTSs. 
                    <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.2654800">http://doi.org/10.5281/zenodo.2654800</ext-link>
                    <sup>
                        <xref ref-type="bibr" rid="ref-27">27</xref>
                    </sup>
                </p>
                <p>This project contains the following underlying data:

                    <list list-type="bullet">
                        <list-item>
                            <p>universal_tts-v1.0.0.zip?download=1.zip</p>
                            <list list-type="bullet">
                                <list-item>
                                    <label>&#x2013; </label>
                                    <p>data (folder containing underlying data, description of individual files can be found in 
                                        <xref ref-type="other" rid="SF1">Supplementary Table 3</xref>)</p>
                                </list-item>
                            </list>
                        </list-item>
                    </list>
</p>
            </sec>
            <sec>
                <title>Extended data</title>
                <p>Zenodo: vanya-antonov/universal_tts: The initial release of the code, data files and images related to universal TTSs. 
                    <ext-link ext-link-type="uri" xlink:href="http://doi.org/10.5281/zenodo.2654800">http://doi.org/10.5281/zenodo.2654800</ext-link>
                    <sup>
                        <xref ref-type="bibr" rid="ref-27">27</xref>
                    </sup>
                </p>
                <p>This project contains the following extended data:

                    <list list-type="bullet">
                        <list-item>
                            <p>universal_tts-v1.0.0.zip?download=1.zip</p>
                            <list list-type="bullet">
                                <list-item>
                                    <label>&#x2013; </label>
                                    <p>images_R (folder containing R scripts to generate figures)</p>
                                </list-item>
                                <list-item>
                                    <label>&#x2013; </label>
                                    <p>scripts (folder containing scripts to compute p-values based on the Triplexator predictions)</p>
                                </list-item>
                            </list>
                        </list-item>
                    </list>
</p>
                <p>
                    <ext-link ext-link-type="uri" xlink:href="https://opensource.org/licenses/GPL-3.0">Data and code are available under the terms of GNU General Public License version 3 (GPL-3.0)</ext-link>.</p>
            </sec>
        </sec>
    </body>
    <back>
        <sec id="SM1" sec-type="supplementary-material">
            <title>Supplementary material</title>
            <p id="SF1">Supplementary File 1. File containing Supplementary figure 1&#x2013;Supplementary figure 10, and Supplementary table 1&#x2013;Supplementary table 3.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/13522/a0b6e7f5-2cc9-4a47-86a2-e2e5ebb21945_Supplementary_File_1.pdf">Click here to access the data</ext-link>
            </p>
        </sec>
        <ack>
            <title>Acknowledgements</title>
            <p>We are thankful to Dr. Chandrasekhar Kanduri (University of Gothenburg, Sweden) for providing the original coordinates of the ChOP-seq peaks.</p>
        </ack>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Khalil</surname>
                            <given-names>AM</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Guttman</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Huarte</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression.</article-title>
                    <source>
						
                        <italic toggle="yes">Proc Natl Acad Sci U S A.</italic>
					</source>
                    <year>2009</year>;<volume>106</volume>(<issue>28</issue>):<fpage>11667</fpage>&#x2013;<lpage>72</lpage>.
                    <pub-id pub-id-type="pmid">19571010</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.0904715106</pub-id>
                    <pub-id pub-id-type="pmcid">2704857</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Chu</surname>
                            <given-names>C</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Qu</surname>
                            <given-names>K</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Zhong</surname>
                            <given-names>FL</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions.</article-title>
                    <source>
						
                        <italic toggle="yes">Mol Cell.</italic>
					</source>
                    <year>2011</year>;<volume>44</volume>(<issue>4</issue>):<fpage>667</fpage>&#x2013;<lpage>678</lpage>.
                    <pub-id pub-id-type="pmid">21963238</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.molcel.2011.08.027</pub-id>
                    <pub-id pub-id-type="pmcid">3249421</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Pandey</surname>
                            <given-names>RR</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Mondal</surname>
                            <given-names>T</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Mohammad</surname>
                            <given-names>F</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation.</article-title>
                    <source>
						
                        <italic toggle="yes">Mol Cell.</italic>
					</source>
                    <year>2008</year>;<volume>32</volume>(<issue>2</issue>):<fpage>232</fpage>&#x2013;<lpage>246</lpage>.
                    <pub-id pub-id-type="pmid">18951091</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.molcel.2008.08.022</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mondal</surname>
                            <given-names>T</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Subhash</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Vaid</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>MEG3 long noncoding RNA regulates the TGF-&#x03b2; pathway genes through formation of RNA-DNA triplex structures.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Commun.</italic>
					</source>
                    <year>2015</year>;<volume>6</volume>:<fpage>7743</fpage>.
                    <pub-id pub-id-type="pmid">26205790</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ncomms8743</pub-id>
                    <pub-id pub-id-type="pmcid">4525211</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Simon</surname>
                            <given-names>MD</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>CI</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Kharchenko</surname>
                            <given-names>PV</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The genomic binding sites of a noncoding RNA.</article-title>
                    <source>
						
                        <italic toggle="yes">Proc Natl Acad Sci U S A.</italic>
					</source>
                    <year>2011</year>;<volume>108</volume>(<issue>51</issue>):<fpage>20497</fpage>&#x2013;<lpage>20502</lpage>.
                    <pub-id pub-id-type="doi">10.1073/pnas.1113536108</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Engreitz</surname>
                            <given-names>JM</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Pandya-Jones</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>McDonel</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The Xist lncRNA exploits three-dimensional genome architecture to spread across the X chromosome.</article-title>
                    <source>
						
                        <italic toggle="yes">Science.</italic>
					</source>
                    <year>2013</year>;<volume>341</volume>(<issue>6147</issue>):<fpage>1237973</fpage>.
                    <pub-id pub-id-type="pmid">23828888</pub-id>
                    <pub-id pub-id-type="doi">10.1126/science.1237973</pub-id>
                    <pub-id pub-id-type="pmcid">3778663</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Sridhar</surname>
                            <given-names>B</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Rivas-Astroza</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Nguyen</surname>
                            <given-names>TC</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Systematic Mapping of RNA-Chromatin Interactions In Vivo.</article-title>
                    <source>
						
                        <italic toggle="yes">Curr Biol.</italic>
					</source>
                    <year>2017</year>;<volume>27</volume>(<issue>4</issue>):<fpage>602</fpage>&#x2013;<lpage>609</lpage>.
                    <pub-id pub-id-type="pmid">28132817</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cub.2017.01.011</pub-id>
                    <pub-id pub-id-type="pmcid">5319903</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>X</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Zhou</surname>
                            <given-names>B</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>L</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>GRID-seq reveals the global RNA-chromatin interactome.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Biotechnol.</italic>
					</source>
                    <year>2017</year>;<volume>35</volume>(<issue>10</issue>):<fpage>940</fpage>&#x2013;<lpage>950</lpage>.
                    <pub-id pub-id-type="pmid">28922346</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3968</pub-id>
                    <pub-id pub-id-type="pmcid">5953555</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Hoogsteen</surname>
                            <given-names>K</given-names>
                        </name>
					</person-group>:
                    <article-title>The crystal and molecular structure of a hydrogen-bonded complex between 1-methylthymine and 9-methyladenine.</article-title>
                    <source>
						
                        <italic toggle="yes">Acta Cryst.</italic>
					</source>
                    <year>1963</year>;<volume>16</volume>(<issue>9</issue>):<fpage>907</fpage>&#x2013;<lpage>916</lpage>.
                    <pub-id pub-id-type="doi">10.1107/S0365110X63002437</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Schmitz</surname>
                            <given-names>K-M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Mayer</surname>
                            <given-names>C</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Postepska</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Interaction of noncoding RNA with the rDNA promoter mediates recruitment of DNMT3b and silencing of rRNA genes.</article-title>
                    <source>
						
                        <italic toggle="yes">Genes Dev.</italic>
					</source>
                    <year>2010</year>;<volume>24</volume>(<issue>20</issue>):<fpage>2264</fpage>&#x2013;<lpage>2269</lpage>.
                    <pub-id pub-id-type="pmid">20952535</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gad.590910</pub-id>
                    <pub-id pub-id-type="pmcid">2956204</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Grote</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Wittler</surname>
                            <given-names>L</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Hendrix</surname>
                            <given-names>D</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The tissue-specific lncRNA <italic toggle="yes">Fendrr</italic> is an essential regulator of heart and body wall development in the mouse.</article-title>
                    <source>
						
                        <italic toggle="yes">Dev Cell.</italic>
					</source>
                    <year>2013</year>;<volume>24</volume>(<issue>2</issue>):<fpage>206</fpage>&#x2013;<lpage>214</lpage>.
                    <pub-id pub-id-type="pmid">23369715</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.devcel.2012.12.012</pub-id>
                    <pub-id pub-id-type="pmcid">4149175</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Postepska-Igielska</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Giwojna</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Gasri-Plotnitsky</surname>
                            <given-names>L</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>LncRNA Khps1 Regulates Expression of the Proto-oncogene SPHK1 via Triplex-Mediated Changes in Chromatin Structure.</article-title>
                    <source>
						
                        <italic toggle="yes">Mol Cell.</italic>
					</source>
                    <year>2015</year>;<volume>60</volume>(<issue>4</issue>):<fpage>626</fpage>&#x2013;<lpage>636</lpage>.
                    <pub-id pub-id-type="pmid">26590717</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.molcel.2015.10.001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>O&#x2019;Leary</surname>
                            <given-names>VB</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Ovsepian</surname>
                            <given-names>SV</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Carrascosa</surname>
                            <given-names>LG</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>PARTICLE, a Triplex-Forming Long ncRNA, Regulates Locus-Specific Methylation in Response to Low-Dose Irradiation.</article-title>
                    <source>
						
                        <italic toggle="yes">Cell Rep.</italic>
					</source>
                    <year>2015</year>;<volume>11</volume>(<issue>3</issue>):<fpage>474</fpage>&#x2013;<lpage>485</lpage>.
                    <pub-id pub-id-type="pmid">25900080</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.celrep.2015.03.043</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Buske</surname>
                            <given-names>FA</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Bauer</surname>
                            <given-names>DC</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Mattick</surname>
                            <given-names>JS</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Triplexator: detecting nucleic acid triple helices in genomic and transcriptomic data.</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Res.</italic>
					</source>
                    <year>2012</year>;<volume>22</volume>(<issue>7</issue>):<fpage>1372</fpage>&#x2013;<lpage>1381</lpage>.
                    <pub-id pub-id-type="pmid">22550012</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.130237.111</pub-id>
                    <pub-id pub-id-type="pmcid">3396377</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Go&#x00f1;i</surname>
                            <given-names>JR</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>De La Cruz</surname>
                            <given-names>X</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Orozco</surname>
                            <given-names>M</given-names>
                        </name>
					</person-group>:
                    <article-title>Triplex-forming oligonucleotide target sequences in the human genome.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2004</year>;<volume>32</volume>(<issue>1</issue>):<fpage>354</fpage>&#x2013;<lpage>360</lpage>.
                    <pub-id pub-id-type="pmid">14726484</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkh188</pub-id>
                    <pub-id pub-id-type="pmcid">373298</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Soibam</surname>
                            <given-names>B</given-names>
                        </name>
					</person-group>:
                    <article-title>Super-lncRNAs: identification of lncRNAs that target super-enhancers via RNA:DNA:DNA triplex formation.</article-title>
                    <source>
						
                        <italic toggle="yes">RNA.</italic>
					</source>
                    <year>2017</year>;<volume>23</volume>(<issue>11</issue>):<fpage>1729</fpage>&#x2013;<lpage>1742</lpage>.
                    <pub-id pub-id-type="pmid">28839111</pub-id>
                    <pub-id pub-id-type="doi">10.1261/rna.061317.117</pub-id>
                    <pub-id pub-id-type="pmcid">5648039</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Kent</surname>
                            <given-names>WJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Sugnet</surname>
                            <given-names>CW</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Furey</surname>
                            <given-names>TS</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The human genome browser at UCSC.</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Res.</italic>
					</source>
                    <year>2002</year>;<volume>12</volume>(<issue>6</issue>):<fpage>996</fpage>&#x2013;<lpage>1006</lpage>.
                    <pub-id pub-id-type="pmid">12045153</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.229102</pub-id>
                    <pub-id pub-id-type="pmcid">186604</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Quinlan</surname>
                            <given-names>AR</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Hall</surname>
                            <given-names>IM</given-names>
                        </name>
					</person-group>:
                    <article-title>BEDTools: a flexible suite of utilities for comparing genomic features.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2010</year>;<volume>26</volume>(<issue>6</issue>):<fpage>841</fpage>&#x2013;<lpage>842</lpage>.
                    <pub-id pub-id-type="pmid">20110278</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btq033</pub-id>
                    <pub-id pub-id-type="pmcid">2832824</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>D</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Langmead</surname>
                            <given-names>B</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Salzberg</surname>
                            <given-names>SL</given-names>
                        </name>
					</person-group>:
                    <article-title>HISAT: a fast spliced aligner with low memory requirements.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Methods.</italic>
					</source>
                    <year>2015</year>;<volume>12</volume>(<issue>4</issue>):<fpage>357</fpage>&#x2013;<lpage>360</lpage>.
                    <pub-id pub-id-type="pmid">25751142</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.3317</pub-id>
                    <pub-id pub-id-type="pmcid">4655817</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Harrow</surname>
                            <given-names>J</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Frankish</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Gonzalez</surname>
                            <given-names>JM</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>GENCODE: the reference human genome annotation for The ENCODE Project.</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Res.</italic>
					</source>
                    <year>2012</year>;<volume>22</volume>(<issue>9</issue>):<fpage>1760</fpage>&#x2013;<lpage>1774</lpage>.
                    <pub-id pub-id-type="pmid">22955987</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.135350.111</pub-id>
                    <pub-id pub-id-type="pmcid">3431492</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Anders</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Pyl</surname>
                            <given-names>PT</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>
					</person-group>:
                    <article-title>HTSeq--a Python framework to work with high-throughput sequencing data.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2015</year>;<volume>31</volume>(<issue>2</issue>):<fpage>166</fpage>&#x2013;<lpage>169</lpage>.
                    <pub-id pub-id-type="pmid">25260700</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btu638</pub-id>
                    <pub-id pub-id-type="pmcid">4287950</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Jiang</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Anderson</surname>
                            <given-names>J</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Gillespie</surname>
                            <given-names>J</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>uShuffle: a useful tool for shuffling biological sequences while preserving the k-let counts.</article-title>
                    <source>
						
                        <italic toggle="yes">BMC Bioinformatics.</italic>
					</source>
                    <year>2008</year>;<volume>9</volume>:<fpage>192</fpage>.
                    <pub-id pub-id-type="pmid">18405375</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-9-192</pub-id>
                    <pub-id pub-id-type="pmcid">2375906</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Gu</surname>
                            <given-names>Z</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Eils</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Schlesner</surname>
                            <given-names>M</given-names>
                        </name>
					</person-group>:
                    <article-title>Complex heatmaps reveal patterns and correlations in multidimensional genomic data.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2016</year>;<volume>32</volume>(<issue>18</issue>):<fpage>2847</fpage>&#x2013;<lpage>2849</lpage>.
                    <pub-id pub-id-type="pmid">27207943</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btw313</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Edgar</surname>
                            <given-names>RC</given-names>
                        </name>
					</person-group>:
                    <article-title>MUSCLE: multiple sequence alignment with high accuracy and high throughput.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2004</year>;<volume>32</volume>(<issue>5</issue>):<fpage>1792</fpage>&#x2013;<lpage>1797</lpage>.
                    <pub-id pub-id-type="pmid">15034147</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkh340</pub-id>
                    <pub-id pub-id-type="pmcid">390337</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Kuo</surname>
                            <given-names>CC</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>H&#x00e4;nzelmann</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Sent&#x00fc;rk Cetin</surname>
                            <given-names>N</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Detection of RNA-DNA binding sites in long noncoding RNAs.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2019</year>;<volume>47</volume>(<issue>6</issue>):<fpage>e32</fpage>.
                    <pub-id pub-id-type="pmid">30698727</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkz037</pub-id>
                    <pub-id pub-id-type="pmcid">6451187</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Sent&#x00fc;rk Cetin</surname>
                            <given-names>N</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Kuo</surname>
                            <given-names>C-C</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Ribarska</surname>
                            <given-names>T</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Isolation and genome-wide characterization of cellular DNA:RNA triplex structures.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2019</year>;<volume>47</volume>(<issue>5</issue>):<fpage>2306</fpage>&#x2013;<lpage>2321</lpage>.
                    <pub-id pub-id-type="pmid">30605520</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gky1305</pub-id>
                    <pub-id pub-id-type="pmcid">6411930</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Antonov</surname>
                            <given-names>I</given-names>
                        </name>
						</person-group>:
                    <article-title>vanyaantonov/universal_tts: The initial release of the code, data files and images related to universal TTSs (Version v1.0.0).</article-title>
                    <source>
						
                        <italic toggle="yes">Zenodo.</italic>
					</source>
                    <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.2654800">http://www.doi.org/10.5281/zenodo.2654800</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report48244">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.21026.r48244</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Zhu</surname>
                        <given-names>Hao</given-names>
                    </name>
                    <xref ref-type="aff" rid="r48244a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r48244a1">
                    <label>1</label>Bioinformatics Section, Southern Medical University, Guangzhou, China</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>21</day>
                <month>5</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Zhu H</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport48244" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.13522.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The manuscript is improved and well-written, and the idea of "universal TTSs" is interesting, but there are still several weak points. Because of these, some of my concerns remain, although they are not as serious as before. So, I think it is better to let readers make their own judgement.</p>
            <p>First, the authors analyzed essentially only the MEG3 data, which weakens the basis of the conclusion. The authors say, "To check the ability of other RNAs to form triplexes with MEG3 binding sites, we applied Triplexator to a set of 153 expressed transcripts". The choice of these specific 153 transcripts (given that there are ~20,000 annotated lncRNA genes) makes the subsequent statement that "65 analyzed RNAs show results similar to MEG3 lncRNA" quite specific.</p>
            <p>Second, both the ChOP-seq data and the Triplexator results may have defects. For example, the authors say, "the number of the predicted triplexes increases with the lengths of input sequences". Why? Theoretically, if the longer sequences do not contain more TFOs, the number of triplexes should not increase.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>No</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>No</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>No</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report48245">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.21026.r48245</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Grummt</surname>
                        <given-names>Ingrid</given-names>
                    </name>
                    <xref ref-type="aff" rid="r48245a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r48245a1">
                    <label>1</label>Division of Molecular Biology of the Cell II, DKFZ-ZMBH-Allianz, German Cancer Research Center (DKFZ), Heidelberg, Germany</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>13</day>
                <month>5</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Grummt I</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport48245" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.13522.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors have responded to most of my concerns. They increased the number of RNA sequences and included published data in their analysis. Though some concerns are left, the overall conclusion of this study &#x2013; that is, purine-rich sequences interact with various different RNAs &#x2013; supports the notion that regulatory RNAs may target transcriptional co-activators to distinct genomic loci via Hoogsteen interactions with purine-rich gene sequences.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>No</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>No</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>No</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>No</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Regulation of gene expression by noncoding RNA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report29953">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.14683.r29953</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Zhu</surname>
                        <given-names>Hao</given-names>
                    </name>
                    <xref ref-type="aff" rid="r29953a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r29953a1">
                    <label>1</label>Bioinformatics Section, Southern Medical University, Guangzhou, China</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>5</day>
                <month>2</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Zhu H</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport29953" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.13522.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Many lncRNAs can bind to DNA sequences by forming triplexes (the binding sites are often called TTS, triplex-targeting sites). Whether there are &#x201c;universal TTS&#x201d; as described here is interesting and unreported, and deserves a careful investigation. But I have a major concern about the work: the authors base their conclusion on too few examples. Also, why were these lncRNAs (BE2L6, LILRA3, HMOX1) chosen (randomly, or selected for some reason)?</p>
            <p>A few other issues should also be addressed. First, the relationship between the &#x201c;universal TTSs&#x201d; and base-pairing rules is not addressed. For example, do the universal TTSs allow many lncRNAs to bind to them using the same rules? If very different rules are involved, what does this mean? To some extent, binding under different rules indicates lncRNA-specific TTSs, instead of universal TTSs. Second, I feel that the genomic regions used to sum scores are unreasonably long (3000 bp). Finally, it is said that &#x201c;the median Triplexator SumScores were 48 and 25, respectively (p-value=5.2e-100)&#x201d;. Statistically, the difference is significant, but biologically it might not be. I think 48 is not that large and 25 is not that small.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>No</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>No</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>No</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Genome analysis, lncRNA analysis, molecular evolution</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment4642-29953">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Antonov</surname>
                            <given-names>Ivan</given-names>
                        </name>
                        <aff>Research Center of Biotechnology, RAS, Russian Federation</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>10</day>
                    <month>5</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Many lncRNAs can bind to DNA sequences by forming triplexes (the binding sites are often called TTS, triplex-targeting sites). Whether there are &#x201c;universal TTS&#x201d; as described here is interesting and unreported, and deserves a careful investigation. But I have a major concern about the work: the authors base their conclusion on too few examples. Also, why were these lncRNAs (BE2L6, LILRA3, HMOX1) chosen (randomly, or selected for some reason)?</p>
                <p> 
                    <bold>In the current version we have significantly increased the number of query RNAs (i.e. 153 expressed transcripts and 153 random sequences obtained by di-nucleotide shuffling of MEG3 lncRNA). The 153 transcripts that we use now were chosen so that their lengths and GC contents were similar to the MEG3 lncRNA.</bold>
                </p>
                <p> 1) A few other issues should also be addressed. First, the relationship between the &#x201c;universal TTSs&#x201d; and base-pairing rules is not addressed. For example, do the universal TTSs allow many lncRNAs to bind to them using the same rules? If very different rules are involved, what does this mean? To some extent, binding under different rules indicates lncRNA-specific TTSs, instead of universal TTSs.</p>
                <p> 
                    <bold>Our preliminary analysis indicated that different RNAs interacted with universal TTSs via the mixed (G or U) motif slightly more frequently than via the purine or pyrimidine motifs. Importantly, all the analyzed transcripts were predicted to form many triple helices (using different RNA motifs) with the universal TTSs. This, rather than any specific motif, is what makes uTTSs special.</bold>
                </p>
                <p> 2) Second, I feel that the genomic regions used to sum scores are unreasonably long (3000 bp).</p>
                <p> 
                    <bold>In the current version we decreased the genomic region lengths by considering the exact ChOP-seq and Capture-seq peaks.</bold>
                </p>
                <p> 3) Finally, it is said that &#x201c;the median Triplexator SumScores were 48 and 25, respectively (p-value=5.2e-100)&#x201d;. Statistically, the difference is significant, but biologically it might not be. I think 48 is not that large and 25 is not that small.</p>
                <p> 
                    <bold>We now use p-values instead of SumScore to estimate the statistical significance of each RNA-DNA interaction.</bold>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report29955">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.14683.r29955</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Mironov</surname>
                        <given-names>Andrey A.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r29955a1">1</xref>
                    <xref ref-type="aff" rid="r29955a2">2</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r29955a1">
                    <label>1</label>Institute for Information Transmission Problems, RAS (Russian Academy of Sciences), Moscow, Russian Federation</aff>
                <aff id="r29955a2">
                    <label>2</label>Department of Bioengineering and Bioinformatics, Moscow Technological University, Moscow, Russian Federation</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>26</day>
                <month>1</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Mironov AA</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport29955" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.13522.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The manuscript describes an application of the Triplexator software to search for possible binding sites of the imprinting-related MEG3 lincRNA in the human genome. The authors give a good example of the statistical analysis of the results. The main paper about this software has 62 citations (Google Scholar data). Most of them only cite the software, and only a few actually used Triplexator. Only a few reports show a success story about the application of the Triplexator software and a comparison of the results with experiment. In some papers, a significant enrichment of triplex targets in regions of interest was found, but the specificity of the predicted triplex formation was not analyzed. The current paper focuses on the specificity of the Triplexator predictions. The authors obtained the unexpected result that Triplexator gives many non-specific hits in this case.</p>
            <p> Comments: 
                <list list-type="order">
                    <list-item>
                        <p>The description of the similar transcripts and the parameters of the Triplexator software should be rearranged&#x00a0;because some RNA names appear before they are defined, which reads strangely. The description of the Triplexator parameters refers to the BE2L6 RNA, while the description of the control RNA set refers to UBE2L6.</p>
                    </list-item>
                    <list-item>
                        <p>The di-nucleotide shuffling seems more adequate for RNA analysis.</p>
                    </list-item>
                    <list-item>
                        <p>In one&#x00a0;paper
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-29955-1">1</xref>
                            </sup> the Triplexator software was also used to analyze MEG3 RNA-DNA contacts. A comparison of those results with the results of the present manuscript should be provided. It seems that the present manuscript provides a&#x00a0;more accurate analysis with good controls.</p>
                    </list-item>
                    <list-item>
                        <p>It would be good to examine how the program has been used in the literature and to check whether the program indeed has such low specificity.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-29955-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>MEG3 long noncoding RNA regulates the TGF-&#x03b2; pathway genes through formation of RNA-DNA triplex structures.</article-title>
                        <source>
                            <italic>Nat Commun</italic>
                        </source>. <year>2015</year>;<volume>6</volume>:
                        <elocation-id>7743</elocation-id>.
                        <pub-id pub-id-type="pmid">26205790</pub-id>
                        <pub-id pub-id-type="doi">10.1038/ncomms8743</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment4640-29955">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Antonov</surname>
                            <given-names>Ivan</given-names>
                        </name>
                        <aff>Research Center of Biotechnology, RAS, Russian Federation</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>10</day>
                    <month>5</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <bold>Dear reviewer,</bold>
                </p>
                <p>
                    <bold>We would like to apologise for the significant delay in replying to the comments. To implement the changes suggested by the reviewers we had to completely redesign our study and significantly increase the number of analyzed RNAs. In particular, we developed a new method to estimate the statistical significance of the number of triplexes predicted for each RNA-DNA pair. Moreover, we analyzed the results of the recently published Capture-seq experiment that identified interactions of three different RNA oligos with "naked" DNA. We hope that all these analyses improved our study.</bold>
                </p>
                <p> 1) The description of the similar transcripts and the parameters of the Triplexator software should be rearranged because some RNA names appear before they are defined, which reads strangely. The description of the Triplexator parameters refers to the BE2L6 RNA, while the description of the control RNA set refers to UBE2L6.</p>
                <p> 
                    <bold>We have made the appropriate corrections in the text.</bold>
                </p>
                <p> 2) The di-nucleotide shuffling seems more adequate for RNA analysis.</p>
                <p> 
                    <bold>We now use the di-nucleotide shuffling to generate random RNA sequences.</bold>
                </p>
                <p> 3) In 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/pubmed/26205790">one paper</ext-link> the Triplexator software was also used to analyze MEG3 RNA-DNA contacts. A comparison of those results with the results of the present manuscript should be provided. It seems that the present manuscript provides a more accurate analysis with good controls.</p>
                <p> 
                    <bold>In that original paper the authors focused on the triplex-based interactions of a single RNA (MEG3) with the chromatin. In the present study we are interested in whether other transcripts have the potential to interact with the same genomic regions. Taking into account the different aims (and approaches) of the two studies, we are not sure that it is reasonable to compare their results.</bold>
                </p>
                <p> 4) It would be good to examine how the program has been used in the literature and to check whether the program indeed has such low specificity.</p>
                <p> 
                    <bold>In our recent benchmarking study [
                        <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/pubmed/29697742">PMID:29697742</ext-link>] we showed that Triplexator was the most accurate tool as of 2018. This is why we used it in the present study.</bold>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report29954">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.14683.r29954</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Grummt</surname>
                        <given-names>Ingrid</given-names>
                    </name>
                    <xref ref-type="aff" rid="r29954a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r29954a1">
                    <label>1</label>Division of Molecular Biology of the Cell II, DKFZ-ZMBH-Allianz, German Cancer Research Center (DKFZ), Heidelberg, Germany</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>23</day>
                <month>1</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Grummt I</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport29954" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.13522.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Long noncoding RNAs (lncRNA) can regulate gene expression by targeting specific DNA sequences via Hoogsteen base pairing, forming RNA-DNA triple helical structures. Computational analyses have revealed that a large population of triplex-forming motifs is present across the genome, the majority of annotated human genes containing at least one unique and high-affinity triplex target site, preferentially in regulatory gene regions (Goni et al. 2004, Buske et al. 2012). Moreover, several studies have provided in vitro and in vivo evidence for the existence and biological relevance of RNA-DNA triplexes, including pRNA (Schmitz et al. 2010), Fendrr (Grote et al. 2013), Khps1 (Postepska-Igielska et al. 2015), PARTICLE (O&#x2019;Leary et al. 2015), and MEG3 (Mondal et al. 2015). MEG3 has been shown to associate with AG-rich DNA motifs and facilitate recruitment of PRC2 to target sites. Considering the large number of purine-rich sequences in the genome, triplex-mediated targeting of lncRNAs and associated proteins to distinct genomic loci is very likely a commonly used mechanism of gene regulation.</p>
            <p> </p>
            <p> Given the importance and emerging acceptance of the concept of triplex-dependent gene regulation, it is more than surprising, if not irritating, that the authors challenge this concept feeding the &#x2018;Triplexator&#x2019; only with a few RNAs and a subset of MEG3-interacting regions rather than providing any experimental data and/or more global bioinformatic analysis.</p>
            <p> </p>
            <p> Just some specific comments: 
                <list list-type="bullet">
                    <list-item>
                        <p>In the abstract they claim &#x2018;
                            <italic>these triplex interactions might contribute to establishing long-distance chromosomal contact&#x2019;</italic> without providing any information or bioinformatic analyses.</p>
                    </list-item>
                    <list-item>
                        <p>They use the term &#x2018;hybridization&#x2019; for the interaction between RNA and dsDNA. This is wrong as hybridization refers to Watson-Crick base-pairing between RNA and ssDNA and not to Hoogsteen bonding.</p>
                    </list-item>
                    <list-item>
                        <p>They took MEG3-interacting DNA peaks shorter than 1000 bp, then selected 3000 bp bins centering these regions and used these bins for analysis. There is no rationale for this selection which of course determines the final outcome of the analysis. Accordingly, the majority of these bin regions did not coincide with regions determined by ChOP-seq. Probably, a shorter binning would be more reliable to analyze the available data.</p>
                    </list-item>
                </list> 
                <list list-type="bullet">
                    <list-item>
                        <p>They focused on bins that overlap genes. Even if partial overlapping was accepted, they might have missed some promoters. Intergenic regions containing regulatory sequences (e.g. enhancers) were excluded.</p>
                    </list-item>
                    <list-item>
                        <p>Why were only peaks overlapping with annotated genes considered to be significant (or &#x201c;real&#x201d;)? Genomic regions that do not harbor annotated genes, such as enhancers, are important regulatory elements that are targeted by lncRNAs and as such are functional RNA-binding sites, highly relevant for this study. In addition, since it is known that the base composition of genic and intergenic regions is different, exclusion of intergenic regions introduces a considerable bias to the analysis.</p>
                    </list-item>
                    <list-item>
                        <p>Selection of just three additional RNAs is certainly not adequate for the far-reaching conclusion: &#x2018;
                            <italic>TTSs are able to hybridize with various different RNAs almost irrespectively of their sequence&#x2019;</italic>. It would be more convincing to show the results from scanning more RNAs, irrespective of their length and GC-content. Also, there is no attention given to the expression profiles of selected MEG3-mimicking RNAs. This is important because transcription of MEG3 is highly tissue-specific.</p>
                    </list-item>
                    <list-item>
                        <p>The sum scores from the Triplexator analysis are shown which does not mean that the same regions are involved in triplex formation. It would be much more convincing to show similarities (or differences) of triplex-forming RNAs for a given TTS in a given bin.</p>
                    </list-item>
                </list> 
                <list list-type="bullet">
                    <list-item>
                        <p>The terms &#x2018;universal TTS&#x2019; and &#x2018;universal bins&#x2019; are not synonymous and interchangeable!! One bin (3000 bp) can contain many putative TTSs.</p>
                    </list-item>
                    <list-item>
                        <p>If there are only 18 &#x2018;universal bins&#x2019; out of 3620 bins among seven RNA analyses, this small number is not sufficient for claiming that there is no specificity in RNA targeting.</p>
                    </list-item>
                    <list-item>
                        <p>They hypothesize that 
                            <italic>&#x2018;the universal TTS can be viewed as the anchor point which can be bound by various nuclear RNAs to provide long-distance chromosomal contacts&#x2019;</italic>. Even if this might be true, without any supportive data this is pure speculation.</p>
                    </list-item>
                </list></p>
            <p> Altogether, the authors&#x2019; claim that triplex formation occurs almost sequence-independently is not justified but is based solely on 
                <italic>in silico</italic> analyses. At least another available bioinformatics tool should have been used and standard 
                <italic>in vitro</italic> assays (e.g. EMSA experiments) should have been performed to validate that the candidate RNAs are indeed capable of forming triplexes. The authors do not even mention that the 
                <italic>in vivo</italic> situation might be completely different from algorithm-based predictions and that there might be additional factors/constraints involved in triplex formation and stability.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>No</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>No</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>No</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>No</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Regulation of gene expression by noncoding RNA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment4641-29954">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Antonov</surname>
                            <given-names>Ivan</given-names>
                        </name>
                        <aff>Research Center of Biotechnology, RAS, Russian Federation</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>10</day>
                    <month>5</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <bold>We would like to thank Dr. Grummt for the extensive comments on our work. They have helped us improve the design of our study and obtain additional results. We hope that they have made our conclusions more reliable and reproducible.</bold>
                </p>
                <p> </p>
                <p> </p>
                <p> Given the importance and emerging acceptance of the concept of triplex-dependent gene regulation, it is more than surprising, if not irritating, that the authors challenge this concept feeding the &#x2018;Triplexator&#x2019; only with a few RNAs and a subset of MEG3-interacting regions rather than providing any experimental data and/or more global bioinformatic analysis.</p>
                <p> 
                    <bold>We do not challenge the possibility of triplex-dependent regulation. We simply claim that RNA interactions with some genomic regions have low sequence specificity because many other RNAs may be able to bind the same loci. We investigated the features of such regions and found them to be enriched in purine-rich low complexity repeats. In the current version, we completely redesigned the study and incorporated the analysis of 306 RNA sequences to confirm our findings. We also modified the text so that the main conclusions are clear and not misleading.</bold>
                </p>
                <p> </p>
                <p> </p>
                <p> Just some specific comments:</p>
                <p> 1) In the abstract they claim &#x2018;these triplex interactions might contribute to establishing long-distance chromosomal contact&#x2019; without providing any information or bioinformatic analyses.</p>
                <p> 
                    <bold>We believe it is reasonable to speculate about this possibility in the discussion for the following reasons. First, it has recently been shown that "RNAs originating from super-enhancers form triplexes at distant regions" [PMID: 30605520]. Second, we showed that the predicted Capture-seq universal TTSs were highly enriched in gene promoters (Supplementary Figure 10). Together these observations indicate that the same eRNA may be able to interact with several different universal TTSs and therefore contribute to the long-distance chromosomal (i.e. enhancer-promoter) contacts. However, additional experimental verification of this hypothesis is required.</bold>
                </p>
                <p> 
                    <bold>We modified the text in the abstract as follows: "We speculated that such universal TTSs may contribute to establishing long-distance chromosomal contacts and may facilitate distal enhancer-promoter interactions."</bold>
                </p>
                <p> </p>
                <p> 2) They use the term &#x2018;hybridization&#x2019; for the interaction between RNA and dsDNA. This is wrong as hybridization refers to Watson-Crick base-pairing between RNA and ssDNA and not to Hoogsteen bonding.</p>
                <p> 
                    <bold>We have corrected the terminology used in the manuscript.</bold>
                </p>
                <p> </p>
                <p> 3) They took MEG3-interacting DNA peaks shorter than 1000 bp, then selected 3000 bp bins centering these regions and used these bins for analysis. There is no rationale for this selection which of course determines the final outcome of the analysis. Accordingly, the majority of these bin regions did not coincide with regions determined by ChOP-seq. Probably, a shorter binning would be more reliable to analyze the available data.</p>
                <p> 
                    <bold>We now use the exact locations of all the ChOP-seq peaks. To compensate for the variability in peak length, we developed a method that estimates the statistical significance of the number of triplexes predicted for an RNA-DNA pair, taking into account the lengths of both sequences.</bold>
                </p>
                <p> </p>
                <p> 4) They focused on bins that overlap genes. Even if partial overlapping was accepted, they might have missed some promoters. Intergenic regions containing regulatory sequences (e.g. enhancers) were excluded.</p>
                <p> 
                    <bold>We now analyze all the ChOP-seq peaks without considering their overlaps with the annotated genes.</bold>
                </p>
                <p> </p>
                <p> 5) Why were only peaks overlapping with annotated genes considered to be significant (or &#x201c;real&#x201d;)? Genomic regions that do not harbor annotated genes, such as enhancers, are important regulatory elements that are targeted by lncRNAs and as such are functional RNA-binding sites, highly relevant for this study. In addition, since it is known that the base composition of genic and intergenic regions is different, exclusion of intergenic regions introduces a considerable bias to the analysis.</p>
                <p> 
                    <bold>We now analyze all the ChOP-seq peaks.</bold>
                </p>
                <p> </p>
                <p> 6) Selection of just three additional RNAs is certainly not adequate for the far-reaching conclusion: &#x2018;TTSs are able to hybridize with various different RNAs almost irrespectively of their sequence&#x2019;. It would be more convincing to show the results from scanning more RNAs, irrespective of their length and GC-content. Also, there is no attention given to the expression profiles of selected MEG3-mimicking RNAs. This is important because transcription of MEG3 is highly tissue-specific.</p>
                <p> 
                    <bold>We have increased the number of the considered query RNAs to 306 and used the expressed transcripts only.</bold>
                </p>
                <p> </p>
                <p> 7) The sum scores from the Triplexator analysis are shown which does not mean that the same regions are involved in triplex formation. It would be much more convincing to show similarities (or differences) of triplex-forming RNAs for a given TTS in a given bin.</p>
                <p> <bold>We no longer use sum scores as a measure of triplex-based interaction. Instead, we estimate the statistical significance (p-value) of each RNA-DNA interaction based on the number of predicted triplexes.</bold></p>
                <p> 
                    <bold>Our work focused on the properties of the DNA sequences that may allow them to interact with various different RNAs. We therefore analyzed the parts of the universal TTSs (identified among the ChOP-seq/Capture-seq peaks) that allowed them to interact with various different RNAs. Our analysis indicates that such triplex-forming hot-spots frequently coincide with purine-rich low complexity genomic regions.</bold>
                </p>
                <p> </p>
                <p> 8) The terms &#x2018;universal TTS&#x2019; and &#x2018;universal bins&#x2019; are not synonymous and interchangeable!! One bin (3000 bp) can contain many putative TTSs.</p>
                <p> 
                    <bold>We do not use the concept of bins and &#x2018;universal bins&#x2019; in the current version of the manuscript. However, we kept the term &#x2018;universal TTS&#x2019;. </bold>
                </p>
                <p> </p>
                <p> 9) If there are only 18 &#x2018;universal bins&#x2019; out of 3620 bins among seven RNA analyses, this small number is not sufficient for claiming that there is no specificity in RNA targeting.</p>
                <p> 
                    <bold>We would like to emphasize that our paper does not question the concept of sequence-specific triplex-dependent gene regulation (moreover, we support this idea and conduct research in this direction). We claim that some genomic regions may have the potential to form triple helices with a variety of different long RNAs, forming "universal" triplex target sites (TTSs). At the same time, we do not challenge the sequence specificity of the other triplex-based RNA-DNA interactions.</bold>
                </p>
                <p> </p>
                <p> 10) They hypothesize that &#x2018;the universal TTS can be viewed as the anchor point which can be bound by various nuclear RNAs to provide long-distance chromosomal contacts&#x2019;. Even if this might be true, without any supportive data this is pure speculation.</p>
                <p> <bold>We agree that at the moment our claim is a hypothesis/speculation. Nevertheless, the recently published results [PMID: 30605520] as well as our own indicate the possibility of such a mechanism. By discussing it in the current manuscript we hope to attract the attention of experimental biologists to further study this topic.</bold></p>
                <p> 
                    <bold>We modified the text in the paper to clarify the issue:</bold>
                </p>
                <p> 
                    <bold>"One of the possible and actively discussed roles of the chromatin bound RNAs (including lncRNAs) is to bring different chromosomal parts together to enable the remote DNA-DNA contacts. Moreover, it has recently been shown that RNAs originating from super-enhancers form triplexes at distant regions. Therefore, it is possible that universal TTSs may facilitate distal enhancer-promoter interactions via engagement with the same enhancer RNA. In line with this hypothesis, we observed the statistical significant enrichment of the universal Capture-seq peaks near (&lt; 1 kb) the transcription start sites (TSSs) of the annotated genes (Supplementary Figure 10C)."</bold>
                </p>
                <p> </p>
                <p> 11) Altogether, the authors&#x2019; claim that triplex formation occurs almost sequence-independently is not justified but is based solely on in silico analyses. At least another available bioinformatics tool should have been used and standard in vitro assays (e.g. EMSA experiments) should have been performed to validate that the candidate RNAs are indeed capable of forming triplexes. The authors do not even mention that the in vivo situation might be completely different from algorithm-based predictions and that there might be additional factors/constraints involved in triplex formation and stability.</p>
                <p> 
                    <bold>We added an analysis of the recent in vitro data obtained by the Capture-seq method. According to these experimental data, some genomic fragments can interact with all three RNA oligos used in the original study. This supports the observations obtained in the analysis of the ChOP-seq peaks. Our computational analysis classified almost 40% of these shared Capture-seq peaks as universal TTSs. Moreover, the ChOP-seq and the Capture-seq universal TTSs were similar in that they were enriched in purine-rich low complexity regions. We believe that these results indirectly support the existence of universal TTSs. Yet, experimental validation of these results is beyond the scope of the current paper.</bold>
                </p>
                <p> 
                    <bold>Still, at the end of the manuscript, we discuss the limitations of our computational approach and mention that the obtained results resemble the situation with naked DNA in vitro more closely than the interactions with the chromatin in vivo.</bold>
                </p>
                <p> 
                    <bold>We added the following text to the discussion:</bold>
                </p>
                <p> 
                    <bold>"Importantly, the current computational analysis has a number of limitations. Namely, the triplex-based interactions of the full length transcripts were predicted without taking their secondary structure into account. We are not aware of any bioinformatics tools that would be able to produce such predictions. Moreover, cellular localization of the 153 selected expressed transcripts as well as DNA binding proteins and chromatin compaction were not considered. Therefore, our simulations are more similar to the in vitro Capture-seq experiments with short oligos than to the interactions of long transcripts with the chromatin inside the nucleus."</bold>
                </p>
            </body>
        </sub-article>
    </sub-article>
</article>
