<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.9259.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                    <subj-group>
                        <subject>Bioinformatics</subject>
                    </subj-group>
                    <subj-group>
                        <subject>Genomics</subject>
                    </subj-group>
                    <subj-group>
                        <subject>Theory &amp; Simulation</subject>
                    </subj-group>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>GeneBreak: detection of recurrent DNA copy number aberration-associated chromosomal breakpoints within genes</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 1 approved, 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>van den Broek</surname>
                        <given-names>Evert</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>van Lieshout</surname>
                        <given-names>Stef</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Rausch</surname>
                        <given-names>Christian</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ylstra</surname>
                        <given-names>Bauke</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>van de Wiel</surname>
                        <given-names>Mark A.</given-names>
                    </name>
                    <xref ref-type="aff" rid="a2">2</xref>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Meijer</surname>
                        <given-names>Gerrit A.</given-names>
                    </name>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Fijneman</surname>
                        <given-names>Remond J.A.</given-names>
                    </name>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Abeln</surname>
                        <given-names>Sanne</given-names>
                    </name>
                    <uri content-type="orcid">https://orcid.org/0000-0002-2779-7174</uri>
                    <xref ref-type="corresp" rid="c2">b</xref>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Department of Pathology, VU University Medical Center, Amsterdam, 1081 HZ, The Netherlands</aff>
                <aff id="a2">
                    <label>2</label>Department of Epidemiology &amp; Biostatistics, VU University Medical Center, Amsterdam, 1081 HZ, The Netherlands</aff>
                <aff id="a3">
                    <label>3</label>Department of Mathematics, VU University Medical Center, Amsterdam, 1081 HV, The Netherlands</aff>
                <aff id="a4">
                    <label>4</label>Department of Computer Science, VU University Medical Center, Amsterdam, 1081 HV, The Netherlands</aff>
                <aff id="a5">
                    <label>5</label>Department of Pathology, Netherlands Cancer Institute, Amsterdam, 1066CX, The Netherlands</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:r.fijneman@nki.nl">r.fijneman@nki.nl</email>
                </corresp>
                <corresp id="c2">
                    <label>b</label>
                    <email xlink:href="mailto:s.abeln@vu.nl">s.abeln@vu.nl</email>
                </corresp>
                <fn fn-type="con">
                    <p>EvdB, GM, RF and SA conceived the study. EvdB, SvL, MvdW, GM, RF and SA designed the workflow and EvdB, SvL and MvdW developed and tested the code. MvdW provided expertise in biostatistics. CR and BY provided expertise in analysis of CNA data obtained by array-CGH and WGS. EvdB, RF and SA prepared the first draft of the manuscript. All authors were involved in the revision of the draft manuscript and have agreed to the final content.</p>
                </fn>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>19</day>
                <month>9</month>
                <year>2016</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2016</year>
            </pub-date>
            <volume>5</volume>
            <elocation-id>2340</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>14</day>
                    <month>9</month>
                    <year>2016</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2016 van den Broek E et al.</copyright-statement>
                <copyright-year>2016</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/5-2340/pdf"/>
            <abstract>
                <p>Development of cancer is driven by somatic alterations, including numerical and structural chromosomal aberrations. Several computational methods are currently available and widely applied to detect numerical copy number aberrations (CNAs) of chromosomal segments in tumor genomes. However, there is a lack of computational methods that systematically detect structural chromosomal aberrations by virtue of the genomic location of CNA-associated chromosomal breaks, and that identify genes non-randomly affected by chromosomal breakpoints across (large) series of tumor samples. &#x2018;GeneBreak&#x2019; was developed to systematically identify, genome-wide, genes that are recurrently affected by CNA-associated chromosomal breaks; it can be applied to DNA copy number data obtained by array-Comparative Genomic Hybridization (CGH) or by (low-pass) whole genome sequencing (WGS). First, &#x2018;GeneBreak&#x2019; collects the genomic locations of CNA-associated chromosomal breaks that were previously pinpointed by the segmentation algorithm used to obtain the CNA profiles. Next, a tailored annotation approach maps breakpoints to genes. Finally, dedicated cohort-based statistics are applied, with correction for covariates that influence the probability of a gene being a breakpoint gene. In addition, multiple testing correction is integrated to reveal recurrent breakpoint events. This easy-to-use algorithm, &#x2018;GeneBreak&#x2019;, is implemented in R (
                    <ext-link ext-link-type="uri" xlink:href="http://www.cran.r-project.org">www.cran.r-project.org</ext-link>) and is available from Bioconductor (
                    <ext-link ext-link-type="uri" xlink:href="https://www.bioconductor.org/packages/release/bioc/html/GeneBreak.html">www.bioconductor.org/packages/release/bioc/html/GeneBreak.html</ext-link>).</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>structural chromosomal aberrations</kwd>
                <kwd>recurrent breakpoint genes</kwd>
                <kwd>molecular characterization</kwd>
                <kwd>cancer genome</kwd>
                <kwd>copy number aberration profile</kwd>
                <kwd>computational method</kwd>
            </kwd-group>
            <funding-group>
                <funding-statement>This work was supported by the VUmc-Cancer Center Amsterdam [to EvdB]; performed within the framework of the Center for Translational Molecular Medicine, DeCoDe project [03O-101]; and CTMM-TraIT [05T-401 to EvdB, SvL, BY, GM, RF and SA].</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Tumor development is driven by irreversible somatic genomic aberrations such as single nucleotide variants (SNVs) and chromosomal aberrations, including both numerical and structural changes
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>,
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup>. Genome-wide profiling of somatic DNA copy number aberrations (CNAs) is a widely established approach to characterize chromosomal aberrations in cancer genomes. To date, computational methods have mainly focused on the analysis of numerical aberrations of chromosomal segments. Evidence is emerging, however, that genes affected by structural chromosomal aberrations, 
                <italic toggle="yes">i.e.</italic> genes affected by chromosomal breaks, represent a biologically and clinically relevant class of mutations in many cancer types including solid tumors
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-6">6</xref>
                </sup>. Importantly, the actual locations of chromosomal CNA-associated breakpoints, which are the points of copy number level shift in somatic CNA profiles, indicate underlying chromosomal breaks and thereby genomic locations affected by somatic structural aberrations
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>. Hence, the wide availability of large series of high-resolution DNA copy number data, obtained for instance by array-Comparative Genomic Hybridization (CGH) or (low-pass) whole genome sequencing (WGS), enables a systematic search for regions and genes affected by CNA-associated structural chromosomal changes. Computational methods that determine numerical CNAs consequently also yield CNA-associated breakpoint locations. However, identifying genes that are recurrently affected by CNA-associated chromosomal breakpoints across (large) series of cancer samples is not trivial, since it requires dedicated computational methods, including comprehensive statistical evaluation.</p>
            <p>Here we present a computational method, &#x2018;GeneBreak&#x2019;, that identifies chromosomal breakpoint locations from DNA copy number profiles. A tailored annotation approach maps breakpoint locations to genes for each individual profile. Dedicated cohort-based statistical analysis, including correction for covariates that influence the probability of a gene being a breakpoint gene and correction for multiple testing, then pinpoints genes that are non-randomly and recurrently affected by chromosomal breaks across multiple tumor samples
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup>. &#x2018;GeneBreak&#x2019; is implemented in R (
                <ext-link ext-link-type="uri" xlink:href="http://www.cran.r-project.org">www.cran.r-project.org</ext-link>) and is available from Bioconductor (
                <ext-link ext-link-type="uri" xlink:href="https://www.bioconductor.org/packages/release/bioc/html/GeneBreak.html">www.bioconductor.org/packages/release/bioc/html/GeneBreak.html</ext-link>). The Bioconductor vignette describes a detailed example workflow for CNA data from 200 array-CGH samples. A schematic overview of the computational methods is depicted in 
                <xref ref-type="fig" rid="f1">Figure 1</xref>.</p>
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>Figure 1. </label>
                <caption>
                    <title>Schematic overview of computational methods.</title>
                    <p>&#x2018;GeneBreak&#x2019; requires already segmented DNA copy number data from array-CGH or WGS approaches. The first step detects breakpoint locations. Next, breakpoint locations are mapped to gene annotations in order to identify genes affected by DNA breakpoints. The final step performs comprehensive cohort-based statistical analyses, including correction for multiple testing, to reveal both recurrent breakpoint locations and recurrent breakpoint genes. Breakpoint frequencies can be visualized with a built-in plot function. This example visualizes the breakpoint locations (vertical black bars) and breakpoint genes (horizontal red bars) on the p-arm of chromosome 20 identified in a cohort of 352 advanced colorectal cancers. Genes labeled with a name are statistically significant recurrent breakpoint genes (FDR&lt;0.1).</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/9967/6d9985d8-83d0-4207-b109-0d5215934396_figure1.gif"/>
            </fig>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>DNA copy number profiles</title>
                <p>The breakpoint detection method we provide is amenable to data from any DNA copy number platform, 
                    <italic toggle="yes">e.g.</italic> array-CGH and (low-pass) WGS, and from any copy number detection algorithm. For optimal results, &#x2018;GeneBreak&#x2019; takes DNA copy number data pre-processed by the R-package &#x2018;CGHcall&#x2019;
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup> or &#x2018;QDNAseq&#x2019;
                    <sup>
                        <xref ref-type="bibr" rid="ref-14">14</xref>
                    </sup>, both based on the Circular Binary Segmentation algorithm
                    <sup>
                        <xref ref-type="bibr" rid="ref-15">15</xref>
                    </sup>, as input. Alternatively, segmented values (log2-ratios) from a different copy number detection algorithm can be used. In addition, it is recommended to provide discrete DNA copy number states (
                    <italic toggle="yes">e.g.</italic> loss, neutral, gain) that can be used for breakpoint selection. The Bioconductor vignette and manual describe the commands and workflows in detail (see 
                    <xref ref-type="other" rid="SM">Supplementary material</xref>).</p>
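                <p>To make the expected input concrete, the sketch below assumes that segmented output is available per feature (one array probe or one WGS bin) as chromosome, start, end, segmented log2-ratio and a discrete call, with every feature carrying the value of the segment it belongs to. The call encoding (-1 loss, 0 neutral, 1 gain) and the names <monospace>Feature</monospace>, <monospace>Segment</monospace> and <monospace>collapse_segments</monospace> are hypothetical illustrations, not the &#x2018;GeneBreak&#x2019; API. Consecutive features on the same chromosome that share a segmented value and call are collapsed into one segment.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    """One probe (array-CGH) or bin (WGS) with its segmented value and call."""
    chrom: str
    start: int
    end: int
    segmented: float  # segmented log2-ratio shared by all features of a segment
    call: int         # assumed encoding: -1 loss, 0 neutral, 1 gain


@dataclass
class Segment:
    chrom: str
    start: int
    end: int
    segmented: float
    call: int
    n_features: int


def collapse_segments(features: List[Feature]) -> List[Segment]:
    """Merge consecutive features on the same chromosome that share the same
    segmented value and call. Assumes features are sorted by chromosome and
    start; exact float equality is used because segmentation output repeats
    the identical value for every feature in a segment."""
    segments: List[Segment] = []
    for f in features:
        last = segments[-1] if segments else None
        if (last is not None and last.chrom == f.chrom
                and last.segmented == f.segmented and last.call == f.call):
            last.end = f.end          # extend the current segment
            last.n_features += 1
        else:
            segments.append(Segment(f.chrom, f.start, f.end, f.segmented, f.call, 1))
    return segments


features = [
    Feature("20", 1_000, 2_000, 0.00, 0),
    Feature("20", 3_000, 4_000, 0.00, 0),
    Feature("20", 5_000, 6_000, 0.45, 1),
    Feature("20", 7_000, 8_000, 0.45, 1),
    Feature("20", 9_000, 10_000, -0.60, -1),
]
for seg in collapse_segments(features):
    print(seg)
```
]]></code>
                <p>The five example features collapse into three segments (neutral, gain, loss), and each boundary between them is a candidate breakpoint.</p>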
            </sec>
            <sec>
                <title>Breakpoint detection and filter options</title>
                <p>Breakpoints are defined by the chromosomal locations that separate the contiguous DNA copy number segments pinpointed by a segmentation algorithm. &#x2018;GeneBreak&#x2019; identifies chromosomal breakpoint locations for each individual DNA copy number profile. Instead of taking all detected breakpoints into account, users may want to define more precisely which breakpoints to include, based on the characteristics of the two flanking DNA copy number segments. One of the following three selection options can be applied: A) 
                    <italic toggle="yes">Copy number-deviation</italic>: this selects breakpoints where the shift in log2-ratio between two consecutive DNA copy number segments exceeds a user-defined threshold; B) 
                    <italic toggle="yes">CNA-associated breakpoints:</italic> this selects all breakpoints between consecutive DNA copy number segments, except for breakpoints flanked by two copy number neutral segments; C) 
                    <italic toggle="yes">CNA-breakpoints</italic>: this selects only those breakpoints flanked by segments with dissimilar discrete DNA copy number states.</p>
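                <p>The three selection options can be read as predicates over the two segments flanking a breakpoint. The following is a minimal sketch under that reading, assuming the same call encoding as above (-1 loss, 0 neutral, 1 gain). The option labels <monospace>deviation</monospace>, <monospace>cna_associated</monospace> and <monospace>cna_break</monospace> and the function <monospace>select_breakpoints</monospace> are hypothetical; the actual &#x2018;GeneBreak&#x2019; arguments are documented in the vignette and manual.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
from typing import List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    chrom: str
    start: int         # genomic start of the segment's first feature
    end: int           # genomic end of the segment's last feature
    segmented: float   # segmented log2-ratio
    call: int          # assumed encoding: -1 loss, 0 neutral, 1 gain


def select_breakpoints(segments: List[Segment], option: str,
                       threshold: Optional[float] = None
                       ) -> List[Tuple[Segment, Segment]]:
    """Return the (left, right) segment pairs whose shared boundary passes the
    chosen option. Segments must be sorted by chromosome and start position;
    a boundary between two chromosomes is never treated as a breakpoint."""
    selected = []
    for left, right in zip(segments, segments[1:]):
        if left.chrom != right.chrom:
            continue
        if option == "deviation":          # A) log2-ratio shift above threshold
            if threshold is None:
                raise ValueError("option 'deviation' requires a threshold")
            keep = abs(right.segmented - left.segmented) > threshold
        elif option == "cna_associated":   # B) not flanked by two neutral segments
            keep = not (left.call == 0 and right.call == 0)
        elif option == "cna_break":        # C) flanking discrete calls differ
            keep = left.call != right.call
        else:
            raise ValueError(f"unknown option: {option}")
        if keep:
            selected.append((left, right))
    return selected


example = [
    Segment("1", 1, 100, 0.15, 0),
    Segment("1", 101, 200, -0.15, 0),   # neutral to neutral, shift of 0.30
    Segment("1", 201, 300, 0.50, 1),    # neutral to gain
    Segment("1", 301, 400, 0.80, 1),    # gain to higher gain, same call
    Segment("2", 1, 100, -0.70, -1),    # new chromosome: no breakpoint
]
for option in ("deviation", "cna_associated", "cna_break"):
    print(option, len(select_breakpoints(example, option, threshold=0.25)))
```
]]></code>
                <p>With a threshold of 0.25, the three options retain 3, 2 and 1 breakpoints respectively, which illustrates that option C is the most stringent: it ignores level shifts within the same discrete state as well as shifts between two neutral segments.</p>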
                <p>Because of the limited resolution of DNA copy number data (the distance between microarray probes, or the bin size of WGS copy number data), a detected breakpoint, although defined by the genomic start position of a copy number segment, in fact represents a chromosomal interval.</p>
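                <p>One way to derive such an interval is sketched below, under an assumption the text does not specify: by default the interval runs from the last position measured in the left segment to the first position measured in the right segment, and optionally it is widened to include both flanking features in case the break lies inside a probe or bin. The function name <monospace>breakpoint_interval</monospace> is hypothetical, and &#x2018;GeneBreak&#x2019; may define the interval boundaries differently.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
from typing import List, Tuple

Feature = Tuple[int, int]  # (start, end) of a probe or bin on one chromosome


def breakpoint_interval(features: List[Feature], first_index: int,
                        include_flanking: bool = False) -> Tuple[int, int]:
    """Interval that may hold a breakpoint reported at the start of the
    segment whose first feature is features[first_index].

    Default: from the last measured position of the left segment to the first
    measured position of the right segment. With include_flanking=True, both
    flanking features are included as well.
    """
    if not 0 < first_index < len(features):
        raise ValueError("a breakpoint needs a feature on each side")
    left_start, left_end = features[first_index - 1]
    right_start, right_end = features[first_index]
    if include_flanking:
        return left_start, right_end
    return left_end, right_start


probes = [(10_000, 10_060), (25_000, 25_060), (40_000, 40_060)]  # sparse array probes
bins = [(1, 1_000), (1_001, 2_000), (2_001, 3_000)]              # contiguous WGS bins
print(breakpoint_interval(probes, 2))                          # (25060, 40000)
print(breakpoint_interval(probes, 2, include_flanking=True))   # (25000, 40060)
print(breakpoint_interval(bins, 1))                            # (1000, 1001)
```
]]></code>
                <p>For sparse array probes the interval spans the unmeasured gap between probes, whereas for contiguous WGS bins it shrinks to the bin boundary unless the flanking bins are included.</p>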
            </sec>
            <sec>
                <title>Breakpoint gene identification</title>
                <p>To identify genes affected by chromosomal breakpoints, the built-in gene annotations can be used; alternatively, a user-defined gene annotation file can be provided (see the Bioconductor vignette and manual for further details). The implemented mapping approach identifies genes that are associated with one or more chromosomal breakpoint intervals.</p>
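                <p>The core of breakpoint-to-gene mapping is an interval overlap test. The sketch below assumes that a gene is affected when any breakpoint interval of a single profile overlaps its annotated span (closed, 1-based coordinates), and counts the overlapping intervals per gene. This simple overlap rule, the gene symbols and the function names are illustrative assumptions; the tailored &#x2018;GeneBreak&#x2019; rules for gene-associated features are described in the vignette and may differ.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int
    end: int


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if closed intervals [a_start, a_end] and [b_start, b_end] share a base."""
    return a_start <= b_end and b_start <= a_end


def map_breakpoints_to_genes(breakpoints: List[Interval],
                             genes: List[Gene]) -> Dict[str, int]:
    """Count, per gene, the breakpoint intervals of one profile that overlap it.
    A nested loop keeps the sketch short; large annotations would call for
    sorted sweeps or an interval tree."""
    hits: Dict[str, int] = defaultdict(int)
    for bp in breakpoints:
        for gene in genes:
            if gene.chrom == bp.chrom and overlaps(bp.start, bp.end, gene.start, gene.end):
                hits[gene.symbol] += 1
    return dict(hits)


genes = [  # hypothetical annotation
    Gene("GENE_A", "20", 10_000, 30_000),
    Gene("GENE_B", "20", 50_000, 60_000),
    Gene("GENE_C", "1", 10_000, 30_000),
]
breakpoints = [
    Interval("20", 25_060, 40_000),
    Interval("20", 29_000, 29_100),
    Interval("20", 55_000, 55_500),
]
print(map_breakpoints_to_genes(breakpoints, genes))  # {'GENE_A': 2, 'GENE_B': 1}
```
]]></code>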
            </sec>
            <sec>
                <title>Cohort-based breakpoint statistics: breakpoint and gene level</title>
                <p>Cohort-based identification of recurrent breakpoint events can be performed both at the level of genomic location and at the gene level. The default statistical analysis applies standard Benjamini-Hochberg false discovery rate (FDR) correction for multiple testing. For the analysis of breakpoints at the level of genomic location, this method assumes the same permutation null distribution for all candidate breakpoint events. At the gene level, however, we recommend applying the built-in regression-based correction for covariates that may influence the breakpoint probability, including the number of breakpoints in a tumor profile, the number of gene-associated features and the gene length by gene-associated feature coverage. In addition, a more comprehensive and powerful dedicated Benjamini-Hochberg-type FDR correction that accounts for the discreteness of the null distribution is supplied
                    <sup>
                        <xref ref-type="bibr" rid="ref-16">16</xref>
                    </sup>. Commands and an example workflow can be found in the Bioconductor vignette and manual.</p>
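                <p>To illustrate why covariates such as the number of breakpoints per profile and the number of gene-associated features matter, the sketch below uses a deliberately simplified null model that is not the regression-based correction of &#x2018;GeneBreak&#x2019;: it assumes that each sample&#x2019;s breakpoints fall uniformly, without replacement, over a fixed number of candidate feature positions, and that samples are independent. The number of samples with a breakpoint in a gene is then Poisson-binomial, which yields a per-gene upper-tail p-value, followed by standard Benjamini-Hochberg adjustment. The discreteness-aware correction cited above is not implemented here; gene names and counts are hypothetical.</p>
                <code language="python" xml:space="preserve"><![CDATA[
```python
from typing import Dict, List


def hit_probability(n_breaks: int, gene_features: int, total_features: int) -> float:
    """P(at least one of n_breaks breakpoints, placed uniformly without
    replacement over total_features positions, falls on one of the
    gene_features positions associated with the gene)."""
    if n_breaks > total_features:
        raise ValueError("more breakpoints than candidate positions")
    miss = 1.0
    for i in range(n_breaks):
        miss *= max(total_features - gene_features - i, 0) / (total_features - i)
    return 1.0 - miss


def poisson_binomial_sf(probs: List[float], observed: int) -> float:
    """P(X >= observed) where X is a sum of independent Bernoulli(p) draws."""
    dist = [1.0]  # dist[k] = P(k hits among the samples processed so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        dist = nxt
    return min(1.0, sum(dist[observed:]))


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Standard Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def gene_pvalues(observed_hits: Dict[str, int], gene_features: Dict[str, int],
                 breaks_per_sample: List[int], total_features: int) -> Dict[str, float]:
    """Per-gene p-value for observing at least observed_hits[gene] samples
    with a breakpoint in the gene under the uniform-placement null."""
    result = {}
    for gene, hits in observed_hits.items():
        probs = [hit_probability(n, gene_features[gene], total_features)
                 for n in breaks_per_sample]
        result[gene] = poisson_binomial_sf(probs, hits)
    return result


# Hypothetical cohort: four samples with different breakpoint burdens.
breaks_per_sample = [10, 20, 5, 40]
observed_hits = {"GENE_A": 3, "GENE_B": 3}    # samples with a break in the gene
gene_features = {"GENE_A": 5, "GENE_B": 50}   # gene-associated features
p = gene_pvalues(observed_hits, gene_features, breaks_per_sample, total_features=1_000)
q = dict(zip(p, benjamini_hochberg(list(p.values()))))
print(p, q)
```
]]></code>
                <p>Although both hypothetical genes are hit in three of four samples, the gene with ten times more associated features is far less surprising under this null, which is the intuition behind correcting for gene-associated features and per-sample breakpoint counts before calling a gene recurrently broken.</p>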
            </sec>
        </sec>
        <sec>
            <title>Use case</title>
            <sec>
                <title>Identification of recurrent breakpoint genes in advanced colorectal cancers</title>
                <p>We applied our method to 352 high-resolution array-CGH samples from a series of advanced colorectal cancers
                    <sup>
                        <xref ref-type="bibr" rid="ref-17">17</xref>
                    </sup> following CNA detection using &#x2018;CGHcall&#x2019;
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>. Array-CGH data are available in the Gene Expression Omnibus database under accession number GSE63216 (
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/projects/geo/">www.ncbi.nlm.nih.gov/projects/geo/</ext-link>). We selected CNA-associated breakpoints (setting: &#x2018;CNA-associated&#x2019;), used gene annotations from Ensembl (human genome NCBI build36/hg18, release 54) and applied the dedicated Benjamini-Hochberg-type FDR correction (setting: &#x2018;Gilbert&#x2019;) to identify recurrent breakpoint genes. A total of 748 genes were found to be recurrently affected by chromosomal breaks (FDR&lt;0.1)
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>. Breakpoint frequencies of chromosome 20p are visualized with the built-in plot function (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>; see the Bioconductor vignette and manual for further details about this function). Patient stratification based on recurrent breakpoint genes and well-known point mutations, propagated over the predefined STRING human protein interaction network, revealed one colorectal cancer (CRC) subtype with very poor prognosis, supporting the clinical relevance of this class of somatic aberrations in advanced colorectal cancers
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>.</p>
            </sec>
        </sec>
        <sec sec-type="conclusions">
            <title>Conclusion</title>
            <p>Genome instability, including numerical and structural somatic chromosomal aberrations, is a hallmark of cancer. Several tools are available that focus on detection of numerical aberrations of large chromosomal segments. The R-package &#x2018;GeneBreak&#x2019; extracts additional information from CNA data: it provides an easy-to-use algorithm that identifies genomic breakpoint locations, maps breakpoints to genes, and applies a comprehensive statistical approach to reveal recurrent breakpoint genes across series of tumor samples. &#x2018;GeneBreak&#x2019; can therefore be applied both to detect CNA-associated chromosomal breaks in individual tumor samples and to detect recurrent breakpoint genes across multiple tumor samples.</p>
        </sec>
        <sec>
            <title>Data and software availability</title>
            <p>The publicly available copy number data used for the use case are deposited in the Gene Expression Omnibus database under accession number GSE63216 (
                <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63216">https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63216</ext-link>).</p>
            <p>Software available from: 
                <ext-link ext-link-type="uri" xlink:href="https://www.bioconductor.org/packages/release/bioc/html/GeneBreak.html">www.bioconductor.org/packages/release/bioc/html/GeneBreak.html</ext-link> and
                <ext-link ext-link-type="uri" xlink:href="https://protect-eu.mimecast.com/s/aLGhBqmpgF2">https://protect-eu.mimecast.com/s/aLGhBqmpgF2</ext-link>
            </p>
            <p>Latest source code: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/F1000Research/GeneBreak/releases/tag/v1.0">https://github.com/F1000Research/GeneBreak/releases/tag/v1.0</ext-link>
			</p>
            <p>Archived source code as at the time of publication: F1000Research/GeneBreak, doi: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.153937">10.5281/zenodo.153937</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-18">18</xref>
                </sup>
			</p>
            <p>License: GPL 2</p>
        </sec>
    </body>
    <back>
        <sec id="SM" sec-type="supplementary-material">
            <title>Supplementary material</title>
            <p>
				
                <bold>GeneBreak vignette.</bold>
			</p>
            <p>
				
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/9259/b9923769-65e7-4664-819e-73db3794f9b5.pdf">Click here to access the data</ext-link>.</p>
            <p>
				
                <bold>GeneBreak Manual</bold>
			</p>
            <p>
				
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/9259/b0fbf3c2-13c5-4803-9f0b-ae37f1e3e412.pdf">Click here to access the data</ext-link>.</p>
        </sec>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Stratton</surname>
                            <given-names>MR</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Campbell</surname>
                            <given-names>PJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Futreal</surname>
                            <given-names>PA</given-names>
                        </name>
					</person-group>:
                    <article-title>The cancer genome.</article-title>
                    <source>
						
                        <italic toggle="yes">Nature.</italic>
					</source>
                    <year>2009</year>;<volume>458</volume>(<issue>7239</issue>):<fpage>719</fpage>&#x2013;<lpage>724</lpage>.
                    <pub-id pub-id-type="pmid">19360079</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature07943</pub-id>
                    <pub-id pub-id-type="pmcid">2821689</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Forbes</surname>
                            <given-names>SA</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Bindal</surname>
                            <given-names>N</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Bamford</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2011</year>;<volume>39</volume>(<issue>Database issue</issue>):<fpage>D945</fpage>&#x2013;<lpage>D950</lpage>.
                    <pub-id pub-id-type="pmid">20952405</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkq929</pub-id>
                    <pub-id pub-id-type="pmcid">3013785</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mitelman</surname>
                            <given-names>F</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Johansson</surname>
                            <given-names>B</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Mertens</surname>
                            <given-names>F</given-names>
                        </name>
					</person-group>:
                    <article-title>The impact of translocations and gene fusions on cancer causation.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Rev Cancer.</italic>
					</source>
                    <year>2007</year>;<volume>7</volume>(<issue>4</issue>):<fpage>233</fpage>&#x2013;<lpage>245</lpage>.
                    <pub-id pub-id-type="pmid">17361217</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nrc2091</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Inaki</surname>
                            <given-names>K</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>ET</given-names>
                        </name>
					</person-group>:
                    <article-title>Structural mutations in cancer: mechanistic and functional insights.</article-title>
                    <source>
						
                        <italic toggle="yes">Trends Genet.</italic>
					</source>
                    <year>2012</year>;<volume>28</volume>(<issue>11</issue>):<fpage>550</fpage>&#x2013;<lpage>559</lpage>.
                    <pub-id pub-id-type="pmid">22901976</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.tig.2012.07.002</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>van den Broek</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Dijkstra</surname>
                            <given-names>MJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Krijgsman</surname>
                            <given-names>O</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>High Prevalence and Clinical Relevance of Genes Affected by Chromosomal Breaks in Colorectal Cancer.</article-title>
                    <source>
						
                        <italic toggle="yes">PLoS One.</italic>
					</source>
                    <year>2015</year>;<volume>10</volume>(<issue>9</issue>):<fpage>e0138141</fpage>.
                    <pub-id pub-id-type="pmid">26375816</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0138141</pub-id>
                    <pub-id pub-id-type="pmcid">4574474</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Malhotra</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Lindberg</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Faust</surname>
                            <given-names>GG</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Breakpoint profiling of 64 cancer genomes reveals numerous complex rearrangements spawned by homology-independent mechanisms.</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Res.</italic>
					</source>
                    <year>2013</year>;<volume>23</volume>(<issue>5</issue>):<fpage>762</fpage>&#x2013;<lpage>776</lpage>.
                    <pub-id pub-id-type="pmid">23410887</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.143677.112</pub-id>
                    <pub-id pub-id-type="pmcid">3638133</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Edwards</surname>
                            <given-names>PA</given-names>
                        </name>
					</person-group>:
                    <article-title>Fusion genes and chromosome translocations in the common epithelial cancers.</article-title>
                    <source>
						
                        <italic toggle="yes">J Pathol.</italic>
					</source>
                    <year>2010</year>;<volume>220</volume>(<issue>2</issue>):<fpage>244</fpage>&#x2013;<lpage>254</lpage>.
                    <pub-id pub-id-type="pmid">19921709</pub-id>
                    <pub-id pub-id-type="doi">10.1002/path.2632</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Hermsen</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Snijders</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Guerv&#x00f3;s</surname>
                            <given-names>MA</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Centromeric chromosomal translocations show tissue-specific differences between squamous cell carcinomas and adenocarcinomas.</article-title>
                    <source>
						
                        <italic toggle="yes">Oncogene.</italic>
					</source>
                    <year>2005</year>;<volume>24</volume>(<issue>9</issue>):<fpage>1571</fpage>&#x2013;<lpage>1579</lpage>.
                    <pub-id pub-id-type="pmid">15674345</pub-id>
                    <pub-id pub-id-type="doi">10.1038/sj.onc.1208294</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Muggeo</surname>
                            <given-names>VM</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Adelfio</surname>
                            <given-names>G</given-names>
                        </name>
					</person-group>:
                    <article-title>Efficient change point detection for genomic sequences of continuous measurements.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2011</year>;<volume>27</volume>(<issue>2</issue>):<fpage>161</fpage>&#x2013;<lpage>166</lpage>.
                    <pub-id pub-id-type="pmid">21088029</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btq647</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Ritz</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Paris</surname>
                            <given-names>PL</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Ittmann</surname>
                            <given-names>MM</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Detection of recurrent rearrangement breakpoints from copy number data.</article-title>
                    <source>
						
                        <italic toggle="yes">BMC Bioinformatics.</italic>
					</source>
                    <year>2011</year>;<volume>12</volume>:<fpage>114</fpage>.
                    <pub-id pub-id-type="pmid">21510904</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-12-114</pub-id>
                    <pub-id pub-id-type="pmcid">3112242</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Tolo&#x015f;i</surname>
                            <given-names>L</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Thei&#x00df;en</surname>
                            <given-names>J</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Halachev</surname>
                            <given-names>K</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>A method for finding consensus breakpoints in the cancer genome from copy number data.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2013</year>;<volume>29</volume>(<issue>14</issue>):<fpage>1793</fpage>&#x2013;<lpage>1800</lpage>.
                    <pub-id pub-id-type="pmid">23716195</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btt300</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Zilberstein</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Pannier</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Evaluating translocation gene fusions by SNP array data.</article-title>
                    <source>
						
                        <italic toggle="yes">Cancer Inform.</italic>
					</source>
                    <year>2012</year>;<volume>11</volume>:<fpage>15</fpage>&#x2013;<lpage>27</lpage>.
                    <pub-id pub-id-type="pmid">22259228</pub-id>
                    <pub-id pub-id-type="doi">10.4137/CIN.S8026</pub-id>
                    <pub-id pub-id-type="pmcid">3256939</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>van de Wiel</surname>
                            <given-names>MA</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Kim</surname>
                            <given-names>KI</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Vosse</surname>
                            <given-names>SJ</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>CGHcall: calling aberrations for array CGH tumor profiles.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2007</year>;<volume>23</volume>(<issue>7</issue>):<fpage>892</fpage>&#x2013;<lpage>894</lpage>.
                    <pub-id pub-id-type="pmid">17267432</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btm030</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Scheinin</surname>
                            <given-names>I</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Sie</surname>
                            <given-names>D</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Bengtsson</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly.</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Res.</italic>
					</source>
                    <year>2014</year>;<volume>24</volume>(<issue>12</issue>):<fpage>2022</fpage>&#x2013;<lpage>2032</lpage>.
                    <pub-id pub-id-type="pmid">25236618</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.175141.114</pub-id>
                    <pub-id pub-id-type="pmcid">4248318</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Olshen</surname>
                            <given-names>AB</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Venkatraman</surname>
                            <given-names>ES</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Lucito</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Circular binary segmentation for the analysis of array-based DNA copy number data.</article-title>
                    <source>
						
                        <italic toggle="yes">Biostatistics.</italic>
					</source>
                    <year>2004</year>;<volume>5</volume>(<issue>4</issue>):<fpage>557</fpage>&#x2013;<lpage>572</lpage>.
                    <pub-id pub-id-type="pmid">15475419</pub-id>
                    <pub-id pub-id-type="doi">10.1093/biostatistics/kxh008</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Gilbert</surname>
                            <given-names>PB</given-names>
                        </name>
					</person-group>:
                    <article-title>A modified false discovery rate multiple-comparisons procedure for discrete data, applied to human immunodeficiency virus genetics.</article-title>
                    <source>
						
                        <italic toggle="yes">Appl Statist.</italic>
					</source>
                    <year>2005</year>;<volume>54</volume>(<issue>1</issue>):<fpage>143</fpage>&#x2013;<lpage>158</lpage>.
                    <pub-id pub-id-type="doi">10.1111/j.1467-9876.2005.00475.x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Haan</surname>
                            <given-names>JC</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Labots</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Rausch</surname>
                            <given-names>C</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Genomic landscape of metastatic colorectal cancer.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Commun.</italic>
					</source>
                    <year>2014</year>;<volume>5</volume>:<fpage>5457</fpage>.
                    <pub-id pub-id-type="pmid">25394515</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ncomms6457</pub-id>
                    <pub-id pub-id-type="pmcid">4243240</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Broek</surname>
                            <given-names>E</given-names>
                        </name>
					</person-group>:
                    <article-title>F1000Research/GeneBreak.</article-title>
                    <source>
					
                        <italic toggle="yes">Zenodo</italic>. 
					</source>
                    <year>2016</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.153937">Data Source</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report18598">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.9967.r18598</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Rubio</surname>
                        <given-names>Angel</given-names>
                    </name>
                    <xref ref-type="aff" rid="r18598a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r18598a1">
                    <label>1</label>Group of Bioinformatics, TECNUN, University of Navarra, San Sebastian, Spain</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>13</day>
                <month>2</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Rubio A</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport18598" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.9259.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The paper presents an inspiring view of copy number changes in the genome, focusing on the "changes" rather than on the levels of change. The underlying reasoning is that a copy number change, if it occurs within the locus occupied by a gene, implies an alteration in the coding sequence of that gene.</p>
            <p>In addition, it is shown that these changes occur recurrently, i.e., the loci where copy number changes occur tend to be similar across different samples of the same cancer type.</p>
            <p>The methodology has been uploaded to Bioconductor. The stringent quality checks of Bioconductor guarantee its availability on different platforms and, in fact, the vignette is easy to follow and use.</p>
            <p>My main concern with this paper is the (lack of) description of the statistical method used to assess the recurrence of the copy number changes. The Methods section only states that there are two methods (genomic location level and gene level), but the differences between them and the underlying statistical model are not described.</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report16416">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.9967.r16416</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Marschall</surname>
                        <given-names>Tobias</given-names>
                    </name>
                    <xref ref-type="aff" rid="r16416a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r16416a1">
                    <label>1</label>Center for Bioinformatics, Max Planck Institute for Informatics, Saarbr&#x00fc;cken, Germany</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>9</day>
                <month>1</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Marschall T</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport16416" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.9259.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>GeneBreak is an R package that helps identify recurrent breakpoints of copy number variants (CNVs). While the offered analyses are straightforward from a methodological point of view, this package can be valuable in practice, providing an easy and reproducible way to conduct such analyses. I appreciate that it is available from Bioconductor (and hence easily installable), openly developed on GitHub, and archived on Zenodo.</p>
            <p>I have only a few minor suggestions for improvement:
                <list list-type="bullet">
                    <list-item>
                        <p>When trying to use the package, I couldn't open the example data (got a "data set &#x2018;copynumber.data.chr20&#x2019; not found" error). Could you verify that it's available?</p>
                    </list-item>
                    <list-item>
                        <p>First sentence: SNV commonly means "single nucleotide variation" (not "small").</p>
                    </list-item>
                    <list-item>
                        <p>P1, L9: "<italic>Recently, ...</italic>" Of course, what you consider "recent" is a matter of taste, but here you are citing a review paper from 2007. I would not call this recent.</p>
                    </list-item>
                    <list-item>
                        <p>P1, Methods, L3: "<italic>... and copy number detection algorithm</italic>" Either explain exactly what you mean here, or remove it.</p>
                    </list-item>
                    <list-item>
                        <p>P1, paragraph "<italic>Due to the typical granularity [...], in fact represent a chromosomal interval.</italic>" I can guess what you mean here, but it would help to phrase this more clearly.</p>
                    </list-item>
                    <list-item>
                        <p>P1, "<italic>This method assumes the same permutation null-distribution for all candidate breakpoint events for the analysis of breakpoints at the level of genomic location.</italic>" Could you describe in more detail how the null distribution is obtained?</p>
                    </list-item>
                    <list-item>
                        <p>P1, "<italic>In addition, a more comprehensive and powerful dedicated Benjamini-Hochberg FDR correction that accounts for discreteness in the null-distribution is supplied.</italic>" The Benjamini-Hochberg procedure is a well-defined statistical method. I would rephrase the respective sentence(s) to state explicitly that you are referring to Gilbert's method.</p>
                    </list-item>
                </list>
            </p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
</article>
