<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.16083.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>seqCAT: a Bioconductor R-package for variant analysis of high throughput sequencing data</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 1 approved, 1 not approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Fasterius</surname>
                        <given-names>Erik</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0492-9960</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Al-Khalili Szigyarto</surname>
                        <given-names>Cristina</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-6990-1905</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>School of Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm, 10691, Sweden</aff>
                <aff id="a2">
                    <label>2</label>Science for Life Laboratory, KTH Royal Institute of Technology, Solna, Sweden</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:caks@kth.se">caks@kth.se</email>
                </corresp>
                <fn id="fn1">
                    <p>*To whom correspondence should be sent</p>
                </fn>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>12</day>
                <month>8</month>
                <year>2019</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2018</year>
            </pub-date>
            <volume>7</volume>
            <elocation-id>1466</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>22</day>
                    <month>7</month>
                    <year>2019</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Fasterius E and Al-Khalili Szigyarto C</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/7-1466/pdf"/>
            <abstract>
                <p>High throughput sequencing technologies are flourishing in the biological sciences, enabling unprecedented insights into 
                    <italic toggle="yes">e.g.</italic> genetic variation, but require extensive bioinformatic expertise for the analysis. There is thus a need for simple yet effective software that can analyse both existing and novel data, providing interpretable biological results with little bioinformatic prowess. We present 
                    <italic toggle="yes">seqCAT</italic>, a Bioconductor toolkit for analysing genetic variation in high throughput sequencing data. It is a highly accessible, easy-to-use and well-documented R-package that enables a wide range of researchers to analyse their own and publicly available data, providing biologically relevant conclusions and publication-ready figures. SeqCAT can provide information regarding genetic similarities between an arbitrary number of samples, validate specific variants and define functionally similar variant groups for further downstream analyses. Its ease of installation and use, complete data-to-conclusions functionality and the inherent flexibility of the R programming language make seqCAT a powerful tool for variant analyses compared to existing solutions. A publicly available dataset of liver cancer-derived organoids is analysed herein using the seqCAT package, corroborating the original authors' conclusions that the organoids are genetically stable. A previously known liver cancer-related mutation is additionally shown to be present in a sample, although it was not listed in the original publication. Differences between DNA- and RNA-based variant calls in this dataset are also analysed, revealing a high median concordance of 97.5%. SeqCAT is open source software released under an MIT licence and is available at 
                    <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/release/bioc/html/seqCAT.html">https://bioconductor.org/packages/release/bioc/html/seqCAT.html</ext-link>.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>High throughput sequencing</kwd>
                <kwd>whole exome sequencing</kwd>
                <kwd>RNA sequencing</kwd>
                <kwd>variant analysis</kwd>
                <kwd>single nucleotide variant</kwd>
                <kwd>R</kwd>
                <kwd>Bioconductor</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>European Community 7th Framework Program</funding-source>
                    <award-id>Grant agreement no. 278568 &#x201c;PRIMES&#x201d;</award-id>
                </award-group>
                <funding-statement>This work was supported by the European Community 7th Framework Program under grant agreement no. 278 568 &#x201c;PRIMES&#x201d;.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>Based on the reviewers&#x2019; comments, the current version of the manuscript, the software and its documentation have been further improved. The manuscript clarifies the purpose and functionality of seqCAT as software for genotype analysis. The software has been revised to allow information to be read from additional file formats. The structure and use of data objects have been streamlined so that only data frames are used at the user level. The software improvements aim to simplify the use of seqCAT and clarify the documentation. All implemented changes are described in the manuscript.</p>
            </sec>
        </notes>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>High throughput sequencing (HTS) technologies such as genome, exome and RNA sequencing (RNA-seq) have become some of the most powerful and widely used tools in biological research worldwide, and an increasing amount of such data is being stored in online data repositories (
                <italic toggle="yes">e.g.</italic> the Gene Expression Omnibus, GEO, and the Sequence Read Archive, SRA)
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>. While decreasing experimental costs and optimised protocols enable a broad range of researchers to apply HTS to their respective scientific questions, 
                <italic toggle="yes">e.g.</italic> gene expression or genetic variation, the analysis of the resulting data is not a trivial matter, often requiring a high level of bioinformatic expertise
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>,
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup>. This is especially true for variant analyses, where data is commonly stored using the relatively complex 
                <italic toggle="yes">variant call format</italic> (VCF). There is a multitude of software packages that can analyse VCF data, but they vary in their functionality, outputs and simplicity of use. Software packages that focus on applications other than genetic heterogeneity or general analysis of HTS variant data include 
                <italic toggle="yes">Anvi'o</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-6">6</xref>
                </sup> (metagenomics of microbial populations), 
                <italic toggle="yes">PhyloSNP</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup> (phylogenetic trees from SNP data), 
                <italic toggle="yes">KING</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup> (kinship estimation), 
                <italic toggle="yes">SomaticSniper</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup> (comparisons of paired tumour and normal samples), 
                <italic toggle="yes">PLINK</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup> (a toolkit for analysing whole genome association and population data), and 
                <italic toggle="yes">vcfR</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup> (quality control and filtering of VCF files). Many of these require use of the command line and are no longer being actively developed. Two examples of command line-based software are 
                <italic toggle="yes">vcftools</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup> and the R-package 
                <italic toggle="yes">VariantAnnotation</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>
                </sup>. Both of these provide underlying structure and several ways to analyse variant data, but require further analysis of their outputs and can be difficult for less experienced users to work with. Software such as 
                <italic toggle="yes">BEDTools</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup>, 
                <italic toggle="yes">BEDOPS</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-15">15</xref>
                </sup> and related tools that work on genomic interval-based comparisons similarly require additional downstream analyses and necessitate using a command-line interface. While some of the previously mentioned software packages allow a considerable number of different analyses to be performed, it is not always apparent which analysis to choose, or how to perform it, for the biological question at hand. The necessity of downstream analyses is also an important consideration. There are also several software tools with an easier-to-use graphical interface, such as the 
                <italic toggle="yes">Integrative Genomics Viewer</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-16">16</xref>
                </sup> or the web-based 
                <italic toggle="yes">Ensembl Genome Browser</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-17">17</xref>
                </sup>. While such tools can more easily provide visualisations and figures, they are generally more limited in their functionality. Web-based applications are restricted in the amount of data that can be uploaded, and also come with the added issue of data security
                <sup>
                    <xref ref-type="bibr" rid="ref-18">18</xref>
                </sup>. Proprietary software (such as the 
                <italic toggle="yes">Ingenuity Variant Analysis</italic>)
                <sup>
                    <xref ref-type="bibr" rid="ref-19">19</xref>
                </sup> not only requires a licence to use, but also constitutes a &#x201c;black box&#x201d; where the underlying methods are not available for direct inspection or scrutiny.</p>
            <p>There is thus a need for transparent, user-friendly and powerful bioinformatic tools to enable as many researchers as possible to analyse, visualise and interpret their own and publicly available HTS data. Two important aspects of such analyses are the true identity of the cells analysed and the comparability of both the biological samples and the data sets. Validation and evaluation of cell line authenticity, for example, is an increasingly widespread issue, as is the question of biological equivalence for any sample in general
                <sup>
                    <xref ref-type="bibr" rid="ref-20">20</xref>
                </sup>. Here we present an open source R-package, the 
                <italic toggle="yes">High Throughput Sequencing Cell Authentication Toolkit</italic> (seqCAT), which uses data from HTS experiments (whether it be of DNA or RNA origin) to investigate these matters.</p>
            <p>One of the common outputs from HTS experiments is that of 
                <italic toggle="yes">sequence variation</italic>. Single nucleotide variants (SNVs), for example, are sequence variations at the nucleotide level. Such data is the output of many variant calling programs and algorithms, and is used by seqCAT to analyse genetic differences between samples. We have previously demonstrated the usefulness and general applicability of such analyses for both cell line authentication
                <sup>
                    <xref ref-type="bibr" rid="ref-21">21</xref>
                </sup> and genetic heterogeneity in public cell line datasets
                <sup>
                    <xref ref-type="bibr" rid="ref-22">22</xref>
                </sup>. The capabilities of seqCAT include creation of SNV profiles from VCF files, comparisons of the overall genetic similarity between profiles, investigations of SNV impact distributions (
                <italic toggle="yes">i.e.</italic> variants&#x2019; predicted impact on protein function) as well as interrogations of the genotypes of previously known or user-specified variants across samples. Each individual profile can represent SNVs from a HTS experiment or from an external variant database.</p>
            <p>The seqCAT package distinguishes itself from those previously mentioned in several ways. First, it includes not only processing and filtering of variant data, but also a number of downstream analyses and visualisations. It provides a simple way for researchers to explore, subset and group variants in ways that are of biological importance. Furthermore, its implementation in the R programming language allows the user to work on their data without the command line, while also giving access to the extensive library of packages provided by the R environment. While its applications are not as numerous as those of some existing software, seqCAT is focused on the exploration of genomic and transcriptomic variation across many biological contexts. Finally, seqCAT comes with a detailed manual, only presents the user with base-R data objects and contains several helper functions aimed solely at making it simple and easy to use, allowing even novices to utilise it.</p>
            <p>In the present study, we use seqCAT to explore genetic differences within a public dataset containing both whole exome sequencing (WES) and RNA-seq data for long-term organoid cultures. We show that the organoids are genetically stable over a culture-period of several months, corroborating the original authors&#x2019; conclusions. We also demonstrate how seqCAT can be used to compare DNA- and RNA-based variant calls using the same dataset. The results highlight potential uses of variant analyses and demonstrate how seqCAT may be utilised to interrogate genetic differences at both the global and gene-specific level.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <p>SeqCAT was developed for the 
                <italic toggle="yes">Bioconductor</italic> repository for R-packages. It follows existing best coding practices, including a clean, modular and robust design, as per the requirements for Bioconductor packages
                <sup>
                    <xref ref-type="bibr" rid="ref-23">23</xref>
                </sup>. The basis of all seqCAT analyses is 
                <italic toggle="yes">SNV profiles</italic>: collections of filtered, high-quality SNVs for any given sample. The creation of these SNV profiles is performed by filtering an input VCF file based on the available variant calling quality metrics
                <sup>
                    <xref ref-type="bibr" rid="ref-21">21</xref>
                </sup>. These criteria are taken directly from the input VCF; they are determined by the variant calling software used to create it and are not specific to seqCAT. There is also an option to skip this filtering step, for cases where the VCF does not contain any filtering information from the variant caller or when the user does not wish to perform filtering. Additional optional filtering steps include removal of variants below a specified sequencing depth (ten by default), removal of variants on mitochondrial and non-standard chromosomes, as well as removal of duplicate variant entries. While profiles for individual samples may be created as needed by the user, several convenience functions for working with multiple VCFs and profiles in aggregate are also available. SeqCAT can analyse VCF files with or without annotations from 
                <italic toggle="yes">e.g.</italic> snpEff
                <sup>
                    <xref ref-type="bibr" rid="ref-24">24</xref>
                </sup>.</p>
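            <p>The optional filtering steps described above can be sketched roughly as follows. The sketch is written in Python purely for illustration; it is not the seqCAT implementation and does not use the seqCAT API. The function name <monospace>filter_variants</monospace>, the dictionary keys and the set of standard (human) chromosomes are hypothetical, and the variant caller is assumed to mark passing variants with the VCF FILTER value <monospace>PASS</monospace>.</p>
            <preformat>
```python
# Illustrative sketch only: not the seqCAT implementation or its API.
# Variant records are plain dicts with hypothetical keys:
# chrom, pos, ref, alt, filter, depth.

# Assumption: human data, so "standard" means autosomes 1-22 plus X and Y.
STANDARD_CHROMS = {str(i) for i in range(1, 23)} | {"X", "Y"}


def filter_variants(variants, min_depth=10, filter_pass=True,
                    standard_chroms_only=True, remove_duplicates=True):
    """Return the variants that survive the optional filtering steps."""
    profile = []
    seen = set()
    for v in variants:
        # Keep only variants the caller marked as passing, unless skipped.
        if filter_pass and v["filter"] != "PASS":
            continue
        # Remove variants below the sequencing depth threshold.
        if min_depth > v["depth"]:
            continue
        # Normalise names such as "chr3" to "3" before the chromosome check.
        chrom = v["chrom"]
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        # Drop mitochondrial and other non-standard chromosomes.
        if standard_chroms_only and chrom not in STANDARD_CHROMS:
            continue
        # Drop repeated entries for the same position and alleles.
        key = (chrom, v["pos"], v["ref"], v["alt"])
        if remove_duplicates and key in seen:
            continue
        seen.add(key)
        profile.append(v)
    return profile


variants = [
    {"chrom": "1", "pos": 100, "ref": "A", "alt": "G", "filter": "PASS", "depth": 25},
    {"chrom": "1", "pos": 100, "ref": "A", "alt": "G", "filter": "PASS", "depth": 25},
    {"chrom": "2", "pos": 200, "ref": "C", "alt": "T", "filter": "PASS", "depth": 4},
    {"chrom": "MT", "pos": 300, "ref": "G", "alt": "A", "filter": "PASS", "depth": 50},
    {"chrom": "chr3", "pos": 400, "ref": "T", "alt": "C", "filter": "LowQual", "depth": 30},
]
profile = filter_variants(variants)
unfiltered = filter_variants(variants, filter_pass=False)
```
            </preformat>
            <p>In this toy input only the first record survives the default settings: the second is a duplicate, the third falls below the depth threshold, the fourth is mitochondrial and the fifth fails the caller's filter. Skipping the caller-based filter additionally retains the fifth record.</p>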
            <p>The SNV profiles are subsequently compared to each other in a pairwise manner, yielding information on 
                <italic toggle="yes">e.g.</italic> the 
                <italic toggle="yes">overlap</italic> (SNVs that are present in both samples being compared), the 
                <italic toggle="yes">concordance</italic> (the proportion of SNVs with identical genotypes for both samples) and the 
                <italic toggle="yes">similarity score</italic> (a previously defined weighted measure of the concordance)
                <sup>
                    <xref ref-type="bibr" rid="ref-22">22</xref>
                </sup>. Comparisons may be performed individually or in aggregate, depending on what type of analysis the user is interested in. Comparisons with external databases are also possible; seqCAT currently contains functionality to read and compare variants present in the 
                <italic toggle="yes">Catalogue of somatic mutations in cancer</italic> (COSMIC) database
                <sup>
                    <xref ref-type="bibr" rid="ref-25">25</xref>
                </sup>. Only overlapping variants are analysed by default, but non-overlapping variants can optionally be included as well. Examining specific chromosomes, genes or genomic regions is also possible, as are analyses of variant functionality through their predicted impact on protein function.</p>
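            <p>As a minimal sketch of the overlap and concordance measures defined above, consider the following Python example. It is not the seqCAT implementation and does not use its API: the function name <monospace>compare_two_profiles</monospace> and the representation of a profile as a mapping from position to genotype are hypothetical, and treating genotypes as unordered allele pairs is an assumption. The similarity score is not reproduced here, since its weighting is defined in the cited reference rather than in this text.</p>
            <preformat>
```python
# Illustrative sketch only: not the seqCAT implementation or its API.
# A profile maps a variant position (chrom, pos) to a genotype, given as a
# tuple of two alleles; both representations are hypothetical.


def compare_two_profiles(profile_1, profile_2):
    """Return the overlap count, match count and concordance in percent."""
    # Overlap: positions with a variant call in both profiles.
    overlap = set(profile_1).intersection(profile_2)
    # Matches: overlapping positions with identical genotypes. Allele order
    # is ignored (assumption), so A/G is treated as equal to G/A.
    matches = sum(
        1 for key in overlap
        if sorted(profile_1[key]) == sorted(profile_2[key])
    )
    # Concordance: proportion of overlapping positions with matching genotypes.
    concordance = 100.0 * matches / len(overlap) if overlap else float("nan")
    return len(overlap), matches, concordance


sample_1 = {
    ("1", 100): ("A", "G"),
    ("1", 250): ("C", "C"),
    ("2", 300): ("T", "T"),
    ("3", 50): ("G", "A"),
}
sample_2 = {
    ("1", 100): ("G", "A"),
    ("1", 250): ("C", "T"),
    ("2", 300): ("T", "T"),
    ("4", 75): ("C", "C"),
}
n_overlap, n_match, concordance = compare_two_profiles(sample_1, sample_2)
```
            </preformat>
            <p>Here three positions overlap and two of them carry identical genotypes, giving a concordance of roughly 66.7%. Positions present in only one profile are ignored, mirroring the default behaviour described above.</p>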
            <p>Installation of both seqCAT and its dependencies is simple, and its use is described in depth in its vignette; a major design goal of seqCAT was ease of use for a broad range of researchers, regardless of expertise in R. While existing data structures and objects from Bioconductor are used internally, the user is not required to learn any of them; results are given as standard R-objects
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>,
                    <xref ref-type="bibr" rid="ref-26">26</xref>
                </sup>. This makes exploration of the data as simple and easy as possible for the user. SeqCAT allows for re-analysis of already created SNV profiles, facilitating comparisons of samples across any number of datasets and includes several functions for creating publication-ready figures. All these capabilities make seqCAT a useful, simple and intuitive tool for a wide range of researchers.</p>
            <sec>
                <title>Operation</title>
                <p>The seqCAT package is designed to work with Bioconductor version 3.9 and R version 3.6.</p>
            </sec>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <sec>
                <title>Using seqCAT to investigate genetic heterogeneity in liver cancer-derived organoids</title>
                <p>To demonstrate the capabilities of seqCAT, we analysed a recently published dataset from Broutier 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-27">27</xref>
                    </sup>. The authors created liver cancer-derived organoids for modelling disease and performed both whole exome sequencing and RNA-seq on the original tissues and the organoid cultures. We used seqCAT to analyse the raw VCF files available at GEO under accession GSE84073 (see the 
                    <xref ref-type="other" rid="SC1">Supplementary Code</xref> for details and 
                    <xref ref-type="other" rid="SD2">Supplementary Data 1</xref> for the study metadata). The overall SNV-based genetic similarities between tissues and organoids are clearly grouped according to their respective patient of origin, as can be seen in 
                    <xref ref-type="fig" rid="f1">Figure 1A</xref>. We also investigated if this holds true for SNV profile subsets containing only coding and missense variants. The original VCF files were thus annotated using 
                    <italic toggle="yes">snpEff</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-24">24</xref>
                    </sup>, followed by creation, reading and sub-setting of SNV profiles. 
                    <xref ref-type="fig" rid="f1">Figure 1B</xref> shows the pairwise comparisons of these variant subsets, indicating that groupings based on genetic similarities of missense variants also separate the dataset in a per-patient manner. Comparisons with COSMIC liver variants were also performed, although the small number of overlapping variants (no more than 23) makes these comparisons less informative and less statistically relevant. In contrast, each non-COSMIC pairwise comparison covers on the order of a hundred thousand overlapping variants (
                    <xref ref-type="table" rid="T1">Table 1</xref>).</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <p>Pairwise comparisons of all WES SNV profiles, showing the genetic similarity of all individual samples for either no variant sub-setting (
                            <bold>A</bold>) or sub-setting for coding variants only (
                            <bold>B</bold>). The colour gradient is defined for ranges of the similarity score: scores between 0 and 50 are shown as white, scores between 50 and 90 as a white-to-grey gradient and, finally, a grey-to-blue gradient for 90 to 100. Samples are named according to their type: original tissues (T), established organoids (O1) and long-term cultured organoids (O2). These figures were created using the 
                            <monospace>plot_heatmap</monospace> seqCAT function.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/21974/52f9ca21-6351-42ef-a92e-15ed7ebbb317_figure1.gif"/>
                </fig>
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>Summary statistics of whole exome sequencing SNV profile comparisons (median values).</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="center" colspan="1" rowspan="1" valign="top">Patient</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">overlaps
                                    <break/>(all)</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">overlaps
                                    <break/>(COSMIC)</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">overlaps
                                    <break/>(coding)</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">similarity score
                                    <break/>(all)</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">similarity score
                                    <break/>(COSMIC)</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">similarity score
                                    <break/>(coding)</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="center" colspan="1" rowspan="1" valign="top">CC1</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">153815</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">21</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">111977</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">96.7</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">63.0</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">97.0</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1" valign="top">CC2</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">137261</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">17</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">97344</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">97.4</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">68.2</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">98.0</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1" valign="top">CC3</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">122577</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">18</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">87604</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">95.8</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">76.9</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">96.1</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1" valign="top">CHC1</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">153589</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">17</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">112011</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">97.1</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">73.9</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">97.4</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1" valign="top">CHC2</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">132805</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">16</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">95203</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">93.9</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">72.7</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">93.9</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1" valign="top">HCC1</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">142389</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">18</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">104087</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">94.3</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">75.0</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">94.4</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1" valign="top">HCC3</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">130186</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">23</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">92613</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">97.0</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">75.0</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">97.5</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1" valign="top">Healthy1</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">155949</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">19</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">113592</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">97.9</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">80.0</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">98.3</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>We sought to investigate the genetic stability of the organoids, both in their transition from primary tissue to organoid culture and during long-term culturing. 
                    <xref ref-type="fig" rid="f2">Figure 2A</xref> shows a boxplot of genetic similarities for both of these comparisons, indicating that, at the SNV level, the early and long-term cultures seem to be more genetically similar to each other than the tissues are to their derived organoids. This difference is not statistically significant, however, with p-values of 0.36 and 0.41 for all and subset variants, respectively (
                    <xref ref-type="other" rid="SC1">Supplementary Code</xref>). A larger cohort may thus be needed to fully explore the difference between tissue-to-organoid and long-term-culturing stability. The overall high genetic similarities of all the organoids are clear, however: the lowest median similarity score across all patients and all variants is 93.9 (patient CHC2), while the highest is 97.9 (healthy patient 1); see 
                    <xref ref-type="table" rid="T1">Table 1</xref>. The similarity scores across coding and non-subset profiles are roughly equivalent.</p>
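                <p>To make the notion of a non-significant difference between two groups of comparisons concrete, the Python sketch below runs an exact permutation test on the difference in median similarity. It is an illustration under stated assumptions and not the test used in this study, which is documented in the Supplementary Code; the function name and all scores are hypothetical.</p>
                <preformat>
```python
from itertools import combinations
from statistics import median


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the absolute difference in medians."""
    pooled = list(group_a) + list(group_b)
    observed = abs(median(group_a) - median(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling the pooled scores into two groups.
    for index_tuple in combinations(range(len(pooled)), len(group_a)):
        chosen = set(index_tuple)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # A small tolerance guards against floating-point ties.
        if abs(median(first) - median(second)) >= observed - 1e-9:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical similarity scores for two kinds of comparison (not study data).
tissue_vs_organoid = [93.5, 95.0, 96.2, 97.1]
early_vs_late = [94.8, 96.9, 97.4, 98.0]

p_value = permutation_p_value(tissue_vs_organoid, early_vs_late)
```
                </preformat>
                <p>With groups this small the number of possible relabellings is limited, so even a visible shift in medians can yield a large p-value; this is consistent with the suggestion above that a larger cohort may be needed.</p>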
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <p>(
                            <bold>A</bold>) Comparisons of genetic similarities between original tissue, derived organoids and long-term cultured organoids. Results are shown for both non-subset variant comparisons and for subsets including coding variants only. The differences between T vs. O1 and O1 vs. O2 for each subset are not statistically significant (
                            <italic toggle="yes">&#x03b1;</italic> = 0.01). (
                            <bold>B</bold>) Analysis of previously known liver cancer SNVs as listed in the original publication, where the genotype of each individual variant is visualised by different colours. White squares indicate that no confident variant was called for that position in that particular sample. This figure was created using the 
                            <monospace>plot_variant_list</monospace> seqCAT function.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/21974/52f9ca21-6351-42ef-a92e-15ed7ebbb317_figure2.gif"/>
                </fig>
                <p>The original publication
                    <sup>
                        <xref ref-type="bibr" rid="ref-27">27</xref>
                    </sup> lists a number of previously known liver cancer variants (
                    <xref ref-type="other" rid="SD3">Supplementary Data 2</xref>), which we analysed with seqCAT. This analysis reveals that some of the known variants are present in the organoids but absent in their corresponding tissue (
                    <xref ref-type="fig" rid="f2">Figure 2B</xref>). SeqCAT indicates that these specific variants would need to be investigated further, which the original authors have done in most cases. However, the analysis also revealed that the GPRIN1 variant is present in the CC1 samples, even though it is not listed in the literature-based variant list of the original publication (nor in the COSMIC database). This is likely due to how seqCAT uses pre-defined variant lists, 
                    <italic toggle="yes">i.e.</italic> by looking for all known variants in all samples.</p>
                <p>Annotations with 
                    <italic toggle="yes">snpEff</italic> include variant 
                    <italic toggle="yes">impacts</italic>, which are the predicted effects on protein function and range from HIGH through MODERATE and LOW to MODIFIER, in decreasing order of importance. An example of a HIGH impact is a variant leading to protein truncation, while a MODIFIER variant is predicted to have little to no effect on the resulting protein (such as an intronic variant). SeqCAT can summarise and visualise these impacts across profile comparisons. 
                    <xref ref-type="fig" rid="f3">Figure 3</xref> shows the impact distributions of matching and mismatching variants for an aggregation of all comparisons between samples in the tissue-to-organoid transition as well as through the long-term culturing process.</p>
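                <p>The kind of tally behind such a summary can be sketched in a few lines of Python. This is an illustration only and not the implementation of the 
                    <monospace>plot_impacts</monospace> function: the input layout (pairs of an impact category and a flag stating whether the genotypes match) and all values are hypothetical.</p>
                <preformat>
```python
from collections import Counter

# Impact categories in decreasing order of predicted importance.
IMPACT_ORDER = ["HIGH", "MODERATE", "LOW", "MODIFIER"]


def impact_distribution(variants):
    """Count matching and mismatching variants per impact category."""
    counts = {impact: Counter() for impact in IMPACT_ORDER}
    for impact, is_match in variants:
        counts[impact]["match" if is_match else "mismatch"] += 1
    return counts


# Hypothetical (impact, genotypes match?) records pooled from several comparisons.
toy_variants = [
    ("HIGH", True),
    ("MODERATE", True),
    ("MODERATE", False),
    ("LOW", True),
    ("MODIFIER", True),
    ("MODIFIER", True),
    ("MODIFIER", False),
]

distribution = impact_distribution(toy_variants)
```
                </preformat>
                <p>Pooling the records from all pairwise comparisons before counting corresponds to the aggregation described above; the mismatching HIGH and MODERATE entries are the ones that would be carried forward to an enrichment analysis.</p>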
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <p>Distribution of variant impacts for an aggregate of all pairwise comparisons between tissue and early organoid cultures (
                            <bold>A</bold>), and early versus late organoid cultures (
                            <bold>B</bold>). Matching variants (i.e. variants with identical genotypes for both samples being compared) are dark blue, while mismatching variants are a lighter shade of blue. These figures were created using the 
                            <monospace>plot_impacts</monospace> seqCAT function.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/21974/52f9ca21-6351-42ef-a92e-15ed7ebbb317_figure3.gif"/>
                </fig>
                <p>In order to investigate if any of these mismatching variants are biologically relevant, we performed GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment using DAVID
                    <sup>
                        <xref ref-type="bibr" rid="ref-28">28</xref>
                    </sup> on genes affected by mismatching variants in the HIGH and MODERATE impact categories. While no terms were significantly enriched for the tissue-to-organoid transition (
                    <italic toggle="yes">&#x03b1;</italic> = 0.01), three olfactory-related terms and one related to protein de-ubiquitination were significantly enriched for long-term culturing comparisons (see 
                    <xref ref-type="other" rid="SD4">Supplementary Data 3</xref> and 
                    <xref ref-type="other" rid="SD5">Supplementary Data 4</xref>).</p>
                <p>In summary, these results corroborate the original authors&#x2019; conclusion that the organoids are genetically stable 
                    <italic toggle="yes">in vitro</italic> models of liver cancer and demonstrate how seqCAT can be used to analyse genetic variation in HTS data.</p>
            </sec>
            <sec>
                <title>Using seqCAT to examine differences between DNA and RNA variants</title>
                <p>The Broutier dataset contains not only WES data but also RNA-seq data on the same samples, enabling comparison of RNA-seq data to the already performed WES analyses. We thus downloaded the publicly available raw FASTQ files, performed read alignment with the 2-pass mode of STAR
                    <sup>
                        <xref ref-type="bibr" rid="ref-29">29</xref>
                    </sup>, variant calling using GATK
                    <sup>
                        <xref ref-type="bibr" rid="ref-30">30</xref>
                    </sup> and annotation using snpEff
                    <sup>
                        <xref ref-type="bibr" rid="ref-24">24</xref>
                    </sup>, as previously described
                    <sup>
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup>. We subsequently used seqCAT to create SNV profiles for each RNA-seq sample and performed pairwise comparisons across all WES and RNA-seq SNV profiles. This resulted in a grouping with high similarities between WES and RNA-seq samples for the same patient (
                    <xref ref-type="fig" rid="f4">Figure 4</xref>).</p>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Pairwise comparisons of all WES and RNA-seq SNV profiles, demonstrating the high similarity between DNA- and RNA-based variant calls.</title>
                        <p>The colour gradient is the same one used for 
                            <xref ref-type="fig" rid="f1">Figure 1</xref>: scores between 0 and 50 are white, scores between 50 and 90 are shown with a white-to-grey gradient, and scores between 90 and 100 are shown with a grey-to-blue gradient. This figure was created using the 
                            <monospace>plot_heatmap</monospace> seqCAT function.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/21974/52f9ca21-6351-42ef-a92e-15ed7ebbb317_figure4.gif"/>
                </fig>
                <p>Several previously published studies report discrepancies between DNA and RNA variants, differing in both the extent of the discrepancy and its proposed causes
                    <sup>
                        <xref ref-type="bibr" rid="ref-31">31</xref>,
                        <xref ref-type="bibr" rid="ref-32">32</xref>
                    </sup>. To quantify the differences between DNA- and RNA-based variants in the organoid dataset, we calculated the median concordance for all same-sample comparisons, which was 97.5%; the concordance was used in lieu of the similarity score in order to increase comparability with previously published results. The same calculation was performed for sample type-specific comparisons, where the concordance was 96.5% for tissue versus tissue and 97.7% for organoid versus organoid. Per-patient (
                    <italic toggle="yes">e.g.</italic> CC1 vs. CC1) calculations were also performed, shown in 
                    <xref ref-type="table" rid="T2">Table 2</xref>. The per-patient concordance ranged from 94.8% to 98.9%, while individual comparisons ranged from 81.1% to 99.0% (see the 
                    <xref ref-type="other" rid="SC1">Supplementary Code</xref> for the calculations). The minimum value of 81.1% (tissue versus tissue for patient CC1) is the only DNA/RNA comparison with a concordance lower than 90%. These concordances are generally higher than the 80 to 90% that has previously been reported
                    <sup>
                        <xref ref-type="bibr" rid="ref-32">32</xref>
                    </sup>, but still highlight a difference between DNA and RNA variants. While different explanations for this discrepancy have previously been suggested (such as RNA editing), a deeper investigation of these is outside the scope of this paper.</p>
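                <p>As a minimal illustration of the summary statistics above, the following Python sketch computes concordance and then takes the median per patient. It assumes that concordance is the percentage of overlapping variants with matching genotypes; the record layout, function names and numbers are hypothetical and are not taken from the dataset. The calculations actually used are those in the Supplementary Code.</p>
                <preformat>
```python
from collections import defaultdict
from statistics import median


def concordance(matches, overlaps):
    """Percentage of overlapping variants whose genotypes match (assumed definition)."""
    if overlaps == 0:
        raise ValueError("no overlapping variants to compare")
    return 100.0 * matches / overlaps


def median_concordance_by_patient(comparisons):
    """Group same-patient comparisons and return the median concordance per patient."""
    grouped = defaultdict(list)
    for patient, matches, overlaps in comparisons:
        grouped[patient].append(concordance(matches, overlaps))
    return {patient: median(values) for patient, values in grouped.items()}


# Hypothetical (patient, matching variants, overlapping variants) records.
toy_comparisons = [
    ("P1", 970, 1000),
    ("P1", 990, 1000),
    ("P1", 810, 1000),
    ("P2", 480, 500),
]

per_patient = median_concordance_by_patient(toy_comparisons)
```
                </preformat>
                <p>In this toy input a single low comparison (81.0) leaves the median for P1 at 97.0, illustrating how a median is robust to one outlying comparison such as the 81.1% tissue pair noted above.</p>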
                <table-wrap id="T2" orientation="portrait" position="anchor">
                    <label>Table 2. </label>
                    <caption>
                        <title>Median concordance for WES versus RNA-seq SNV profile comparisons across all patients.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="center" colspan="1" rowspan="1">Patient</th>
                                <th align="center" colspan="1" rowspan="1">Median concordance</th>
                                <th align="center" colspan="1" rowspan="1">Median overlaps</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">CC1</td>
                                <td align="center" colspan="1" rowspan="1">96.9%</td>
                                <td align="center" colspan="1" rowspan="1">3744</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">CC2</td>
                                <td align="center" colspan="1" rowspan="1">98.9%</td>
                                <td align="center" colspan="1" rowspan="1">609</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">CC3</td>
                                <td align="center" colspan="1" rowspan="1">98.2%</td>
                                <td align="center" colspan="1" rowspan="1">718</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">CHC1</td>
                                <td align="center" colspan="1" rowspan="1">97.9%</td>
                                <td align="center" colspan="1" rowspan="1">3164</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">CHC2</td>
                                <td align="center" colspan="1" rowspan="1">95.1%</td>
                                <td align="center" colspan="1" rowspan="1">920</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">HCC1</td>
                                <td align="center" colspan="1" rowspan="1">96.5%</td>
                                <td align="center" colspan="1" rowspan="1">1872</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">HCC3</td>
                                <td align="center" colspan="1" rowspan="1">97.3%</td>
                                <td align="center" colspan="1" rowspan="1">745</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">Healthy1</td>
                                <td align="center" colspan="1" rowspan="1">94.8%</td>
                                <td align="center" colspan="1" rowspan="1">1606</td>
                            </tr>
                            <tr>
                                <td align="center" colspan="1" rowspan="1">Healthy2</td>
                                <td align="center" colspan="1" rowspan="1">98.7%</td>
                                <td align="center" colspan="1" rowspan="1">1367</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>In summary, results from seqCAT demonstrate an overall high level of concordance between DNA and RNA variant calls, but highlight that there is some variation between sample types and patients.</p>
            </sec>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <p>HTS experiments are becoming increasingly common and the need for simple and powerful bioinformatic software is as great as ever. Analyses of genetic variation through 
                <italic toggle="yes">e.g.</italic> SNVs represent a common endeavour for many scientific studies, but the methods and data analysis pipelines used vary. In this study we present seqCAT, an easy-to-use and well-documented Bioconductor
                <sup>
                    <xref ref-type="bibr" rid="ref-23">23</xref>
                </sup> R-package that performs variant analyses of HTS data. The capabilities of seqCAT include the creation of SNV profiles, comparisons of global genetic similarities for all variants common between samples and analyses of single variants or genes of special interest. While the seqCAT package itself is new, the underlying theory and general methodology have previously been used for investigations into cell line authenticity
                <sup>
                    <xref ref-type="bibr" rid="ref-21">21</xref>
                </sup> and genetic heterogeneity in public cell line datasets
                <sup>
                    <xref ref-type="bibr" rid="ref-22">22</xref>
                </sup>.</p>
            <p>SeqCAT may be used to analyse both novel sequencing data and publicly available data in repositories (such as the GEO)
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>, but may also be utilised to define genetic profiles for any sample of interest. Such profiles are of great interest for researchers using model systems (such as cell lines or organoids), as they allow for a clear definition of the genetic background of the model itself. These profiles can then be referred back to at a later time, to make sure that genetic drift (which can obscure the interpretation of biological results) has not occurred. SeqCAT is both easy to install and to use, and includes in-depth documentation on its functionality and underlying theory.</p>
            <p>In the present study, we have used seqCAT to analyse a publicly available dataset containing WES and RNA-seq data from organoid cultures and their tissues-of-origin
                <sup>
                    <xref ref-type="bibr" rid="ref-27">27</xref>
                </sup>. The global analysis of WES SNVs demonstrates the overall high genetic similarities between the organoids and their respective tissues, with equivalent results for comparisons covering all variants or only missense variants. The seqCAT analysis of known variants indicates that a GPRIN1 variant is present for the CC1 patient; this variant is only listed as present in a CHC-type patient in the original study. The SNV-based results presented herein corroborate the original authors&#x2019; conclusions that organoids are genetically stable over time, but the higher level of genetic similarity between early and long-term cultured organoids as compared to the tissue-to-organoid transition is statistically non-significant.</p>
            <p>The analyses of genes affected by mismatching HIGH and MODERATE impact variants show that none of the differences between tissue and initial organoid cultures are significantly enriched for specific biological functions, indicating that these differences are likely random. The transition from primary tissue to organoid can thus be viewed as a stable transition, especially given the high overall similarity previously discussed. The long-term culturing results do, however, present four significantly enriched terms. Three of these are related to ectopic expression of olfactory receptors, which have previously been shown to be present in both healthy and cancerous tissues
                <sup>
                    <xref ref-type="bibr" rid="ref-33">33</xref>,
                    <xref ref-type="bibr" rid="ref-34">34</xref>
                </sup>. The single GO-term related to protein de-ubiquitination may be important for studies investigating ubiquitination in liver cancer. Both of these points should thus be accounted for when performing a study with these organoids. The overall results yielded by the seqCAT-analyses corroborate the conclusions from the original study, 
                <italic toggle="yes">i.e.</italic> that these organoids are genetically stable and may be suitable models for studying liver cancer.</p>
            <p>There have been several studies comparing variant calls from DNA and RNA of the same samples, but they have come to differing conclusions as to both the extent and causes of the DNA/RNA discrepancies. Li 
                <italic toggle="yes">et al.</italic> performed both DNA- and RNA-seq across 27 individuals, in addition to analyses of protein expression using mass spectrometry, in which peptides corresponding to variants found in both DNA and RNA were present
                <sup>
                    <xref ref-type="bibr" rid="ref-31">31</xref>
                </sup>. They argue that their results indicate biological significance of RNA variants, given that they are translated to proteins, and that the differences between DNA and RNA variants can be biologically meaningful. Indeed, there have been several studies analysing RNA-seq variants that yielded novel biological insights, demonstrating the utility of such endeavours
                <sup>
                    <xref ref-type="bibr" rid="ref-35">35</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-39">39</xref>
                </sup>. A study by Guo 
                <italic toggle="yes">et al.</italic> analysed DNA/RNA-seq data for 10 breast cancer patients from the TCGA and calculated DNA/RNA concordances to range from 80 to 90%
                <sup>
                    <xref ref-type="bibr" rid="ref-32">32</xref>
                </sup>. They argue that these differences are mostly technical rather than biological.</p>
            <p>The results of the present study indicate that the extent of DNA/RNA differences may not be as large as previously shown: the median concordance for DNA/RNA pairs was 97.5% overall, with a range of 90 to 99% (plus a single comparison with 81.1%), while Guo 
                <italic toggle="yes">et al.</italic> reported a range of 80 to 90% concordance. Both studies thus find a discrepancy between DNA- and RNA-based variant calls, but disagree on its extent. The RNA-seq pipeline utilised in this study is based on the current best practices of GATK, which uses the STAR software for read alignment that has proven to be highly accurate for RNA-seq data
                <sup>
                    <xref ref-type="bibr" rid="ref-29">29</xref>,
                    <xref ref-type="bibr" rid="ref-40">40</xref>
                </sup>. The latest assembly of the human genome (GRCh38) was also used, as the choice of assembly has been highlighted as an important parameter that can yield higher accuracy
                <sup>
                    <xref ref-type="bibr" rid="ref-32">32</xref>
                </sup>. Guo 
                <italic toggle="yes">et al.</italic> used an earlier assembly from 2009 (GRCh37), which might partly explain the discrepancy between the results. The choice of sequencing platform and differences in the mutational profiles of breast and liver cancer could also affect the comparisons. While technical issues will always exist even for DNA/DNA or RNA/RNA comparisons, the results of the present study may represent a closer estimate of the biological relevance of DNA/RNA differences first noted by Li 
                <italic toggle="yes">et al.</italic>
            </p>
            <p>It is clear that there is a discrepancy between DNA- and RNA-based variant calls, but the exact extent of this difference remains to be determined, as well as whether it is a consequence of technical artefacts or biological variation. A full evaluation of these matters likely requires a larger study than what has previously been attempted, including using the latest technologies as well as protein-level validation. The analyses performed herein demonstrate how seqCAT may be utilised as a part of such an endeavour.</p>
        </sec>
        <sec sec-type="conclusions">
            <title>Conclusions</title>
            <p>The seqCAT Bioconductor R-package provides an effective and easy-to-use toolkit for analysing HTS variant data, enabling researchers to investigate genetic differences and potential variation within and between their samples or publicly available data from other laboratories. Little R expertise is required to use seqCAT, and its use is extensively documented. We have used seqCAT to analyse genetic variation in a publicly available dataset of liver cancer organoids, corroborating the conclusions drawn by its original authors, and to demonstrate high levels of DNA/RNA SNV concordance in this dataset. These results serve as a case study in how to utilise the capabilities of seqCAT, which make it a valuable and intuitive tool for a wide range of researchers.</p>
        </sec>
        <sec>
            <title>Software and data availability</title>
            <p>Software is available from: 
                <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/release/bioc/html/seqCAT.html">https://bioconductor.org/packages/release/bioc/html/seqCAT.html</ext-link>
            </p>
            <p>Source code available from: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/fasterius/seqCAT">https://github.com/fasterius/seqCAT</ext-link>
            </p>
            <p>Archived source code as at time of publication: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.2669143">https://doi.org/10.5281/zenodo.2669143</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-41">41</xref>
                </sup>
            </p>
            <p>Software license: MIT</p>
            <p>The data used in this article is publicly available at the GEO through the accession number 
                <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84073">GSE84073</ext-link>.</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>We would like to acknowledge support from Science for Life Laboratory (SciLifeLab), the National Genomics Infrastructure (NGI) and Uppmax for providing assistance in computational infrastructure.</p>
        </ack>
        <sec id="SM1" sec-type="supplementary-material">
            <title>Supplementary material</title>
            <p id="SC1">Supplementary Code: An RMarkdown document for reproducing the analyses and figures of the study using the seqCAT package.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/16083/77e97048-fac5-41f0-a345-9318b6a91f19_supplementary_code.pdf">Click here to access the data</ext-link>.</p>
            <p id="SD2">Supplementary Data 1: Metadata for the Broutier 
                <italic toggle="yes">et al.</italic> study
                <sup>
                    <xref ref-type="bibr" rid="ref-27">27</xref>
                </sup>.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/16083/5ab988f9-9811-41db-82a8-cc7bc1099e65_sdata.1.metadata.txt">Click here to access the data</ext-link>.</p>
            <p id="SD3">Supplementary Data 2: List of the previously known SNVs used in the Broutier 
                <italic toggle="yes">et al.</italic> study
                <sup>
                    <xref ref-type="bibr" rid="ref-27">27</xref>
                </sup>.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/16083/6f7ac5a7-4740-48a2-83c8-7f956bd7d75d_sdata.2.known_variants.txt">Click here to access the data</ext-link>.</p>
            <p id="SD4">Supplementary Data 3: Full results of the enrichment analysis of tissue versus established organoids.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/16083/c74c4b7e-7837-4d65-abd2-ba66518e3eaf_sdata.3.enrichment.T_O1.txt">Click here to access the data</ext-link>.</p>
            <p id="SD5">Supplementary Data 4: Full results of the enrichment analysis of established organoids versus long-term cultured organoids.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/16083/8221bf27-9f13-45ed-99a6-0c3069111149_sdata.4.enrichment.O1_O2.txt">Click here to access the data</ext-link>.</p>
        </sec>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Edgar</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Domrachev</surname>
                            <given-names>RM</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Lash</surname>
                            <given-names>AE</given-names>
                        </name>
					</person-group>:
                    <article-title>Gene Expression Omnibus: NCBI gene expression and hybridization array data repository.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2002</year>;<volume>30</volume>(<issue>1</issue>):<fpage>207</fpage>&#x2013;<lpage>210</lpage>.
                    <pub-id pub-id-type="pmid">11752295</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/30.1.207</pub-id>
                    <pub-id pub-id-type="pmcid">99122</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Zhu</surname>
                            <given-names>Y</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Stephens</surname>
                            <given-names>RM</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Meltzer</surname>
                            <given-names>PS</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>SRAdb: query and use public next-generation sequencing data from within R.</article-title>
                    <source>
						
                        <italic toggle="yes">BMC Bioinformatics.</italic>
					</source>
                    <year>2013</year>;<volume>14</volume>:<fpage>19</fpage>.
                    <pub-id pub-id-type="pmid">23323543</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-14-19</pub-id>
                    <pub-id pub-id-type="pmcid">3560148</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Heather</surname>
                            <given-names>JM</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Chain</surname>
                            <given-names>B</given-names>
                        </name>
					</person-group>:
                    <article-title>The sequence of sequencers: The history of sequencing DNA.</article-title>
                    <source>
						
                        <italic toggle="yes">Genomics.</italic>
					</source>
                    <year>2016</year>;<volume>107</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>8</lpage>.
                    <pub-id pub-id-type="pmid">26554401</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ygeno.2015.11.003</pub-id>
                    <pub-id pub-id-type="pmcid">4727787</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Sboner</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Mu</surname>
                            <given-names>XJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Greenbaum</surname>
                            <given-names>D</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The real cost of sequencing: higher than you think!</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Biol.</italic>
					</source>
                    <year>2011</year>;<volume>12</volume>(<issue>8</issue>):<fpage>125</fpage>.
                    <pub-id pub-id-type="pmid">21867570</pub-id>
                    <pub-id pub-id-type="doi">10.1186/gb-2011-12-8-125</pub-id>
                    <pub-id pub-id-type="pmcid">3245608</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Muir</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Lou</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The real cost of sequencing: scaling computation to keep pace with data generation.</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Biol.</italic>
					</source>
                    <year>2016</year>;<volume>17</volume>:<fpage>53</fpage>.
                    <pub-id pub-id-type="pmid">27009100</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-016-0917-0</pub-id>
                    <pub-id pub-id-type="pmcid">4806511</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Eren</surname>
                            <given-names>AM</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Esen</surname>
                            <given-names>&#x00d6;C</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Quince</surname>
                            <given-names>C</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Anvi'o: an advanced analysis and visualization platform for 'omics data.</article-title>
                    <source>
						
                        <italic toggle="yes">PeerJ.</italic>
					</source>
                    <year>2015</year>;<volume>3</volume>:<fpage>e1319</fpage>.
                    <pub-id pub-id-type="pmid">26500826</pub-id>
                    <pub-id pub-id-type="doi">10.7717/peerj.1319</pub-id>
                    <pub-id pub-id-type="pmcid">4614810</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Faison</surname>
                            <given-names>WJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Rostovtsev</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Castro-Nallar</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Whole genome single-nucleotide variation profile-based phylogenetic tree building methods for analysis of viral, bacterial and human genomes.</article-title>
                    <source>
						
                        <italic toggle="yes">Genomics.</italic>
					</source>
                    <year>2014</year>;<volume>104</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>7</lpage>.
                    <pub-id pub-id-type="pmid">24930720</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ygeno.2014.06.001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Manichaikul</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Mychaleckyj</surname>
                            <given-names>JC</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Rich</surname>
                            <given-names>SS</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Robust relationship inference in genome-wide association studies.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2010</year>;<volume>26</volume>(<issue>22</issue>):<fpage>2867</fpage>&#x2013;<lpage>2873</lpage>.
                    <pub-id pub-id-type="pmid">20926424</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btq559</pub-id>
                    <pub-id pub-id-type="pmcid">3025716</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Larson</surname>
                            <given-names>DE</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Harris</surname>
                            <given-names>CC</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>K</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>SomaticSniper: identification of somatic point mutations in whole genome sequencing data.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2012</year>;<volume>28</volume>(<issue>3</issue>):<fpage>311</fpage>&#x2013;<lpage>317</lpage>.
                    <pub-id pub-id-type="pmid">22155872</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btr665</pub-id>
                    <pub-id pub-id-type="pmcid">3268238</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Purcell</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Neale</surname>
                            <given-names>B</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Todd-Brown</surname>
                            <given-names>K</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>PLINK: a tool set for whole-genome association and population-based linkage analyses.</article-title>
                    <source>
						
                        <italic toggle="yes">Am J Hum Genet.</italic>
					</source>
                    <year>2007</year>;<volume>81</volume>(<issue>3</issue>):<fpage>559</fpage>&#x2013;<lpage>575</lpage>.
                    <pub-id pub-id-type="pmid">17701901</pub-id>
                    <pub-id pub-id-type="doi">10.1086/519795</pub-id>
                    <pub-id pub-id-type="pmcid">1950838</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Knaus</surname>
                            <given-names>BJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Gr&#x00fc;nwald</surname>
                            <given-names>NJ</given-names>
                        </name>
					</person-group>:
                    <article-title>VcfR: a package to manipulate and visualize VCF format data in R.</article-title>
                    <source>
						
                        <italic toggle="yes">bioRxiv.</italic>
					</source>
                    <year>2016</year>;<fpage>041277</fpage>.
                    <pub-id pub-id-type="doi">10.1101/041277</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Danecek</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Auton</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Abecasis</surname>
                            <given-names>G</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The variant call format and VCFtools.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2011</year>;<volume>27</volume>(<issue>15</issue>):<fpage>2156</fpage>&#x2013;<lpage>2158</lpage>.
                    <pub-id pub-id-type="pmid">21653522</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btr330</pub-id>
                    <pub-id pub-id-type="pmcid">3137218</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Obenchain</surname>
                            <given-names>V</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Lawrence</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Carey</surname>
                            <given-names>V</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>VariantAnnotation: a Bioconductor package for exploration and annotation of genetic variants.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2014</year>;<volume>30</volume>(<issue>14</issue>):<fpage>2076</fpage>&#x2013;<lpage>2078</lpage>.
                    <pub-id pub-id-type="pmid">24681907</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btu168</pub-id>
                    <pub-id pub-id-type="pmcid">4080743</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Quinlan</surname>
                            <given-names>AR</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Hall</surname>
                            <given-names>IM</given-names>
                        </name>
					</person-group>:
                    <article-title>BEDTools: a flexible suite of utilities for comparing genomic features.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2010</year>;<volume>26</volume>(<issue>6</issue>):<fpage>841</fpage>&#x2013;<lpage>842</lpage>.
                    <pub-id pub-id-type="pmid">20110278</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btq033</pub-id>
                    <pub-id pub-id-type="pmcid">2832824</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Neph</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Kuehn</surname>
                            <given-names>MS</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Reynolds</surname>
                            <given-names>AP</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>BEDOPS: high-performance genomic feature operations.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2012</year>;<volume>28</volume>(<issue>14</issue>):<fpage>1919</fpage>&#x2013;<lpage>1920</lpage>.
                    <pub-id pub-id-type="pmid">22576172</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bts277</pub-id>
                    <pub-id pub-id-type="pmcid">3389768</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Robinson</surname>
                            <given-names>JT</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Thorvaldsd&#x00f3;ttir</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Winckler</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Integrative genomics viewer.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Biotechnol.</italic>
					</source>
                    <year>2011</year>;<volume>29</volume>(<issue>1</issue>):<fpage>24</fpage>&#x2013;<lpage>26</lpage>.
                    <pub-id pub-id-type="pmid">21221095</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.1754</pub-id>
                    <pub-id pub-id-type="pmcid">3346182</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Zerbino</surname>
                            <given-names>DR</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Achuthan</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Akanni</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Ensembl 2018.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2018</year>;<volume>46</volume>(<issue>D1</issue>):<fpage>D754</fpage>&#x2013;<lpage>D761</lpage>.
                    <pub-id pub-id-type="pmid">29155950</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkx1098</pub-id>
                    <pub-id pub-id-type="pmcid">5753206</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Pabinger</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Dander</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Fischer</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>A survey of tools for variant analysis of next-generation genome sequencing data.</article-title>
                    <source>
						
                        <italic toggle="yes">Brief Bioinform.</italic>
					</source>
                    <year>2014</year>;<volume>15</volume>(<issue>2</issue>):<fpage>256</fpage>&#x2013;<lpage>278</lpage>.
                    <pub-id pub-id-type="pmid">23341494</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bib/bbs086</pub-id>
                    <pub-id pub-id-type="pmcid">3956068</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <collab>QIAGEN</collab>:
                    <article-title>Ingenuity Variant Analysis.</article-title> <year>2018</year>; accessed 2018-05-30.
                    <ext-link ext-link-type="uri" xlink:href="https://www.qiagenbioinformatics.com/products/ingenuity-variant-analysis">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Capes-Davis</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Neve</surname>
                            <given-names>RM</given-names>
                        </name>
					</person-group>:
                    <article-title>Authentication: A Standard Problem or a Problem of Standards?</article-title>
                    <source>
						
                        <italic toggle="yes">PLoS Biol.</italic>
					</source>
                    <year>2016</year>;<volume>14</volume>(<issue>6</issue>):<fpage>e1002477</fpage>.
                    <pub-id pub-id-type="pmid">27300550</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pbio.1002477</pub-id>
                    <pub-id pub-id-type="pmcid">4907433</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Fasterius</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Raso</surname>
                            <given-names>C</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Kennedy</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>A novel RNA sequencing data analysis method for cell line authentication.</article-title>
                    <source>
						
                        <italic toggle="yes">PLoS One.</italic>
					</source>
                    <year>2017</year>;<volume>12</volume>(<issue>2</issue>):<fpage>e0171435</fpage>.
                    <pub-id pub-id-type="pmid">28192450</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0171435</pub-id>
                    <pub-id pub-id-type="pmcid">5305277</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Fasterius</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Al-Khalili Szigyarto</surname>
                            <given-names>C</given-names>
                        </name>
					</person-group>:
                    <article-title>Analysis of public RNA-sequencing data reveals biological consequences of genetic heterogeneity in cell line populations.</article-title>
                    <source>
						
                        <italic toggle="yes">Sci Rep.</italic>
					</source>
                    <year>2018</year>;<volume>8</volume>(<issue>1</issue>):<fpage>11226</fpage>.
                    <pub-id pub-id-type="pmid">30046134</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41598-018-29506-3</pub-id>
                    <pub-id pub-id-type="pmcid">6060100</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Carey</surname>
                            <given-names>VJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Gentleman</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Orchestrating high-throughput genomic analysis with Bioconductor.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Methods.</italic>
					</source>
                    <year>2015</year>;<volume>12</volume>(<issue>2</issue>):<fpage>115</fpage>&#x2013;<lpage>121</lpage>.
                    <pub-id pub-id-type="pmid">25633503</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.3252</pub-id>
                    <pub-id pub-id-type="pmcid">4509590</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Cingolani</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Platts</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>LL</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of 
                        <italic toggle="yes">Drosophila melanogaster</italic> strain w
                        <sup>1118</sup>; iso-2; iso-3.</article-title>
                    <source>
						
                        <italic toggle="yes">Fly (Austin).</italic>
					</source>
                    <year>2012</year>;<volume>6</volume>(<issue>2</issue>):<fpage>80</fpage>&#x2013;<lpage>92</lpage>.
                    <pub-id pub-id-type="pmid">22728672</pub-id>
                    <pub-id pub-id-type="doi">10.4161/fly.19695</pub-id>
                    <pub-id pub-id-type="pmcid">3679285</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Forbes</surname>
                            <given-names>SA</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Beare</surname>
                            <given-names>D</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Gunasekaran</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>COSMIC: exploring the world&#x2019;s knowledge of somatic mutations in human cancer.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2015</year>;<volume>43</volume>(<issue>Database issue</issue>):<fpage>D805</fpage>&#x2013;<lpage>11</lpage>.
                    <pub-id pub-id-type="pmid">25355519</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gku1075</pub-id>
                    <pub-id pub-id-type="pmcid">4383913</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Lawrence</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Pag&#x00e8;s</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Software for computing and annotating genomic ranges.</article-title>
                    <source>
						
                        <italic toggle="yes">PLoS Comput Biol.</italic>
					</source>
                    <year>2013</year>;<volume>9</volume>(<issue>8</issue>):<fpage>e1003118</fpage>.
                    <pub-id pub-id-type="pmid">23950696</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003118</pub-id>
                    <pub-id pub-id-type="pmcid">3738458</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Broutier</surname>
                            <given-names>L</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Mastrogiovanni</surname>
                            <given-names>G</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Verstegen</surname>
                            <given-names>MM</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Human primary liver cancer-derived organoid cultures for disease modeling and drug screening.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Med.</italic>
					</source>
                    <year>2017</year>;<volume>23</volume>(<issue>12</issue>):<fpage>1424</fpage>&#x2013;<lpage>1435</lpage>.
                    <pub-id pub-id-type="pmid">29131160</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nm.4438</pub-id>
                    <pub-id pub-id-type="pmcid">5722201</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Huang da</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Sherman</surname>
                            <given-names>BT</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Lempicki</surname>
                            <given-names>RA</given-names>
                        </name>
					</person-group>:
                    <article-title>Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Protoc.</italic>
					</source>
                    <year>2009</year>;<volume>4</volume>(<issue>1</issue>):<fpage>44</fpage>&#x2013;<lpage>57</lpage>.
                    <pub-id pub-id-type="pmid">19131956</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nprot.2008.211</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Dobin</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Davis</surname>
                            <given-names>CA</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Schlesinger</surname>
                            <given-names>F</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>STAR: ultrafast universal RNA-seq aligner.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2013</year>;<volume>29</volume>(<issue>1</issue>):<fpage>15</fpage>&#x2013;<lpage>21</lpage>.
                    <pub-id pub-id-type="pmid">23104886</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bts635</pub-id>
                    <pub-id pub-id-type="pmcid">3530905</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>McKenna</surname>
                            <given-names>A</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Hanna</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Banks</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Res.</italic>
					</source>
                    <year>2010</year>;<volume>20</volume>(<issue>9</issue>):<fpage>1297</fpage>&#x2013;<lpage>1303</lpage>.
                    <pub-id pub-id-type="pmid">20644199</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.107524.110</pub-id>
                    <pub-id pub-id-type="pmcid">2928508</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-31">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>IX</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>Y</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Widespread RNA and DNA sequence differences in the human transcriptome.</article-title>
                    <source>
						
                        <italic toggle="yes">Science.</italic>
					</source>
                    <year>2011</year>;<volume>333</volume>(<issue>6038</issue>):<fpage>53</fpage>&#x2013;<lpage>58</lpage>.
                    <pub-id pub-id-type="pmid">21596952</pub-id>
                    <pub-id pub-id-type="doi">10.1126/science.1207018</pub-id>
                    <pub-id pub-id-type="pmcid">3204392</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Guo</surname>
                            <given-names>Y</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Zhao</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Sheng</surname>
                            <given-names>Q</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The discrepancy among single nucleotide variants detected by DNA and RNA high throughput sequencing data.</article-title>
                    <source>
						
                        <italic toggle="yes">BMC Genomics.</italic>
					</source>
                    <year>2017</year>;<volume>18</volume>(<issue>Suppl 6</issue>):<fpage>690</fpage>.
                    <pub-id pub-id-type="pmid">28984205</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12864-017-4022-x</pub-id>
                    <pub-id pub-id-type="pmcid">5629567</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Flegel</surname>
                            <given-names>C</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Manteniotis</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Osthold</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Expression profile of ectopic olfactory receptors determined by deep sequencing.</article-title>
                    <source>
						
                        <italic toggle="yes">PLoS One.</italic>
					</source>
                    <year>2013</year>;<volume>8</volume>(<issue>2</issue>):<fpage>e55368</fpage>.
                    <pub-id pub-id-type="pmid">23405139</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0055368</pub-id>
                    <pub-id pub-id-type="pmcid">3566163</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-34">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Abaffy</surname>
                            <given-names>T</given-names>
                        </name>
					</person-group>:
                    <article-title>Human olfactory receptors expression and their role in non-olfactory tissues-a mini-review.</article-title>
                    <source>
						
                        <italic toggle="yes">J Pharmacogenomics Pharmacoproteomics.</italic>
					</source>
                    <year>2015</year>;<volume>6</volume>:<fpage>152</fpage>.
                    <pub-id pub-id-type="doi">10.4172/2153-0645.1000152</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-35">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Miller</surname>
                            <given-names>AC</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Obholzer</surname>
                            <given-names>ND</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Shah</surname>
                            <given-names>AN</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>RNA-seq-based mapping and candidate identification of mutations from forward genetic screens.</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Res.</italic>
					</source>
                    <year>2013</year>;<volume>23</volume>(<issue>4</issue>):<fpage>679</fpage>&#x2013;<lpage>686</lpage>.
                    <pub-id pub-id-type="pmid">23299976</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.147322.112</pub-id>
                    <pub-id pub-id-type="pmcid">3613584</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-36">
                <label>36</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Piskol</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Ramaswami</surname>
                            <given-names>G</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>JB</given-names>
                        </name>
					</person-group>:
                    <article-title>Reliable identification of genomic variants from RNA-seq data.</article-title>
                    <source>
						
                        <italic toggle="yes">Am J Hum Genet.</italic>
					</source>
                    <year>2013</year>;<volume>93</volume>(<issue>4</issue>):<fpage>641</fpage>&#x2013;<lpage>651</lpage>.
                    <pub-id pub-id-type="pmid">24075185</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.ajhg.2013.08.008</pub-id>
                    <pub-id pub-id-type="pmcid">3791257</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-37">
                <label>37</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Lee</surname>
                            <given-names>MC</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Lopez-Diaz</surname>
                            <given-names>FJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Khan</surname>
                            <given-names>SY</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Single-cell analyses of transcriptional heterogeneity during drug tolerance transition in cancer cells by RNA sequencing.</article-title>
                    <source>
						
                        <italic toggle="yes">Proc Natl Acad Sci U S A.</italic>
					</source>
                    <year>2014</year>;<volume>111</volume>(<issue>44</issue>):<fpage>E4726</fpage>&#x2013;<lpage>E4735</lpage>.
                    <pub-id pub-id-type="pmid">25339441</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.1404656111</pub-id>
                    <pub-id pub-id-type="pmcid">4226127</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-38">
                <label>38</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Deelen</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Zhernakova</surname>
                            <given-names>DV</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>de Haan</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Calling genotypes from public RNA-sequencing data enables identification of genetic variants that affect gene-expression levels.</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Med.</italic>
					</source>
                    <year>2015</year>;<volume>7</volume>(<issue>1</issue>):<fpage>30</fpage>.
                    <pub-id pub-id-type="pmid">25954321</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13073-015-0152-4</pub-id>
                    <pub-id pub-id-type="pmcid">4423486</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-39">
                <label>39</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Kang</surname>
                            <given-names>HM</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Subramaniam</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Targ</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Multiplexed droplet single-cell RNA-sequencing using natural genetic variation.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Biotechnol.</italic>
					</source>
                    <year>2018</year>;<volume>36</volume>(<issue>1</issue>):<fpage>89</fpage>&#x2013;<lpage>94</lpage>.
                    <pub-id pub-id-type="pmid">29227470</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.4042</pub-id>
                    <pub-id pub-id-type="pmcid">5784859</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-40">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Engstr&#x00f6;m</surname>
                            <given-names>PG</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Steijger</surname>
                            <given-names>T</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Sipos</surname>
                            <given-names>B</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Systematic evaluation of spliced alignment programs for RNA-seq data.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Methods.</italic>
					</source>
                    <year>2013</year>;<volume>10</volume>(<issue>12</issue>):<fpage>1185</fpage>&#x2013;<lpage>1191</lpage>.
                    <pub-id pub-id-type="pmid">24185836</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.2722</pub-id>
                    <pub-id pub-id-type="pmcid">4018468</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-41">
                <label>41</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Fasterius</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>vobencha</surname>
                        </name>
						
                        <name name-style="western">
                            <surname>hpages</surname>
                        </name>
					</person-group>:
                    <article-title>fasterius/seqCAT: seqCAT version 1.2.1 (Version 1.2.1).</article-title>
                    <source>
						
                        <italic toggle="yes">Zenodo.</italic>
					</source>
                    <year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.2669143">http://www.doi.org/10.5281/zenodo.2669143</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report52330">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.21974.r52330</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Fan</surname>
                        <given-names>Jean</given-names>
                    </name>
                    <xref ref-type="aff" rid="r52330a1">1</xref>
                    <xref ref-type="aff" rid="r52330a2">2</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0212-5451</uri>
                </contrib>
                <aff id="r52330a1">
                    <label>1</label>Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA</aff>
                <aff id="r52330a2">
                    <label>2</label>Department of Chemistry and Chemical Biology, Harvard, Cambridge, MA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>23</day>
                <month>9</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Fan J</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport52330" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.16083.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>While the authors have improved aspects of the manuscript in this revision, it remains unclear why the presented analysis necessitated the development of new software. Compared to VariantAnnotation (another Bioconductor package much like seqCAT; not a 'command line-based software' as noted in the revision), seqCAT uses a number of default and optional filtering parameters which, while now described in the revision, remain unexplained and unjustified, thus limiting its transparency, particularly for the novice users for whom this package is intended.</p>
            <p>Furthermore, it is unclear whether the noted 'highly accessible, easy-to-use' and 'intuitive' features of seqCAT were benchmarked with user testing, particularly by those with 'little R expertise' and other 'novices'. A more thorough discussion of seqCAT's limitations, particularly compared to existing software, is needed so users may better decide which software is best suited to their particular needs.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>No</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>No</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Partly</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>bioinformatics, software development, RNA variant calling, cancer biology</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report52329">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.21974.r52329</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Lexa</surname>
                        <given-names>Matej</given-names>
                    </name>
                    <xref ref-type="aff" rid="r52329a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4213-5259</uri>
                </contrib>
                <aff id="r52329a1">
                    <label>1</label>Department of Machine Learning and Data Processing, Faculty of Informatics, Masaryk University, Botanicka, Czech Republic</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>21</day>
                <month>8</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Lexa M</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport52329" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.16083.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Partly</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report49270">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.17563.r49270</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Fan</surname>
                        <given-names>Jean</given-names>
                    </name>
                    <xref ref-type="aff" rid="r49270a1">1</xref>
                    <xref ref-type="aff" rid="r49270a2">2</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0212-5451</uri>
                </contrib>
                <aff id="r49270a1">
                    <label>1</label>Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA</aff>
                <aff id="r49270a2">
                    <label>2</label>Department of Chemistry and Chemical Biology, Harvard, Cambridge, MA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>10</day>
                <month>6</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Fan J</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport49270" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.16083.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Overview</p>
            <p> </p>
            <p> Fasterius and Szigyarto present seqCAT, an R-package for single nucleotide variant analysis (downstream of alignment and variant calling). The authors apply seqCAT to characterize the genetic concordance between DNA and RNA samples as well as between cancer samples and derived organoids.</p>
            <p> </p>
            <p> While high throughput sequencing technologies are indeed producing large amounts of data that demand novel computational tools and software for efficient processing and analysis, it is unclear why the analysis presented in this manuscript required new software rather than the functionality already available in packages such as VariantAnnotation and GenomicRanges, both of which the presented software builds on. The software, while comparable in ease of use to other Bioconductor packages, is not sufficiently well documented, particularly with respect to the variant filtering step. Likewise, the presented analysis does not highlight the utility or necessity of the developed software. Generally, the rationale for the development of the software is unclear, particularly given its redundancy with existing software.</p>
            <p> </p>
            <p> Therefore, this manuscript and software in their current form are not of a high enough standard to warrant indexing. I hope the following comments will be useful for the authors.</p>
            <p> </p>
            <p> Major comments (software)</p>
            <p> </p>
            <p> As this is a software tool article, I have compared my typical VCF analysis approach to seqCAT's pipeline and have the following set of major and minor comments:</p>
            <p> </p>
            <p> 1. Regarding the create_profile() function&#x00a0;</p>
            <p> </p>
            <p> ```{r}</p>
            <p> # Start with same VCF file</p>
            <p> library("seqCAT")</p>
            <p> vcf &lt;- system.file("extdata", "example.vcf.gz", package = "seqCAT")</p>
            <p> </p>
            <p> # How I would typically parse a VCF file&#x00a0;</p>
            <p> library(VariantAnnotation)</p>
            <p> data &lt;- readVcfAsVRanges(vcf)</p>
            <p> vi &lt;- sampleNames(data) == "HCT116"</p>
            <p> hct.norm &lt;- data[vi,]</p>
            <p> class(hct.norm)</p>
            <p> length(hct.norm)</p>
            <p> </p>
            <p> # How seqCAT does it</p>
            <p> setwd("~/Desktop") # note a text file is written so I need to change my working directory to somewhere that I have write permission</p>
            <p> create_profile(vcf, "HCT116", "hct116.profile.txt", filter=FALSE)</p>
            <p> hct116 &lt;- read_profile("hct116.profile.txt", "HCT116") # now I have to read the written file back in</p>
            <p> class(hct116)</p>
            <p> length(hct116)</p>
            <p> ```</p>
            <p> </p>
            <p> My approach reads in a VCF file as VRanges and filters for the variants annotated as HCT116 for the sample using functionalities available in the VariantAnnotation package,&#x00a0;ending up with 12055 variants. In comparison, with seqCAT, I only get 1210 variants, despite setting the filter parameter to FALSE. The documentation is not sufficiently clear for me to understand the cause of the difference.&#x00a0;</p>
            <p> </p>
            <p> I can speculate that perhaps seqCAT is restricting to the single-nucleotide variants as opposed to larger indels. So I can test this:</p>
            <p> </p>
            <p> ```{r}</p>
            <p> # limit to single nucleotide variants</p>
            <p> vi &lt;- width(ranges(hct.norm)) == 1</p>
            <p> hct.norm &lt;- hct.norm[vi,]</p>
            <p> length(hct.norm)</p>
            <p> ```</p>
            <p> </p>
            <p> However, this still leaves me with 10773 variants, almost nine times the 1210 variants identified by seqCAT. The filtering criteria used by seqCAT are not well documented.</p>
            <p> </p>
            <p> </p>
            <p> 2. Regarding the compare_profiles() function</p>
            <p> </p>
            <p> ```{r}</p>
            <p> # Begin with same set of variants</p>
            <p> create_profile(vcf, "HCT116", "hct116.profile.txt", filter=TRUE)</p>
            <p> hct116 &lt;- read_profile("hct116.profile.txt", "HCT116")</p>
            <p> create_profile(vcf, "RKO", "rko.profile.txt", filter=TRUE)</p>
            <p> rko &lt;- read_profile("rko.profile.txt", "RKO")</p>
            <p> </p>
            <p> # How I would typically compare two VCF files</p>
            <p> # to count number of shared variants</p>
            <p> length(intersect(hct116, rko))</p>
            <p> # or if I want to keep the detailed info</p>
            <p> foo &lt;- unique(hct116[hct116 %in% rko,])</p>
            <p> </p>
            <p> # How seqCAT does it</p>
            <p> hct116_rko &lt;- compare_profiles(hct116, rko)</p>
            <p> dim(hct116_rko)</p>
            <p> class(hct116_rko)</p>
            <p> ```</p>
            <p> </p>
            <p> In both cases, I end up with 282 shared variants between HCT116 and RKO, so it is unclear why the compare_profiles() function is necessary given the intersect() functionality already available. Furthermore, note that the output of seqCAT's compare_profiles() function is a data frame, even though both inputs are GRanges objects, whereas my approach maintains a GRanges data structure. It is unclear why seqCAT casts the GRanges input into data frames rather than maintaining a consistent data structure.</p>
            <p> </p>
            <p> </p>
            <p> Minor comments (software)</p>
            <p> </p>
            <p> 1. The authors note that seqCAT "follows existing best coding practises, including a clean, modular and robust design." Please elaborate on what these existing best coding practices are or cite the referenced best coding practices if possible. Also note the misspelling.&#x00a0;</p>
            <p> </p>
            <p> 2. The seqCAT software provides additional functionalities "to read and compare variants present in the Catalogue of somatic mutations in cancer (COSMIC) database" as noted in the manuscript. However, these functionalities do not appear to be used in the analyses presented in the manuscript. Out of curiosity, are any of the genetic differences between primary cancer samples and derived organoids found in COSMIC? Likewise, what is the genetic similarity measurement when restricted to variants present in COSMIC?</p>
            <p> </p>
            <p> Major comments (biology)</p>
            <p> </p>
            <p> While this is a software tool article, the authors apply the software to analyze biological data and propose a number of biological conclusions, for which I have the following set of major and minor comments:</p>
            <p> </p>
            <p> 1. The authors propose that "long-term cultures seem to be more genetically similar than the transition from tissue to organoid." However, the analysis performed to support this claim was limited to single nucleotide variants. So it remains unclear whether larger-scale genetic alterations are present and whether long-term cultures are also genetically more similar than the transition from tissue to organoid in terms of these other non-single-nucleotide genetic alterations.</p>
            <p> </p>
            <p> 2. The authors note that "there are only a limited number of mismatching HIGH variants" when comparing tissue and early organoid cultures. However, closer inspection finds that the proportion of mismatching HIGH variants (0.2%) is quite comparable to the proportion of matching HIGH variants (0.3%). Based on this result, the null hypothesis that both matching and mismatching variants come from the same underlying impact distribution cannot be rejected. It is inaccurate to imply that mismatching variants are depleted in the HIGH impact category simply because there are so few of them, as the total number of mismatching variants is also smaller than the number of matching variants.</p>
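            <p> The comparison of proportions above can be checked with a test of independence. The following is a minimal sketch in R that uses hypothetical counts chosen only to reproduce the quoted proportions (0.3% and 0.2%); the counts are not taken from the manuscript, and the real counts should be substituted before drawing any conclusion.</p>
            <preformat>
```r
# Hypothetical counts, chosen only to reproduce the quoted proportions;
# they are NOT taken from the manuscript.
counts = matrix(
  c(30, 9970,   # matching variants:    HIGH, other (30 / 10000 = 0.3%)
     2,  998),  # mismatching variants: HIGH, other ( 2 /  1000 = 0.2%)
  nrow = 2, byrow = TRUE,
  dimnames = list(status = c("matching", "mismatching"),
                  impact = c("HIGH", "other"))
)

# Fisher's exact test of independence between match status and impact category
result = fisher.test(counts)
print(result$p.value)
```
            </preformat>
            <p> With counts of this size the p-value is far above conventional significance thresholds, which is the sense in which the null hypothesis cannot be rejected.</p>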
            <p> </p>
            <p> 3. Why was the GPRIN1 variant missed in the original publication and found by seqCAT? It is unclear whether this discrepancy is the result of alignment, variant calling, and other upstream differences or if it is the result of using seqCAT. How do we know this mutation is not a false positive/sequencing error? What is&#x00a0;GPRIN1? Is it an important oncogene? Is the mutation present in COSMIC? "Given the importance of these previously known variants it is likely that the GPRIN1 mutation may be of significance" is not sufficiently justified.&#x00a0;</p>
            <p> </p>
            <p> Minor comments (biology)</p>
            <p> </p>
            <p> 1. The authors find that "Differences between DNA- and RNA-based variant calls in this dataset are also analysed revealing a high median concordance of 97.5%." This is still not 100%. So are the differences enriched for putative RNA editing? Or do they occur at SNVs with lower quality scores? What explains the difference?</p>
            <p> </p>
            <p> 2. The conclusion "organoids are accurate...in vitro models of liver cancer" is not adequately supported without a more thorough transcriptomics comparison of the organoids and primary cancer tissue. While such a study is beyond the scope of this manuscript, the authors should refrain from drawing inadequately supported conclusions.&#x00a0;</p>
            <p> </p>
            <p> 3. The authors find that their DNA and RNA variant calls exhibit ~97.5% median concordance compared to previous 80 to 90% estimates, speculating that this increased concordance is solely the result of improved alignment, variant calling best practices, and using the latest human genome assembly, rather than due to use of seqCAT. It is unclear whether or not sequencing technology differences, or even biological differences between breast cancer and liver cancer are also contributing factors.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>No</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>No</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Partly</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>bioinformatics, software development, RNA variant calling, cancer biology</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment4763-49270">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Al-Khalili Szigyarto</surname>
                            <given-names>Cristina</given-names>
                        </name>
                        <aff>KTH Royal Institute of Technology, Sweden</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>18</day>
                    <month>7</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We are grateful for the constructive and thorough criticism presented by the reviewer, and have implemented several alterations to the seqCAT package itself (version 1.7.1 onwards), its documentation and the manuscript. Ease-of-use is one of the main features of seqCAT, as the reviewer has already pointed out, along with a complete data-to-figures workflow. The manuscript&#x2019;s introduction has been extended to more fully address these points and motivate the rationale behind seqCAT&#x2019;s creation.</p>
                <p> The discrepancy between the process used by the reviewer and seqCAT can indeed be attributed to the additional filtering that seqCAT performs. This filtering covers variant caller-specific filters, a minimum variant depth, and the removal of mitochondrial variants, variants on non-standard chromosomes, duplicate variants at the gene or position level and variants with missing genotypes. We agree with the reviewer that this should be made clearer in both the manuscript and the documentation, and have thus made appropriate changes (version 1.7.2). As already pointed out by the reviewer and as stated in the manuscript, seqCAT only works with SNV profiles, 
                    <italic>i.e.</italic> single nucleotide variants.</p>
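                <p> As an illustration only, filters of the kind listed above could be sketched on a plain data frame as follows. The column names, the depth threshold of 10 and the toy values are assumptions made for this example; the sketch omits caller-specific filters and is not seqCAT&#x2019;s actual implementation or its defaults.</p>
                <preformat>
```r
# Toy variant table; column names and values are hypothetical.
variants = data.frame(
  chr      = c("1", "1", "MT", "GL000192.1", "2", "2"),
  pos      = c(100, 100, 200, 300, 400, 500),
  gene     = c("A", "A", "B", "C", "D", "E"),
  depth    = c(25, 25, 40, 30, 5, 18),
  genotype = c("A/G", "A/G", "C/T", "G/G", "T/C", NA),
  stringsAsFactors = FALSE
)

min_depth = 10                                  # assumed threshold
standard_chr = c(as.character(1:22), "X", "Y")  # excludes MT and unplaced contigs

# Apply one filter at a time so that each criterion is visible
filtered = variants[variants$depth >= min_depth, ]               # minimum variant depth
filtered = filtered[filtered$chr %in% standard_chr, ]            # mitochondrial / non-standard chromosomes
filtered = filtered[!is.na(filtered$genotype), ]                 # missing genotypes
filtered = filtered[!duplicated(filtered[, c("chr", "pos")]), ]  # duplicates at the position level
print(filtered)
```
                </preformat>
                <p> Each criterion removes at least one row of the toy table, which is how a profile can end up much smaller than the unfiltered set of variants read from the same VCF file.</p>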
                <p> The reason the `compare_profiles` function does not use code similar to that shown in the reviewer&#x2019;s example is that such a procedure, while indeed yielding the same overlaps, loses the sample-specific metadata. Looking at the number of metadata columns for the example comparison, there is a difference of 11 (18 using the reviewer&#x2019;s code, 29 for seqCAT). These sample-specific metadata are used both in `compare_profiles` itself (such as for comparing sample genotypes) and in downstream analyses (such as when plotting variant grids or impact distributions).</p>
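                <p> The difference between the two approaches can be sketched with base R data frames standing in for SNV profiles. The column names and values are hypothetical, and the join shown here only illustrates the idea of keeping per-sample columns; it is not the code used inside `compare_profiles`.</p>
                <preformat>
```r
# Toy SNV profiles; column names and values are hypothetical.
hct116 = data.frame(chr = c("1", "1", "2"), pos = c(100, 200, 300),
                    genotype = c("A/G", "C/C", "T/T"), depth = c(20, 35, 12),
                    stringsAsFactors = FALSE)
rko = data.frame(chr = c("1", "2", "3"), pos = c(100, 300, 400),
                 genotype = c("A/G", "T/C", "G/G"), depth = c(18, 40, 22),
                 stringsAsFactors = FALSE)

# Set intersection on positions alone: the shared positions are found,
# but genotype and depth are no longer attached to them.
shared_keys = intersect(paste(hct116$chr, hct116$pos), paste(rko$chr, rko$pos))

# Join on position: the columns of both samples are kept, one suffix per sample,
# so the genotypes can be compared row by row.
overlap = merge(hct116, rko, by = c("chr", "pos"), suffixes = c(".HCT116", ".RKO"))
overlap$match = overlap$genotype.HCT116 == overlap$genotype.RKO
print(overlap)
```
                </preformat>
                <p> Both approaches find the same two overlapping positions, but only the joined table records that the samples agree at one of them and differ at the other.</p>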
                <p> SNV profile creation is now performed within R and the mandatory storage of profiles to disk has been removed; this can now be performed with a separate function for users that still require it. Inconsistencies in seqCAT&#x2019;s data structures have been addressed: now only data frames are presented to the user, for simplicity and ease-of-use.</p>
                <p> Several updates and extensions have also been made to the manuscript, addressing the biology-related comments raised by the reviewer.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report46167">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.17563.r46167</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Lexa</surname>
                        <given-names>Matej</given-names>
                    </name>
                    <xref ref-type="aff" rid="r46167a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4213-5259</uri>
                </contrib>
                <aff id="r46167a1">
                    <label>1</label>Department of Machine Learning and Data Processing, Faculty of Informatics, Masaryk University, Botanicka, Czech Republic</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>9</day>
                <month>4</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Lexa M</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport46167" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.16083.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The manuscript describes a novel computational tool for genotype analysis and comparison called seqCAT.</p>
            <p> </p>
            <p> The tool has been created as a package for R/Bioconductor and has already been accepted into the Bioconductor repository. I was able to install it and follow the example code in the package as well as study its use in the manuscript. In all my tests the package and its functions worked as designed/described in the accompanying materials.</p>
            <p> </p>
            <p> Although the code is fully functional, both the code and the submitted manuscript leave much to be desired. The most important issues in this respect are i) the absence of a critical comparison with existing tools, ii) the need for a better description of some of the available functionality and, last but not least, iii) the need for better integration into the existing data and code structure. 
                <list list-type="bullet">
                    <list-item>
                        <p>As far as other tools are concerned, the authors motivate the need for a tool like seqCAT by referring to vcftools, the VariantAnnotation R package, IGV, the Ensembl Genome Browser and some proprietary software. However, today there are dozens of tools that may come close to the functionality presented here and deserve to be mentioned and compared critically. A quick browse of several sources yielded software such as adegenet (https://cran.r-project.org/web/packages/adegenet/index.html), anvi'o (http://merenlab.org/2015/07/20/analyzing-variability/), SomaticSniper (http://gmt.genome.wustl.edu/packages/somatic-sniper/), PhyloSNP (https://hive.biochemistry.gwu.edu/dna.cgi?cmd=phylosnp), GATK or BEDOPS, which has a vcf2bed function (https://bedops.readthedocs.io/en/latest/content/reference/file-management/conversion/vcf2bed.html) that can lead to comparison based on interval sets. Would PLINK and its SNP profiling abilities be powerful enough (http://zzz.bwh.harvard.edu/plink/profile.shtml)? Are methods typically used for small and medium-sized SNP samples, such as the MATLAB code here (https://jamanetwork.com/journals/jamaoncology/fullarticle/2598491), different from methods that must be applied to whole-genome data? I don't know the answers to some of these questions, but I feel the authors should look wider to show the advantages of seqCAT, if any. One advantage, also mentioned by the authors, is simplicity of use. However, it should be clear what the trade-offs are.</p>
                    </list-item>
                    <list-item>
                        <p>The manuscript mentions that SNVs are filtered based on quality and other criteria but doesn't give enough details about what is happening under the hood. The software is open source; however, the manuscript should lay out the basic principles of the data manipulation done by the package in plain English. Also, reading a profile into the package and comparing it to others creates different GenomicRanges/data frame objects in R that should also be described briefly.</p>
                    </list-item>
                    <list-item>
                        <p>Loosely connected to the data frame structures mentioned above, I see the way seqCAT manipulates data as a weak point. First of all, it calculates profiles and saves them into a file, effectively outside R, only to read the files back in the next step. It would seem much more natural to use some internal data structure, maybe even the same data frame created later, to keep the data in R and offer appropriate writing/reading/conversion functions to create files outside R. As for conversions, data formats for some of the data calculated by seqCAT already exist, and the software would be much more powerful if users could write to them (or even read from them). Although the profiles can be exported into BED/GFF3 with some third-party libraries (e.g. rtracklayer), perhaps it would be useful to go to BAM/SAM, back to VCF after some manipulation (right now only filtration, presumably), or hapmap and others for transfer of data into other software (e.g. PLINK, VarDict)?</p>
                    </list-item>
                </list>
            </p>
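            <p> The in-memory design suggested in the last point could look roughly like the following sketch: the profile is an ordinary data frame, and writing to or reading from disk is an optional, separate step. The helper names and columns are hypothetical and are not part of seqCAT's API.</p>
            <preformat>
```r
# Hypothetical helpers; the names are illustrative and not part of seqCAT's API.
write_profile_sketch = function(profile, path) {
  write.table(profile, file = path, sep = "\t", quote = FALSE, row.names = FALSE)
}
read_profile_sketch = function(path) {
  read.table(path, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
}

# The profile lives in memory as a data frame
profile = data.frame(chr = c("chr1", "chr2"), pos = c(100L, 300L),
                     genotype = c("A/G", "T/C"), stringsAsFactors = FALSE)

# It is written to disk only when the user asks for it
path = tempfile(fileext = ".txt")
write_profile_sketch(profile, path)
restored = read_profile_sketch(path)
print(all.equal(profile, restored))
```
            </preformat>
            <p> Keeping the two steps separate means that an analysis can run entirely in memory, while a file is still available for users who want to archive or share a profile.</p>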
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Partly</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment4762-46167">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Al-Khalili Szigyarto</surname>
                            <given-names>Cristina</given-names>
                        </name>
                        <aff>KTH Royal Institute of Technology, Sweden</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>18</day>
                    <month>7</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We are thankful for the feedback provided by the reviewer, and have made several changes to the seqCAT package itself, its documentation and the manuscript. The latter&#x2019;s introductory section has been extended with a more thorough exploration of existing tools and how seqCAT differs from them. The description of the various filtering criteria has also been extended, along with a new section covering the same in the package vignette.</p>
                <p> The structure and use of data objects have been streamlined to only utilise data frames at the user level, to further improve ease-of-use and consistency (version 1.7.1). The creation of SNV profiles is now performed within R, with no saving of the final profile to the hard disk; a separate function for this has been added as an option for users that still desire the old functionality (1.7.1). Reading and writing in several other file formats have also been added (1.7.2).</p>
            </body>
        </sub-article>
    </sub-article>
</article>
