<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.18276.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Fast and accurate differential transcript usage by testing equivalence class counts</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 3 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Cmero</surname>
                        <given-names>Marek</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-7783-5530</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes" equal-contrib="yes">
                    <name>
                        <surname>Davidson</surname>
                        <given-names>Nadia M.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-8461-7467</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes" equal-contrib="yes">
                    <name>
                        <surname>Oshlack</surname>
                        <given-names>Alicia</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-9788-5690</uri>
                    <xref ref-type="corresp" rid="c2">b</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Murdoch Childrens Research Institute, Parkville, Victoria, 3052, Australia</aff>
                <aff id="a2">
                    <label>2</label>School of BioScience, University of Melbourne, Parkville, Victoria, Australia</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:nadia.davidson@mcri.edu.au">nadia.davidson@mcri.edu.au</email>
                </corresp>
                <corresp id="c2">
                    <label>b</label>
                    <email xlink:href="mailto:alicia.oshlack@mcri.edu.au">alicia.oshlack@mcri.edu.au</email>
                </corresp>
                <fn id="fn1">
                    <p>
                        <sup>*</sup>These authors contributed equally in supervision of this work</p>
                </fn>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>7</day>
                <month>3</month>
                <year>2019</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2019</year>
            </pub-date>
            <volume>8</volume>
            <elocation-id>265</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>20</day>
                    <month>2</month>
                    <year>2019</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Cmero M et al.</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/8-265/pdf"/>
            <abstract>
                <p>
                    <bold>Background:</bold> RNA sequencing has enabled high-throughput and fine-grained quantitative analyses of the transcriptome. While differential gene expression is the most widely used application of this technology, RNA-seq data also has the resolution to infer differential transcript usage (DTU), which can elucidate the role of different transcript isoforms between experimental conditions, cell types or tissues. DTU has typically been inferred from exon-count data, which has issues with assigning reads unambiguously to counting bins, and requires alignment of reads to the genome. Recently, approaches have emerged that use transcript quantification estimates directly for DTU. Transcript counts can be inferred from 'pseudo' or lightweight aligners, which are significantly faster than traditional genome alignment. However, recent evaluations show that DTU analysis based on transcript quantifications has lower sensitivity. Transcript abundances are estimated from equivalence classes (ECs), which determine the transcripts that any given read is compatible with. Recent work has proposed performing differential expression testing directly on equivalence class read counts (ECs).</p>
                <p>
                    <bold>Methods:</bold> Here we demonstrate that ECs can be used effectively with existing count-based methods for detecting DTU. We evaluate this approach on simulated human and drosophila data, as well as on a real dataset through subset testing.</p>
                <p>
                    <bold>Results:</bold> We find that EC counts have sensitivity and false discovery rates similar to those of exon-level counts but can be generated in a fraction of the time through the use of pseudo-aligners.</p>
                <p>
                    <bold>Conclusions:</bold> We posit that equivalence class read counts are a natural unit on which to perform many types of analysis.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>RNA-seq</kwd>
                <kwd>differential transcript usage</kwd>
                <kwd>equivalence class</kwd>
                <kwd>transcript compatibility class</kwd>
                <kwd>pseudo-alignment</kwd>
                <kwd>DEXSeq</kwd>
                <kwd>Salmon</kwd>
                <kwd>Kallisto</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1" xlink:href="http://dx.doi.org/10.13039/501100000925">
                    <funding-source>National Health and Medical Research Council</funding-source>
                    <award-id>APP1140626</award-id>
                </award-group>
                <funding-statement>This work was supported by NHMRC project grant number APP1140626 to AO and ND.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>RNA sequencing with short-read sequencing technologies (RNA-seq) has been used for over a decade for exploring the transcriptome. While differential gene expression is one of the most widely used applications of this data, significantly higher resolution can be achieved by using the data to explore the multiple transcripts expressed from each gene locus. In particular, it has been shown that each gene can have multiple isoforms, sometimes with distinct functions, and the dominant transcript can be different across samples
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>. Therefore, one important analysis task is to look for differential transcript usage (DTU) between samples.</p>
            <p>DTU can be inferred through differential exon usage, where the proportions of RNA-seq fragments aligning to each exon change relative to each other between biological groups. Anders 
                <italic toggle="yes">et al.</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-2">2</xref>
                </sup> showed that exon counts could be used to test for differential exon usage with a generalized linear model that accounts for biological variability. However, counting fragments across exons is not ideal because many fragments will align across multiple exons, making their assignment to an individual exon ambiguous. Moreover, individual exons often need to be partitioned into multiple disjoint counting bins when exon lengths differ between transcripts. Typically, there will be more counting bins than transcripts, resulting in lower power to detect differences between samples.</p>
            <p>An alternative to using exon counts for testing DTU is to perform tests directly on estimated transcript abundances
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>. Recently, fast and accurate methods for quantifying gene expression at the transcript level have been developed
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>,
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup>. These methods use transcript annotations that include multiple known transcript sequences for each gene as a reference for the alignment. The expression levels of individual transcripts can be estimated from pseudo-aligned reads that are compatible with transcripts associated with a specific gene
                <sup>
                    <xref ref-type="bibr" rid="ref-6">6</xref>
                </sup>. Transcript abundance estimates can be used as an alternative starting measure for DTU testing, which has been shown to perform comparably with state-of-the-art methods
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>. In addition, pseudo-alignment is significantly faster than methods that map to a genome. However, in the most comprehensive comparison using simulated data, exon-count based methods were shown to have slightly better performance compared with methods that first estimate transcript abundances
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>.</p>
            <p>Conceptually, quantification by lightweight or &#x2018;pseudo&#x2019; alignment begins by using a transcript annotation as a reference and then assigns each read as &#x2018;compatible&#x2019; with one or more transcripts that are a close alignment to the read
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>. Because different transcripts of the same gene share large amounts of sequence, many reads are compatible with several transcripts. Reads are therefore assigned to an equivalence class, or transcript compatibility class, which reflects the combination of transcripts compatible with the read sequence (
                <xref ref-type="fig" rid="f1">Figure 1</xref>). For the purposes of this work, we consider an equivalence class to be defined as in Bray 
                <italic toggle="yes">et al.</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>, i.e. any fragments that are pseudo-aligned to the same set of transcripts are considered to be part of the same equivalence class. 
                <xref ref-type="fig" rid="f1">Figure 1</xref> shows a toy example of a gene with three different transcripts. Depending on its sequence, a read can align to all three transcripts, only two of the transcripts or just one transcript. These different combinations result in four possible equivalence classes, containing read counts, for this gene.</p>
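            <p>The grouping of reads into equivalence classes described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation used by Salmon or Kallisto: the function name, the read identifiers and the toy compatibility table are hypothetical, and the four classes it produces are not necessarily those drawn in Figure 1.</p>
            <code language="python"><![CDATA[
```python
from collections import Counter


def equivalence_class_counts(read_compat):
    """Count reads per equivalence class.

    read_compat maps a read ID to the transcripts the read is compatible
    with. Reads sharing the same *set* of transcripts fall into one class.
    """
    counts = Counter()
    for transcripts in read_compat.values():
        if transcripts:  # skip reads compatible with no transcript
            counts[frozenset(transcripts)] += 1
    return counts


# Hypothetical toy reads for a gene with three transcripts (t1, t2, t3).
reads = {
    "r1": ["t1", "t2", "t3"],
    "r2": ["t1", "t2", "t3"],
    "r3": ["t1", "t2"],
    "r4": ["t3"],
    "r5": ["t2", "t1"],  # same class as r3: transcript order is irrelevant
    "r6": ["t1"],
}
ec_counts = equivalence_class_counts(reads)
```
]]></code>
            <p>Using an unordered set as the key means that a class is identified only by which transcripts are compatible, which matches the definition adopted in the text.</p>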
            <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                <label>Figure 1. </label>
                <caption>
                    <title>The use of equivalence classes for detecting differential transcript usage (DTU) in a hypothetical gene.</title>
                    <p>The example shows a gene consisting of six exons (Ex1-6) and three transcripts (t
                        <sub>1-3</sub>) resulting in four equivalence classes (EC1-4). t
                        <sub>1</sub> is predominantly expressed in condition 1 (S1), whereas t
                        <sub>3</sub> is predominantly expressed in condition 2 (S2). The DTU is evident as a change in the relative counts for EC2, EC3 and EC4 between conditions. The pipelines for the three alternative methods for detecting DTU are shown: quantification of transcript expression followed by DTU testing, assignment of read counts to equivalence classes followed by testing of equivalence class counts (DECU) and assignment of read counts to exons followed by differential exon counts (DEU). Genes that are detected to have DECU or DEU are inferred to have DTU. The transcript quantification table in the left-most column is example data only, and is not based on real inference.</p>
                </caption>
                <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/19992/0d36a8cc-9198-4953-bd91-6f2ed5eef87c_figure1.gif"/>
            </fig>
            <p>Recently, equivalence classes have been used for clustering single cells
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>,
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup>, and Yi and colleagues have introduced direct differential testing on equivalence classes in a catch-all method to identify genes that display any transcript-level phenomena such as cancellation (isoform switching), domination (high abundance isoform(s) that mask transcript-level differences) and collapsing (multiple transcripts exhibiting small changes in the same direction)
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup>. Here we focus on the case of isoform switching using methods originally designed for testing exon read counts. We evaluate the appropriateness of equivalence class read counts as an alternative choice for quantification compared to exon- and transcript-level quantification. We propose that DTU can be more accurately detected using equivalence class counts directly, rather than using these counts to first estimate individual transcript abundances before performing DTU. Soneson 
                <italic toggle="yes">et al.</italic> applied a conceptually similar method with MISO
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup> by defining counting bins as combinations of isoforms and counting according to isoform compatibility
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>. In our scenario, count-based DTU testing procedures such as DEXSeq are applied directly to equivalence classes generated from fast lightweight aligners, such as Salmon and Kallisto. DTU testing on equivalence class counts is not only fast but also bypasses inherent uncertainty in directly estimating transcript abundances before statistical testing.</p>
            <p>We evaluate the performance of DTU testing on equivalence class read counts using real and simulated data, and show that the approach yields higher sensitivity and lower false discovery rates than estimating counts from transcript abundances, and runs faster than counting across exons with similar or better accuracy.</p>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <p>Here we propose an alternative pipeline for performing DTU and evaluate its performance using simulated and real datasets
                <sup>
                    <xref ref-type="bibr" rid="ref-23">23</xref>
                </sup>. The method we propose is to first perform alignment with a lightweight aligner and extract equivalence class (EC, or transcript compatibility class) counts. These EC counts are assigned to genes using the annotation of the transcripts matching the EC. Next, each gene is tested for DTU between conditions using a count-based statistical testing method in which exon counts are replaced with EC counts (
                <xref ref-type="fig" rid="f1">Figure 1</xref>). Significant genes can then be interpreted as having a difference in the relative abundance of their transcripts between conditions. In evaluating the EC approach, we used Salmon for pseudo-alignment and DEXSeq for differential testing. We then compared DTU results against the alternative quantification and counting approaches, also using DEXSeq for testing (see Methods). It should be noted that we are not attempting to evaluate the statistical testing method (DEXSeq) in relation to other methods, as this has been done previously in several papers
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>,
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup>.</p>
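            <p>The step of assigning EC counts to genes can be illustrated with a minimal Python sketch. The function name, the transcript-to-gene table and the counts are hypothetical, and the choice to discard ECs whose transcripts belong to more than one gene is an assumption made for this example rather than a rule stated in this section.</p>
            <code language="python"><![CDATA[
```python
from collections import defaultdict


def assign_ecs_to_genes(ec_counts, tx2gene):
    """Group equivalence class counts by gene.

    ec_counts maps a frozenset of transcript IDs to a read count.
    tx2gene maps each transcript ID to its gene ID (from the annotation).
    ASSUMPTION: an EC whose transcripts span several genes is dropped.
    """
    per_gene = defaultdict(dict)
    for ec, count in ec_counts.items():
        genes = {tx2gene[t] for t in ec}
        if len(genes) == 1:
            per_gene[genes.pop()][ec] = count
    return dict(per_gene)


# Hypothetical annotation and counts for one sample.
tx2gene = {"t1": "geneA", "t2": "geneA", "t3": "geneA", "u1": "geneB"}
ec_counts = {
    frozenset({"t1", "t2"}): 12,
    frozenset({"t3"}): 7,
    frozenset({"u1"}): 5,
    frozenset({"t1", "u1"}): 3,  # spans two genes, dropped in this sketch
}
per_gene = assign_ecs_to_genes(ec_counts, tx2gene)
```
]]></code>
            <p>The resulting per-gene tables play the role that per-gene exon bins play in exon-based testing: each EC becomes one counting bin of its gene.</p>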
            <p>The datasets we used to evaluate performance were simulated data from human and drosophila from Soneson 
                <italic toggle="yes">et al.</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup> and biological data from Bottomly 
                <italic toggle="yes">et al.</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup>. Each of the Soneson datasets consisted of two sample groups, each with three replicates, where 1000 genes were randomly selected to have DTU such that the expression levels of the two most abundant transcripts were switched. The Bottomly dataset contains 10 and 11 replicates each from two mouse strains that were used to call truth and then were subsampled to three replicates in the testing scenarios.</p>
            <sec>
                <title>Fewer equivalence classes are expressed than exons</title>
                <p>The number of counting bins used for DTU detection has an impact on sensitivity. More bins lead to lower average counts per bin, and therefore to lower statistical power per bin and a heavier multiple-testing correction. We therefore examined the number of ECs, transcripts and exons present in each dataset. Although the theoretical number of ECs from a set of transcripts can be calculated from the annotation and has the potential to be large, not all combinations of transcripts exist or are expressed. The number of equivalence classes calculated from pseudo-alignment depends on the experimental data as only ECs with reads assigned to them are reported. We compared the number of transcripts and exons in the three datasets (with at least one read) to the number of ECs. In both the simulated human and drosophila datasets, as well as in the Bottomly mouse data, the number of ECs is greater than the number of transcripts, but substantially smaller than the number of exons, indicating that there might be more power for testing DTU using ECs, compared to exon counts (
                    <xref ref-type="fig" rid="f2">Figure 2a</xref>).</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>The number of counting bins and variance between replicates.</title>
                        <p>(a) The number of transcripts, equivalence classes and exons per gene, where each feature has at least one associated read. (b) The density of the log
                            <sub>2</sub> of the variance of counts over the mean for each feature (calculated per condition).</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/19992/0d36a8cc-9198-4953-bd91-6f2ed5eef87c_figure2.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Equivalence class replicates have low variance</title>
                <p>In addition, we found that the variability of counts across replicates calculated from ECs was lower than that from estimated transcript abundances across all three datasets (
                    <xref ref-type="fig" rid="f2">Figure 2b</xref>). Count variability of ECs was on average closer to that of exon counts than to that of transcripts. For instance, the Bottomly data had an average log
                    <sub>2</sub> variance to mean ratio of -2.249 and -1.519 in exons and ECs respectively, compared to 0.115 in transcripts. The simulated data followed a similar pattern. Supplementary Figure 1
                    <sup>
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup> shows the dispersion-mean trends, again demonstrating lower dispersion in ECs compared to transcript abundance estimates. We hypothesise that the greater dispersion observed for transcript data arises from the abundance estimation step used by pseudo-aligners to infer transcript counts. Due to the lower dispersion, we anticipate that ECs yield greater power for DTU compared to transcript abundance estimates.</p>
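                <p>For readers who wish to reproduce the kind of summary shown in Figure 2b, the log<sub>2</sub> variance-to-mean ratio of one feature within one condition can be computed as in the sketch below. The function name and the replicate counts are hypothetical, and the use of the sample (n - 1) variance is an assumption, since the estimator is not specified in this section.</p>
                <code language="python"><![CDATA[
```python
import math
from statistics import mean, variance


def log2_variance_to_mean(counts):
    """Return log2(variance / mean) of replicate counts for one feature.

    counts holds the counts of a single feature (EC, exon bin or
    transcript) across the replicates of one condition.
    ASSUMPTION: the sample variance (n - 1 denominator) is used.
    Returns None when the ratio is undefined.
    """
    if len(counts) < 2:
        return None
    m = mean(counts)
    v = variance(counts)
    if m <= 0 or v <= 0:
        return None
    return math.log2(v / m)


# Hypothetical counts for one feature across three replicates.
replicates = [10, 12, 14]
ratio = log2_variance_to_mean(replicates)
```
]]></code>
                <p>Negative values indicate a variance smaller than the mean, so lower values correspond to less variability between replicates relative to the count level.</p>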
            </sec>
            <sec>
                <title>Performance of equivalence classes for DTU detection</title>
                <p>Several methods were previously tested on the simulated data from Soneson 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-3">3</xref>
                    </sup>; DEXSeq&#x2019;s default counting pipeline and featureCounts were shown to perform best. We recalculated exon counts using DEXSeq&#x2019;s counting pipeline (as recommended by Soneson 
                    <italic toggle="yes">et al.</italic>, we excluded regions of genes that overlapped on the same strand in the input annotation) and ran Salmon
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup> to obtain both transcript abundance estimates and equivalence class counts. All other comparison results were obtained from Soneson 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-3">3</xref>
                    </sup>. For the simulated datasets, we found that ECs had the highest sensitivity in both the drosophila and human datasets (
                    <xref ref-type="fig" rid="f3">Figure 3a</xref>) with a TPR of 0.697 and 0.739 respectively (FDR &lt; 0.05). However, ECs also had a slightly higher FDR than exon-counting methods.</p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>The performance of the equivalence class method for differential transcript usage.</title>
                        <p>(a) The equivalence class method compared to other state-of-the-art methods on simulated data described in Soneson 
                            <italic toggle="yes">et al.</italic>
                            <sup>
                                <xref ref-type="bibr" rid="ref-3">3</xref>
                            </sup>. (b) The ability of the equivalence class, transcript and exon-based methods to recreate the results of a full comparison (10 vs. 11) of the Bottomly data, using only a (randomly selected) subset of samples (3 vs. 3) across 20 iterations. The union of all genes called as significant across all three methods is used to calculate the FDR, and the intersection (genes called by all three methods) is used for the TPR. Full results (union, intersection and each method&#x2019;s individual truth set) are shown in Supplementary Figure 3.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/19992/0d36a8cc-9198-4953-bd91-6f2ed5eef87c_figure3.gif"/>
                </fig>
                <p>We next tested the performance of the EC method on a biological dataset from Bottomly 
                    <italic toggle="yes">et al.</italic> We tested the complete RNA-seq dataset (10 vs. 11) for DTU using DEXSeq on counts generated from transcript abundance estimates, exons and ECs. To calculate the FDR, we considered the set of 'true' DTU genes to be the union of all genes called significant (FDR &lt; 0.05) across the three methods. To calculate the TPR, the intersection of genes called by all methods was used. Supplementary Figure 2
                    <sup>
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup> shows the number of significant genes and overlap between all three methods. ECs called the highest number of genes with significant DTU (1485 genes, in contrast to the 748 and 391 genes called significant by the transcript and exon-based methods respectively). Similar to the FDR experiments described in Pimentel 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-12">12</xref>
                    </sup>, we randomly selected three samples per condition and performed DTU using all three methods and repeated this for 20 iterations. 
                    <xref ref-type="fig" rid="f3">Figure 3b</xref> shows the results. EC-based testing performed the best, with a mean FDR of 0.305 across all iterations (compared to a mean FDR of 0.569 and 0.373 for the transcript and exon-based methods respectively). The mean TPR was also slightly higher for ECs at 0.544, compared with 0.539 for exons and 0.36 for the transcript-based method. Results for all three combinations of the &#x2018;truth gene&#x2019; sets (union, intersection and individual) are shown in Supplementary Figure 3
                    <sup>
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup>. The EC-based method had consistently lower FDR, which is also illustrated by the rank-order plot (Supplementary Figure 4
                    <sup>
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup>), showing the number of false positives present in the top 500 FDR-ranked genes. In terms of the TPR, ECs performed better than transcripts, but worse than exons when using the union of all methods as the truth set. In the Bottomly analysis, Salmon was used as a representative method for transcript abundance estimation. We also performed the analysis with Kallisto, which gave results consistent with Salmon (Supplementary Figure 5
                    <sup>
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup>).</p>
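                <p>The way the truth sets enter the FDR and TPR in the subset experiment can be made concrete with a small Python sketch. The gene identifiers, the calls and the function name are hypothetical and serve only to show the set arithmetic; they are not taken from the Bottomly analysis.</p>
                <code language="python"><![CDATA[
```python
def fdr_and_tpr(called, truth_for_fdr, truth_for_tpr):
    """Compute FDR and TPR for one method in one subsampling iteration.

    called: genes called significant on the subsampled data.
    truth_for_fdr: genes treated as true when counting false positives
        (in the text, the union of the full-data calls of all methods).
    truth_for_tpr: genes treated as true when counting recovered genes
        (in the text, the intersection of the full-data calls).
    """
    called = set(called)
    false_positives = called - set(truth_for_fdr)
    fdr = len(false_positives) / len(called) if called else 0.0
    truth = set(truth_for_tpr)
    tpr = len(called & truth) / len(truth) if truth else float("nan")
    return fdr, tpr


# Hypothetical full-data calls from the three methods.
full = {
    "ec": {"g1", "g2", "g3", "g4"},
    "tx": {"g1", "g2", "g5"},
    "exon": {"g1", "g2", "g3"},
}
union_truth = set.union(*full.values())
intersect_truth = set.intersection(*full.values())

# Hypothetical calls from one 3 vs. 3 subsample of a single method.
subset_calls = {"g1", "g3", "g9", "g10"}
fdr, tpr = fdr_and_tpr(subset_calls, union_truth, intersect_truth)
```
]]></code>
                <p>Using the union for the FDR is lenient, because a call counts as false only if no method reported it on the full data, whereas using the intersection for the TPR restricts the targets to genes that every method agreed on.</p>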
            </sec>
            <sec>
                <title>Computational performance</title>
                <p>While the performance of EC counts in terms of sensitivity and FDR is only slightly better than that of exon-level counts, another advantage of using ECs for analysis is the speed of alignment. The process can be broken down into workflow components that include alignment of sequenced reads, quantification and testing. 
                    <xref ref-type="table" rid="T1">Table 1</xref> shows the compute times for all three methods on all three datasets broken down into workflow components. For the exon counting method, STAR was used for the alignment of reads to the genome (see Methods). In every case, the transcript quantification method had the fastest total run time followed by ECs and then exons. The difference was mainly driven by the speed of using pseudo-alignment for transcript and EC quantification, indicating that for larger datasets analysis will be significantly faster with our proposed EC-based method than with traditional exon counting methods. A small amount of extra time was also needed for the EC method for matching EC counts to genes. In addition, DEXSeq generally runs more slowly with larger numbers of counting bins, as is the case for ECs compared with transcripts; improved scalability of DTU approaches is likely to narrow this performance gap. Using featureCounts instead of DEXSeq&#x2019;s counting pipeline significantly improved run times for the exon-based method; however, the total run times still lagged behind those of the pseudo-alignment methods. We also note that the transcript-abundance inference stage performed by pseudo-aligners is not necessary for EC-based DTU testing, making Salmon slightly faster to run when quantification is skipped (
                    <xref ref-type="table" rid="T1">Table 1</xref>).</p>
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>Comparison of compute times.</title>
                        <p>Compute times are shown in hh:mm:ss for the simulated (101 bp paired-end) and Bottomly (76 bp single-end) read data. Each sample was aligned and quantified in serial with access to 256 GB RAM and 8 cores per sample, and post-quantification steps were performed on count data from all samples in each batch in a single run with 256 GB RAM and 8 cores. The alignment and quantification steps show the total time taken for all samples (i.e. the serial runtime). The drosophila and human samples contained approximately 25M and 40M reads respectively, and the Bottomly samples contained approximately 16M reads. Exon counts were quantified using DEXSeq-count (ds) and featureCounts (fc).</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1">Data</th>
                                <th align="center" colspan="4" rowspan="1">Compute times, hh:mm:ss</th>
                            </tr>
                            <tr>
                                <th align="left" colspan="1" rowspan="1">drosophila</th>
                                <th align="left" colspan="1" rowspan="1">Transcripts</th>
                                <th align="left" colspan="1" rowspan="1">ECs</th>
                                <th align="left" colspan="1" rowspan="1">Exons (ds)</th>
                                <th align="left" colspan="1" rowspan="1">Exons (fc)</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Alignment</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">03:10:34</td>
                                <td align="right" colspan="1" rowspan="1">03:10:34</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Quantification</td>
                                <td align="right" colspan="1" rowspan="1">00:09:47</td>
                                <td align="right" colspan="1" rowspan="1">00:09:09</td>
                                <td align="right" colspan="1" rowspan="1">02:48:45</td>
                                <td align="right" colspan="1" rowspan="1">00:00:53</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Match ECs</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">00:00:18</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">DEXSeq DTU</td>
                                <td align="right" colspan="1" rowspan="1">00:01:17</td>
                                <td align="right" colspan="1" rowspan="1">00:03:28</td>
                                <td align="right" colspan="1" rowspan="1">00:03:16</td>
                                <td align="right" colspan="1" rowspan="1">00:02:47</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">
                                    <bold>Total</bold>
                                </td>
                                <td align="right" colspan="1" rowspan="1">
                                    <bold>00:11:04</bold>
                                </td>
                                <td align="right" colspan="1" rowspan="1">
                                    <bold>00:12:55</bold>
                                </td>
                                <td align="right" colspan="1" rowspan="1">
                                    <bold>06:02:35</bold>
                                </td>
                                <td align="right" colspan="1" rowspan="1">
                                    <bold>03:14:14</bold>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">
                                    <bold>hsapiens</bold>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Alignment</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">01:16:33</td>
                                <td align="right" colspan="1" rowspan="1">01:16:33</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Quantification</td>
                                <td align="right" colspan="1" rowspan="1">00:15:59</td>
                                <td align="right" colspan="1" rowspan="1">00:13:06</td>
                                <td align="right" colspan="1" rowspan="1">04:50:37</td>
                                <td align="right" colspan="1" rowspan="1">00:01:42</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Match ECs</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">00:01:14</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">DEXSeq DTU</td>
                                <td align="right" colspan="1" rowspan="1">00:04:54</td>
                                <td align="right" colspan="1" rowspan="1">00:27:07</td>
                                <td align="right" colspan="1" rowspan="1">00:15:53</td>
                                <td align="right" colspan="1" rowspan="1">00:30:08</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">
                                    <bold>Total</bold>
                                </td>
                                <td align="right" colspan="1" rowspan="1">
                                    <bold>00:20:53</bold>
                                </td>
                                <td align="right" colspan="1" rowspan="1">
                                    <bold>00:41:27</bold>
                                </td>
                                <td align="right" colspan="1" rowspan="1">
                                    <bold>06:23:03</bold>
                                </td>
                                <td align="right" colspan="1" rowspan="1">
                                    <bold>01:48:23</bold>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">
                                    <bold>mouse (Bottomly)</bold>
                                </td>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                                <td colspan="1" rowspan="1"/>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Alignment</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">00:43:12</td>
                                <td align="right" colspan="1" rowspan="1">00:43:12</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Quantification</td>
                                <td align="right" colspan="1" rowspan="1">00:16:32</td>
                                <td align="right" colspan="1" rowspan="1">00:12:25</td>
                                <td align="right" colspan="1" rowspan="1">02:53:01</td>
                                <td align="right" colspan="1" rowspan="1">00:01:29</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">Match ECs</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">00:00:51</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">-</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">DEXSeq DTU</td>
                                <td align="right" colspan="1" rowspan="1">00:08:49</td>
                                <td align="right" colspan="1" rowspan="1">00:25:08</td>
                                <td align="right" colspan="1" rowspan="1">00:34:53</td>
                                <td align="right" colspan="1" rowspan="1">00:44:59</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">
                                    <bold>Total</bold>
                                </td>
                                <td align="right" colspan="1" rowspan="1">
                                    <bold>00:25:21</bold>
                                </td>
                                <td align="right" colspan="1" rowspan="1">
                                    <bold>00:38:24</bold>
                                </td>
                                <td align="right" colspan="1" rowspan="1">
                                    <bold>04:11:06</bold>
                                </td>
                                <td align="right" colspan="1" rowspan="1">
                                    <bold>01:29:40</bold>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
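                <p>Each total in 
                    <xref ref-type="table" rid="T1">Table 1</xref> is the sum of the workflow components listed above it. As a minimal sketch of that arithmetic, the following Python snippet sums the hh:mm:ss components for the drosophila dataset; the helper names (parse_hms, format_hms, total_time) are illustrative and are not part of the published pipeline.</p>
                <code language="python">
```python
from datetime import timedelta


def parse_hms(text):
    """Convert an hh:mm:ss string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta):
    """Format a timedelta back into hh:mm:ss."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def total_time(components):
    """Sum a list of hh:mm:ss strings and return the result as hh:mm:ss."""
    return format_hms(sum((parse_hms(c) for c in components), timedelta()))


# Workflow components for the drosophila dataset, copied from Table 1.
drosophila = {
    "Transcripts": ["00:09:47", "00:01:17"],
    "ECs": ["00:09:09", "00:00:18", "00:03:28"],
    "Exons (ds)": ["03:10:34", "02:48:45", "00:03:16"],
    "Exons (fc)": ["03:10:34", "00:00:53", "00:02:47"],
}
totals = {method: total_time(parts) for method, parts in drosophila.items()}
```
                </code>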
                <p>We also considered peak RAM usage (shown in Supplementary Table 1
                    <sup>
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup>), and alignment was found to use the most RAM. Overall, methods utilising pseudo-alignment required significantly less memory than traditional alignment. For the most RAM-intensive dataset, the human simulation, exon counting required 29 GB compared to 10 GB for ECs and 5 GB for estimated transcript abundances.</p>
            </sec>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <p>DTU detection has previously been approached by testing either for changes in read counts across exons or for changes in the relative abundance of transcripts. These approaches are intuitive but are not necessarily optimal for short-read data analysis. In particular, individual exons are not necessarily the optimal unit of isoform quantification, as there are often many more exons than transcripts. In addition, transcript quantification can be difficult because read assignment is ambiguous. Fortunately, transcript quantification methods generate equivalence class counts as a preliminary step to estimating abundances. We propose that equivalence classes are the optimal unit for performing count-based differential testing. Equivalence class counts combine the advantages of exon and transcript counts: they can be generated quickly through pseudo-alignment, fewer equivalence classes than exons are expressed, and they retain the low variance between replicates seen in exon counts relative to transcript abundances.</p>
            <p>Here we evaluated the use of equivalence classes as the counting unit for differential transcript usage. We used two simulated datasets, from drosophila and human, and one biological dataset from mouse. Our results suggest that equivalence class counts provide equal or better accuracy in DTU detection compared to exon counts or estimated transcript abundances. We also found that the analysis was quick to run, and we provide code to convert pseudo-alignments into gene-level EC annotations.</p>
            <p>The ECs used in our evaluation are defined using only the set of transcripts with which reads are compatible. Extensions to this model have been proposed that incorporate read-level information, such as fragment length, to more accurately calculate the probability of a read arising from a given transcript
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>
                </sup>. Although we do not consider probability-based equivalence classes in this work, incorporating this information for DTU deserves exploration in future work. In addition, EC counts may be calculated from full read alignment rather than pseudo-alignment
                <sup>
                    <xref ref-type="bibr" rid="ref-14">14</xref>,
                    <xref ref-type="bibr" rid="ref-15">15</xref>
                </sup>, which has the potential to improve accuracy further. In this work, we limited our investigation to comparing the best counting metric preceding DTU statistical testing, using DEXSeq as a representative method. Evaluation of statistical testing methods for DTU is outside the scope of this manuscript and would require further work.</p>
            <p>One limitation of using equivalence classes is in the interpretation of the results. Although we can detect DTU at the gene level, it is not simple to determine which isoforms have changed abundance without further work. We propose that superTranscripts
                <sup>
                    <xref ref-type="bibr" rid="ref-16">16</xref>
                </sup>, which are a method for visualising the transcriptome, could be used for interpretation. Alternatively, transcript abundances, which are generated together with ECs, can still be used to provide insight into the isoform switching.</p>
            <p>Finally, in this work, we have focused on differential transcript usage, but EC counts have the potential to be useful in a range of other expression analyses. EC counts have already been applied to areas such as clustering and dimensionality reduction
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup>, gene-level differential expression
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup>, single-cell transcriptomics
                <sup>
                    <xref ref-type="bibr" rid="ref-7">7</xref>,
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup> and fusion detection
                <sup>
                    <xref ref-type="bibr" rid="ref-20">20</xref>
                </sup>. We foresee that equivalence classes could serve as a base unit of measurement in many other types of analyses.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <p>We detected sequence content bias in the Bottomly RNA-seq data using 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/s-andrews/FastQC">FastQC</ext-link> v0.11.4, and therefore performed trimming using 
                <ext-link ext-link-type="uri" xlink:href="http://www.usadellab.org/cms/?page=trimmomatic">Trimmomatic</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup> 0.35, using recommended parameters. The simulated Soneson data was not trimmed.</p>
            <p>To obtain transcript abundance counts, 
                <ext-link ext-link-type="uri" xlink:href="https://combine-lab.github.io/salmon/">Salmon</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup> v0.13.0 (development version) was run on the drosophila, human and Bottomly datasets in quant mode using default parameters. To obtain EC counts, the 
                <italic toggle="yes">--dumpEq</italic> argument was used, as well as the 
                <italic toggle="yes">--skipQuant</italic> argument to skip the quantification step. 
                <ext-link ext-link-type="uri" xlink:href="https://pachterlab.github.io/kallisto/">Kallisto</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup> 0.43.0 was run in 
                <italic toggle="yes">pseudo</italic> mode with the 
                <italic toggle="yes">--batch</italic> argument to run all samples simultaneously. Fragment length and standard deviation were estimated from all reads of a single sample from the Bottomly data (
                <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/?term=SRR099223">SRR099223</ext-link>). Equivalence classes were then matched between samples and compiled into a matrix using the python scripts (create_salmon_ec_count_matrix.py and create_kallisto_ec_count_matrix.py), available on 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/Oshlack/ec-dtu-paper">GitHub</ext-link> and archived on 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.2561549">Zenodo</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-22">22</xref>
                </sup>. Equivalence classes mapping to more than a single gene were removed. No other filtering was performed on any of the data types.</p>
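            <p>The matching and filtering steps can be sketched as follows. This is a minimal illustration and not the published create_salmon_ec_count_matrix.py script: it assumes that each sample's equivalence classes have already been parsed into a mapping from a set of transcript identifiers to a read count, that classes are matched between samples by their transcript set, and that a transcript-to-gene lookup is available. The function name build_ec_matrix and all identifiers in the toy data are hypothetical.</p>
            <code language="python">
```python
from collections import defaultdict


def build_ec_matrix(sample_ecs, tx2gene):
    """Match ECs between samples and drop ECs that map to more than one gene.

    sample_ecs: dict of sample name to dict of frozenset(transcript ids) to count.
    tx2gene: dict of transcript id to gene id.
    Returns (sample names, dict of (gene, sorted transcript tuple) to count row).
    """
    samples = sorted(sample_ecs)
    counts = defaultdict(lambda: [0] * len(samples))
    # An EC is identified by its transcript set, so identical sets share a row.
    for column, sample in enumerate(samples):
        for transcripts, count in sample_ecs[sample].items():
            counts[frozenset(transcripts)][column] += count
    matrix = {}
    for ec, row in counts.items():
        genes = {tx2gene[transcript] for transcript in ec}
        # Keep only ECs whose transcripts all belong to a single gene.
        if len(genes) == 1:
            matrix[(next(iter(genes)), tuple(sorted(ec)))] = row
    return samples, matrix


# Toy input with made-up identifiers (illustration only).
sample_ecs = {
    "s1": {frozenset({"t1", "t2"}): 10, frozenset({"t1"}): 5, frozenset({"t2", "t3"}): 7},
    "s2": {frozenset({"t1", "t2"}): 4, frozenset({"t3"}): 2},
}
tx2gene = {"t1": "gA", "t2": "gA", "t3": "gB"}
samples, matrix = build_ec_matrix(sample_ecs, tx2gene)
```
            </code>
            <p>In this toy example, the class shared by t2 and t3 spans two genes and is removed, while classes absent from a sample receive a count of zero.</p>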
            <p>To perform the exon-based counts, raw reads were first aligned using 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/alexdobin/STAR">STAR</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-15">15</xref>
                </sup> v2.5.2a, then the DEXSeq-count annotation was prepared, excluding overlapping exon parts from different genes on the same strand (--aggregate='no'). DEXSeq-count was then run using default parameters. The same genome and transcriptome references for drosophila and human were used as in Soneson 
                <italic toggle="yes">et al.</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>
                </sup>, with only protein-coding transcripts considered for the Salmon index. For the Bottomly data, we used the NCBIM37 mm9 mouse genome and Ensembl release 67 transcriptome. Non-protein-coding transcripts were filtered out, as with the Soneson transcriptome reference. 
                <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/release/bioc/html/DEXSeq.html">DEXSeq</ext-link> v1.26 was used to run all DTU analyses.</p>
            <p>An earlier version of this article can be found on bioRxiv (DOI: 
                <ext-link ext-link-type="uri" xlink:href="https://dx.doi.org/10.1101/501106">https://doi.org/10.1101/501106</ext-link>).</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <sec>
                <title>Underlying data</title>
                <p>The Soneson 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-3">3</xref>
                    </sup> drosophila and human simulation data was obtained from the ArrayExpress repository, accession number 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-3766/">E-MTAB-3766</ext-link>.</p>
                <p>Truth data was obtained from 
                    <ext-link ext-link-type="uri" xlink:href="http://imlspenticton.uzh.ch/robinson_lab/splicing_comparison/">http://imlspenticton.uzh.ch/robinson_lab/splicing_comparison/</ext-link>, files 
                    <ext-link ext-link-type="uri" xlink:href="http://imlspenticton.uzh.ch/robinson_lab/splicing_comparison/supplementary_data_ms/diff_splicing_comparison_drosophila.zip">diff_splicing_comparison_drosophila.zip</ext-link> and 
                    <ext-link ext-link-type="uri" xlink:href="http://imlspenticton.uzh.ch/robinson_lab/splicing_comparison/supplementary_data_ms/diff_splicing_comparison_human.zip">diff_splicing_comparison_human.zip</ext-link>.</p>
                <p>The Bottomly 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup> dataset was obtained from the NCBI Sequence Read Archive, accession number 
                    <ext-link ext-link-type="uri" xlink:href="https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP004777">SRP004777</ext-link>.</p>
            </sec>
            <sec>
                <title>Extended data</title>
                <p>Zenodo: Supplementary Material for "Fast and accurate differential transcript usage by testing equivalence class counts". 
                    <ext-link ext-link-type="uri" xlink:href="https://dx.doi.org/10.5281/zenodo.2561546">https://doi.org/10.5281/zenodo.2561546</ext-link>
                    <sup>
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup>. The following extended data are available:</p>
                <list list-type="bullet">
                    <list-item>
                        <p>Supplementary Figure 1: Shows the dispersion versus mean normalised counts for all features across the three data sets, generated using DEXSeq&#x2019;s &#x2018;plotDispEsts&#x2019; function. As described in Love 
                            <italic toggle="yes">et al</italic>., the red line shows the fitted dispersion-mean trend, the blue dots indicate the shrunken dispersion estimates, and the blue circles indicate outliers not shrunk towards the prior.</p>
                    </list-item>
                    <list-item>
                        <p>Supplementary Figure 2: Shows the significant genes (FDR &lt; 0.05) shared between the methods, obtained from DEXSeq run on the full Bottomly 
                            <italic toggle="yes">et al</italic>. data set for each feature.</p>
                    </list-item>
                    <list-item>
                        <p>Supplementary Figure 3: Shows the ability of the three methods to recreate the results of a full comparison (10 vs. 11) of the Bottomly 
                            <italic toggle="yes">et al</italic>. data using random subsets of 3 vs. 3 samples across 20 iterations. The lines between the plots join data points from the same iteration. Each row uses a different &#x2018;truth&#x2019; set: union is the set of genes called significant by any method, intersect is the set of genes called significant by all methods, and individual is the set of genes called significant by that method only.</p>
                    </list-item>
                    <list-item>
                        <p>Supplementary Figure 4: The number of false positives versus each gene&#x2019;s rank (by FDR) for one iteration (3 vs. 3) of the Bottomly subset tests for the top 500 genes. The union of significant genes across all methods was used as the truth set.</p>
                    </list-item>
                    <list-item>
                        <p>Supplementary Figure 5: Kallisto versus Salmon&#x2019;s performance on the Bottomly subset testing experiments, using each method&#x2019;s significant genes from the full (10 vs. 11) run as the truth set for calculating both metrics.</p>
                    </list-item>
                    <list-item>
                        <p>Supplementary Table 1: Maximum RAM usage for each job in GB. Each task was run as specified in the compute times table in the main paper (
                            <xref ref-type="table" rid="T1">Table 1</xref>).</p>
                    </list-item>
                </list>
                <p>Extended data are available under the terms of the 
                    <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International license</ext-link> (CC-BY 4.0).</p>
            </sec>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>
                <bold>Pipeline used to reproduce the quantification data generated in this paper:</bold>
                <ext-link ext-link-type="uri" xlink:href="https://github.com/Oshlack/ec-dtu-pipe">https://github.com/Oshlack/ec-dtu-pipe</ext-link>.</p>
            <p>
                <bold>Archived source code at time of publication:</bold>
                <ext-link ext-link-type="uri" xlink:href="https://dx.doi.org/10.5281/zenodo.2567596">https://doi.org/10.5281/zenodo.2567596</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-23">23</xref>
                </sup>.</p>
            <p>
                <bold>Source code to run the analyses and generate the paper figures:</bold>
                <ext-link ext-link-type="uri" xlink:href="https://github.com/Oshlack/ec-dtu-paper">https://github.com/Oshlack/ec-dtu-paper</ext-link>.</p>
            <p>
                <bold>Archived source code at time of publication:</bold>
                <ext-link ext-link-type="uri" xlink:href="https://dx.doi.org/10.5281/zenodo.2561549">https://doi.org/10.5281/zenodo.2561549</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-22">22</xref>
                </sup>.</p>
            <p>
                <bold>License:</bold> 
                <ext-link ext-link-type="uri" xlink:href="https://opensource.org/licenses/MIT">MIT license</ext-link>.</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgments</title>
            <p>We would like to thank Rob Patro for discussions on using equivalence classes in Salmon, for providing us with a version that bypasses transcript quantification, and for feedback on our manuscript. We would also like to acknowledge members of the Twitter community who provided constructive feedback on the first version of this manuscript.</p>
        </ack>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gonz&#x00e0;lez-Porta</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Frankish</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Rung</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Transcriptome analysis of human tissues and cell lines reveals one dominant transcript per gene.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2013</year>;<volume>14</volume>(<issue>7</issue>):<fpage>R70</fpage>.
                    <pub-id pub-id-type="pmid">23815980</pub-id>
                    <pub-id pub-id-type="doi">10.1186/gb-2013-14-7-r70</pub-id>
                    <pub-id pub-id-type="pmcid">4053754</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Anders</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Reyes</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>
</person-group>:
                    <article-title>Detecting differential usage of exons from RNA-seq data.</article-title>
                    <source>

                        <italic toggle="yes">Genome Res.</italic>
</source>
                    <year>2012</year>;<volume>22</volume>(<issue>10</issue>):<fpage>2008</fpage>&#x2013;<lpage>2017</lpage>.
                    <pub-id pub-id-type="pmid">22722343</pub-id>
                    <pub-id pub-id-type="doi">10.1101/gr.133744.111</pub-id>
                    <pub-id pub-id-type="pmcid">3460195</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Soneson</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Matthes</surname>
                            <given-names>KL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Nowicka</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Isoform prefiltering improves performance of count-based methods for analysis of differential transcript usage.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2016</year>;<volume>17</volume>(<issue>1</issue>):<fpage>12</fpage>.
                    <pub-id pub-id-type="pmid">26813113</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-015-0862-3</pub-id>
                    <pub-id pub-id-type="pmcid">4729156</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bray</surname>
                            <given-names>NL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pimentel</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Melsted</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Near-optimal probabilistic RNA-seq quantification.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2016</year>;<volume>34</volume>(<issue>5</issue>):<fpage>525</fpage>&#x2013;<lpage>527</lpage>.
                    <pub-id pub-id-type="pmid">27043002</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3519</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Patro</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Duggal</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Love</surname>
                            <given-names>MI</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>
                        <italic toggle="yes">Salmon</italic> provides fast and bias-aware quantification of transcript expression.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2017</year>;<volume>14</volume>(<issue>4</issue>):<fpage>417</fpage>&#x2013;<lpage>419</lpage>.
                    <pub-id pub-id-type="pmid">28263959</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.4197</pub-id>
                    <pub-id pub-id-type="pmcid">5600148</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Patro</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mount</surname>
                            <given-names>SM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kingsford</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <article-title>Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2014</year>;<volume>32</volume>(<issue>5</issue>):<fpage>462</fpage>&#x2013;<lpage>464</lpage>.
                    <pub-id pub-id-type="pmid">24752080</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.2862</pub-id>
                    <pub-id pub-id-type="pmcid">4077321</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ntranos</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kamath</surname>
                            <given-names>GM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>JM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Fast and accurate single-cell RNA-seq analysis by clustering of transcript-compatibility counts.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2016</year>;<volume>17</volume>(<issue>1</issue>):<fpage>112</fpage>.
                    <pub-id pub-id-type="pmid">27230763</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-016-0970-8</pub-id>
                    <pub-id pub-id-type="pmcid">4881296</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ntranos</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yi</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Melsted</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A discriminative learning approach to differential expression analysis for single-cell RNA-seq.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2019</year>;<volume>16</volume>(<issue>2</issue>):<fpage>163</fpage>&#x2013;<lpage>166</lpage>.
                    <pub-id pub-id-type="pmid">30664774</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41592-018-0303-9</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yi</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pimentel</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bray</surname>
                            <given-names>NL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Gene-level differential analysis at transcript-level resolution.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2018</year>;<volume>19</volume>(<issue>1</issue>):<fpage>53</fpage>.
                    <pub-id pub-id-type="pmid">29650040</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-018-1419-z</pub-id>
                    <pub-id pub-id-type="pmcid">5896116</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Katz</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>ET</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Airoldi</surname>
                            <given-names>EM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Analysis and design of RNA sequencing experiments for identifying isoform regulation.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2010</year>;<volume>7</volume>(<issue>12</issue>):<fpage>1009</fpage>&#x2013;<lpage>1015</lpage>.
                    <pub-id pub-id-type="pmid">21057496</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.1528</pub-id>
                    <pub-id pub-id-type="pmcid">3037023</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bottomly</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Walter</surname>
                            <given-names>NA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hunter</surname>
                            <given-names>JE</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Evaluating gene expression in C57BL/6J and DBA/2J mouse striatum using RNA-Seq and microarrays.</article-title>
                    <source>

                        <italic toggle="yes">PLoS One.</italic>
</source>
                    <year>2011</year>;<volume>6</volume>(<issue>3</issue>):<fpage>e17820</fpage>.
                    <pub-id pub-id-type="pmid">21455293</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pone.0017820</pub-id>
                    <pub-id pub-id-type="pmcid">3063777</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pimentel</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bray</surname>
                            <given-names>NL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Puente</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Differential analysis of RNA-seq incorporating quantification uncertainty.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2017</year>;<volume>14</volume>(<issue>7</issue>):<fpage>687</fpage>&#x2013;<lpage>690</lpage>.
                    <pub-id pub-id-type="pmid">28581496</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.4324</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zakeri</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Srivastava</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Almodaresi</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Improved data-driven likelihood factorizations for transcript abundance estimation.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2017</year>;<volume>33</volume>(<issue>14</issue>):<fpage>i142</fpage>&#x2013;<lpage>i151</lpage>.
                    <pub-id pub-id-type="pmid">28881996</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btx262</pub-id>
                    <pub-id pub-id-type="pmcid">5870700</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Davidson</surname>
                            <given-names>NM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oshlack</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Corset: enabling differential gene expression analysis for 
                        <italic toggle="yes">de novo</italic> assembled transcriptomes.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2014</year>;<volume>15</volume>(<issue>7</issue>):<fpage>410</fpage>.
                    <pub-id pub-id-type="pmid">25063469</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-014-0410-6</pub-id>
                    <pub-id pub-id-type="pmcid">4165373</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yi</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Melsted</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A direct comparison of genome alignment and transcriptome pseudoalignment.</article-title>
                    <source>

                        <italic toggle="yes">BioRxiv.</italic>
</source>
                    <year>2018</year>.
                    <pub-id pub-id-type="doi">10.1101/444620</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Davidson</surname>
                            <given-names>NM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hawkins</surname>
                            <given-names>ADK</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oshlack</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>SuperTranscripts: a data driven reference for analysis and visualisation of transcriptomes.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2017</year>;<volume>18</volume>(<issue>1</issue>):<fpage>148</fpage>.
                    <pub-id pub-id-type="pmid">28778180</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-017-1284-1</pub-id>
                    <pub-id pub-id-type="pmcid">5543425</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Garber</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Grabherr</surname>
                            <given-names>MG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Guttman</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Computational methods for transcriptome annotation and quantification using RNA-seq.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2011</year>;<volume>8</volume>(<issue>6</issue>):<fpage>469</fpage>&#x2013;<lpage>477</lpage>.
                    <pub-id pub-id-type="pmid">21623353</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.1613</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Dobin</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Davis</surname>
                            <given-names>CA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schlesinger</surname>
                            <given-names>F</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>STAR: ultrafast universal RNA-seq aligner.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2013</year>;<volume>29</volume>(<issue>1</issue>):<fpage>15</fpage>&#x2013;<lpage>21</lpage>.
                    <pub-id pub-id-type="pmid">23104886</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bts635</pub-id>
                    <pub-id pub-id-type="pmcid">3530905</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Melsted</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ntranos</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pachter</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>The Barcode, UMI, Set format and BUStools.</article-title>
                    <source>

                        <italic toggle="yes">BioRxiv.</italic>
</source>
                    <year>2018</year>.
                    <pub-id pub-id-type="doi">10.1101/472571</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vu</surname>
                            <given-names>TN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Deng</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Trac</surname>
                            <given-names>QT</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A fast detection of fusion genes from paired-end RNA-seq data.</article-title>
                    <source>

                        <italic toggle="yes">BMC Genomics.</italic>
</source>
                    <year>2018</year>;<volume>19</volume>(<issue>1</issue>):<fpage>786</fpage>.
                    <pub-id pub-id-type="pmid">30382840</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12864-018-5156-1</pub-id>
                    <pub-id pub-id-type="pmcid">6211471</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cmero</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Davidson</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oshlack</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Supplementary Material for "Fast and accurate differential transcript usage by testing equivalence class counts" (Version v1.0.0).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.2561546">http://www.doi.org/10.5281/zenodo.2561546</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cmero</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Oshlack/ec-dtu-paper: f1000 submission (Version v1.0.0).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.2561550">http://www.doi.org/10.5281/zenodo.2561550</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cmero</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Oshlack/ec-dtu-pipe: f1000 submission (Version v0.1.0).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2019</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.2567597">http://www.doi.org/10.5281/zenodo.2567597</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report45467">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.19992.r45467</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Collado-Torres</surname>
                        <given-names>Leonardo</given-names>
                    </name>
                    <xref ref-type="aff" rid="r45467a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2140-308X</uri>
                </contrib>
                <aff id="r45467a1">
                    <label>1</label>Lieber Institute for Brain Development, Baltimore, MD, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>12</day>
                <month>4</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Collado-Torres L</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport45467" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.18276.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>In this manuscript, the authors Marek Cmero, Nadia M. Davidson and Alicia Oshlack describe in detail their proposed approach for identifying genes with differential transcript usage (DTU, particularly isoform switching) using equivalence classes obtained through pseudo-alignment methods such as Salmon and Kallisto. By doing so, the authors leverage the computational advantages of pseudo-alignment methods, particularly speed and low RAM requirements, together with statistical methods initially developed for differential exon usage (mainly DEXSeq) to identify genes with DTU events at comparable (or even lower) error rates than exon-based analyses, which are more precise than transcript-level analyses. That is, their proposed method is fast, has low computational requirements (measured by RAM usage), and has error rates comparable to, if not better than, state-of-the-art alternatives. Even if time and computational resources are not limiting factors, the method the authors propose still has an advantage over exon-based methods: in the human and mouse transcriptomes, genes can have more exons than transcripts, which leads to power gains for their method. However, as presented, their method also relies on a correct annotation of the transcriptome, since un-annotated isoforms that involve new exons or new exon boundaries could potentially affect the results.</p>
            <p>Nevertheless, I think that it should be possible to apply their method in combination with others in order to minimize this issue. Overall, the authors of this manuscript did an excellent job explaining their new method, comparing it against earlier work, and explaining the different implications of their work. I look forward to their future software for applying this method, as 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/Oshlack/ec-dtu-paper">https://github.com/Oshlack/ec-dtu-paper</ext-link> has all the foundations for making an R/Bioconductor package.</p>
            <p> 
                <bold>Minor points</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Figures 2 and 3 are missing labels for each sub-panel. For example, the legend for Figure 2 talks about (a) and (b) and while one can assume that the top panel is (a), it's best to be explicit about this type of information.</p>
                    </list-item>
                    <list-item>
                        <p>Figure 2 top panel. Maybe show the data points in case the boxplots mask some information about the distribution. See 
                            <ext-link ext-link-type="uri" xlink:href="http://simplystatistics.org/2019/02/21/dynamite-plots-must-die/">here</ext-link>&#x00a0;for some code by Rafael Irizarry or 
                            <ext-link ext-link-type="uri" xlink:href="https://github.com/LieberInstitute/brainseq_phase2/search?q=%22outlier.shape%22&amp;unscoped_q=%22outlier.shape%22">here</ext-link>&#x00a0;for longer code examples that I wrote. If it looks like a bell-shaped distribution, then I think that it could be okay to simply mention that in the text (in the case that the figure has many points and you prefer not to include it). From Figure 3 bottom panel, I can see that you already plotted the points in that case.</p>
                    </list-item>
                    <list-item>
                        <p>Figure 2 bottom panel. 
                            <ext-link ext-link-type="uri" xlink:href="https://simplystatistics.org/2019/02/21/dynamite-plots-must-die/">This</ext-link>&#x00a0;also has some code for showing density plots with little bars in the bottom for the observed points.</p>
                    </list-item>
                    <list-item>
                        <p>Page 5, bottom left. "Count variability of ECs was on average closer to the exon count variability distribution than ECs." is incorrect. I believe that it should read "Count variability of ECs was on average closer to the exon count variability distribution than transcripts".</p>
                    </list-item>
                    <list-item>
                        <p>Figure 3, top panel. I can't distinguish the colors between `featurecounts_flat` and `salmon`.&#x00a0;</p>
                    </list-item>
                    <list-item>
                        <p>Figure 3, top panel. I don't know what the dotted lines represent: maybe FDR 0.01, 0.05 and 0.1?</p>
                    </list-item>
                    <list-item>
                        <p>Figure 3, top panel. You might want to consider annotating with text the highest TPR point for each dataset, which is quoted in the text on page 5, right side.</p>
                    </list-item>
                    <list-item>
                        <p>I appreciate that Figure 3 (top panel) shows the full range, but maybe it would be useful to have a zoomed-in version in the supplementary material in order to see the differences more clearly. Maybe have a ylim from 0.5 to 0.8, and an xlim from 0 to 0.6 (or something like that).</p>
                    </list-item>
                    <list-item>
                        <p>Page 6, bottom left. "ECs called the highest number of genes with significant DTU (1485 genes, in contrast to the 748 and 391 genes called significant by the transcript and exon-based methods respectively)." That sentence is incorrect based on Supplementary Figure 2. The numbers for genes with significant DTU match for the transcript and exon based methods, but they don't for the EC based method since 228 + 204 + 147 + 96 = 675. This numerical change affects the conclusions drawn from Supplementary Figure 2.&#x00a0;</p>
                    </list-item>
                    <list-item>
                        <p>Page 6, bottom right. Were all samples from the full Bottomly dataset used in any of the 20 iterations? Or were there some samples that were used in many of the replications? With 20 iterations I guess that there's a small chance that some samples were under-represented or over-represented in the iterations.</p>
                    </list-item>
                    <list-item>
                        <p>Figure 3, bottom panel. I really liked the lines you show in Supplementary Figure 3 to identify the different comparable replicates. The lines helped me visualize that the ranks were consistent across replicates since the lines rarely intersect each other. I suggest mentioning those lines in the caption for Figure 3 where you refer to Supplementary Figure 3, or maybe even swapping the panels from Figure 3 (bottom) for the equivalent ones from Supplementary Figure 3 (no need to change Figure S3 in that case, that is, it's okay to repeat the panels).</p>
                    </list-item>
                    <list-item>
                        <p>Page 7, right side. Typo "psuedo-" instead of "pseudo-".</p>
                    </list-item>
                    <list-item>
                        <p>Page 8, left side. "We also found the analysis was quick to run and we provide code to convert [...]". I highly recommend including the URL for the code here, or mentioning in parentheses which section of the paper contains the link to the code.</p>
                    </list-item>
                    <list-item>
                        <p>Page 8, right side. I recommend also citing the Bottomly et al.&#x00a0;paper when you mention that the data was downloaded from SRR099223. You already cite the paper in other parts of your manuscript.</p>
                    </list-item>
                    <list-item>
                        <p>From the link to the bioRxiv pre-print I was able to find tweets citing the pre-print and have to agree with this&#x00a0;
                            <ext-link ext-link-type="uri" xlink:href="http://twitter.com/hipsterelectron/statuses/1075602782600622080">tweet</ext-link>&#x00a0;saying "this type of stuff is what the field needs".</p>
                    </list-item>
                    <list-item>
                        <p>
                            <ext-link ext-link-type="uri" xlink:href="https://github.com/Oshlack/ec-dtu-paper">Here</ext-link>,&#x00a0;I didn't find the actual versions of the packages used. I suggest including the output of "options(width = 120); sessioninfo::session_info()" somewhere in that repository.</p>
                    </list-item>
                    <list-item>
                        <p>I think that you don't need to call gc() manually in your function calls 
                            <ext-link ext-link-type="uri" xlink:href="https://github.com/Oshlack/ec-dtu-pipe/blob/master/R/dtu.R#L19">here</ext-link>. Normally R takes care of it.</p>
                    </list-item>
                    <list-item>
                        <p>Since I see 
                            <ext-link ext-link-type="uri" xlink:href="http://github.com/Oshlack/ec-dtu-pipe/blob/master/R/dtu.R#L1">here</ext-link>&#x00a0;that 8 cores were used for your method, I'm curious now looking at Supplementary Table 1 if the RAM presented there is by thread (core) or by process, and if so, how many cores were used for the other steps.</p>
                    </list-item>
                </list>
            </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>RNA-seq, Bioinformatics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-45467-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Salmon provides fast and bias-aware quantification of transcript expression.</article-title>
                        <source>
                            <italic>Nat Methods</italic>
                        </source>.<year>2017</year>;<volume>14</volume>(<issue>4</issue>) :
                        <elocation-id>10.1038/nmeth.4197</elocation-id>
                        <fpage>417</fpage>-<lpage>419</lpage>
                        <pub-id pub-id-type="pmid">28263959</pub-id>
                        <pub-id pub-id-type="doi">10.1038/nmeth.4197</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-45467-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Near-optimal probabilistic RNA-seq quantification.</article-title>
                        <source>
                            <italic>Nat Biotechnol</italic>
                        </source>.<volume>34</volume>(<issue>5</issue>) :
                        <elocation-id>10.1038/nbt.3519</elocation-id>
                        <fpage>525</fpage>-<lpage>7</lpage>
                        <pub-id pub-id-type="pmid">27043002</pub-id>
                        <pub-id pub-id-type="doi">10.1038/nbt.3519</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-45467-3">
                    <label>3</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Detecting differential usage of exons from RNA-seq data.</article-title>
                        <source>
                            <italic>Genome Res</italic>
                        </source>.<year>2012</year>;<volume>22</volume>(<issue>10</issue>) :
                        <elocation-id>10.1101/gr.133744.111</elocation-id>
                        <fpage>2008</fpage>-<lpage>17</lpage>
                        <pub-id pub-id-type="pmid">22722343</pub-id>
                        <pub-id pub-id-type="doi">10.1101/gr.133744.111</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment4584-45467">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Oshlack</surname>
                            <given-names>Alicia</given-names>
                        </name>
                        <aff>Peter MacCallum Cancer Centre, Australia</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>19</day>
                    <month>4</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Thank you for taking the time to review our paper and for the helpful suggestions.</p>
                <p>Minor points: 
                    <list list-type="bullet">
                        <list-item>
                            <p>We have fixed the issues with Figures 2 and 3, and have added a description of the dotted lines in Figure 3a.</p>
                        </list-item>
                        <list-item>
                            <p>We now use more distinguishable colour-palettes in all cases where colours identify data points.</p>
                        </list-item>
                        <list-item>
                            <p>Supplementary Figure 6 has been added (showing filtered vs. unfiltered results), and uses zoomed-in axes. This plot contains the original results for ECC, exon and transcript counts, which should now be easier to distinguish.</p>
                        </list-item>
                        <list-item>
                            <p>We agree with the reviewer that boxplots can mask information. However, the discrete nature of the data, combined with the log-scale, results in a stepwise artefact. Please see Soneson 
                                <italic>et al.</italic>[1] Supplementary Figure 3 for an example. We have therefore opted to retain boxplots. We note that the source code for generating these plots is available in the ec-dtu-paper github repository should readers want to inspect the raw data.</p>
                        </list-item>
                        <list-item>
                            <p>We also like the suggestion of the density ridges; however, due to the high number of data points (&gt;1 million), this did not add any information to the visualisation.</p>
                        </list-item>
                        <list-item>
                            <p>Instead of annotating TPR points for clarity on the plots in Figure 3a, we have added Supplementary Figure 6, which contains the same data points for ECs, transcripts and exons, as well as their respective performance using filtered features.</p>
                        </list-item>
                        <list-item>
                            <p>We inspected the random samples selected for the Bottomly analyses (we have provided random seeds in the R markdown notebook) and noted that all samples were used at least once. We have added code to the paper R markdown notebook to show sample usage across iterations. As is apparent in Supplementary Figure 3, the usage of particular samples is less important than the observed performance ranks of the method types across the iterations.</p>
                        </list-item>
                        <list-item>
                            <p>The number of significant genes found, reflected in Supplementary Figure 2, has been corrected in the main text.</p>
                        </list-item>
                        <list-item>
                            <p>We now mention the lines between replications in Figure 3&#x2019;s caption.</p>
                        </list-item>
                        <list-item>
                            <p>We have added all suggested links and references, and fixed the typos pointed out in the paper.</p>
                        </list-item>
                        <list-item>
                            <p>Session info has been added to the main paper, and we have removed the gc() statements from the code.</p>
                        </list-item>
                        <list-item>
                            <p>Supplementary Table 1 lists RAM by process; this has been clarified in the caption.</p>
                        </list-item>
                    </list></p>
                <p>References:</p>
                <p>[1] Soneson, C., Matthes, K. L., Nowicka, M., Law, C. W., &amp; Robinson, M. D. (2016). Isoform prefiltering improves performance of count-based methods for analysis of differential transcript usage. 
                    <italic>Genome Biology</italic>, 
                    <italic>17</italic>(1), 1&#x2013;15. https://doi.org/10.1186/s13059-015-0862-3</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report45466">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.19992.r45466</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Reyes</surname>
                        <given-names>Alejandro</given-names>
                    </name>
                    <xref ref-type="aff" rid="r45466a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8717-6612</uri>
                </contrib>
                <aff id="r45466a1">
                    <label>1</label>Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>25</day>
                <month>3</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Reyes A</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport45466" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.18276.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Cmero, Davidson and Oshlack propose a novel approach to use RNA-seq data to test for differences in transcript usage between conditions. Instead of using exon-level or transcript-level counts, the authors propose using equivalence class counts (ECCs) resulting from pseudo-aligning/quasi-mapping to reference transcriptomes as input to existing methods to test for differences in exon usage. Using both simulated and real datasets, the authors show that using ECCs is comparable to using exon-level counts in terms of false discovery rates and true positive rates. They show that the ECC approach is computationally more efficient, although its results are more difficult to interpret. The analyses are reproducible and available through Github.</p>
            <p> The manuscript is well written and easy to follow. The whole idea is straightforward and very clever.</p>
            <p> Below are two suggestions for improving the implementation of the software: 
                <list list-type="order">
                    <list-item>
                        <p>Although some Python scripts are available, they need better documentation and examples with toy datasets. From the code in the GitHub repository, it is not clear what steps one should follow to use the ECC approach for DTU. I would suggest writing a Bioconductor-like vignette that explains how to run kallisto/salmon with the parameters to get equivalence classes, how to use the Python scripts to generate the equivalence class matrices, and how to transform these matrices into objects from the DEXSeq, DRIMSeq and similar packages.</p>
                    </list-item>
                    <list-item>
                        <p>As the authors acknowledge, a strong limitation of the ECC approach is result interpretation, which could be improved by visualizing the equivalence classes. The interpretation of the ECC approach would be much easier if the authors provided code to plot the transcripts and equivalence classes of a gene (as is done in the cartoon of Figure 1) linked with the counts of each equivalence class for each sample.</p>
                    </list-item>
                </list> Minor points: 
                <list list-type="order">
                    <list-item>
                        <p>It would be helpful for the reader if the authors improved figure labels and figure legends. For example, in Figure 3a, rather than just referring to the paper by Soneson et al.
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-45466-1">1</xref>
                            </sup>, I would suggest describing what each point represents, what each axis is and how the metrics shown were defined.</p>
                    </list-item>
                    <list-item>
                        <p>In the introduction, the authors say &#x201c;Typically, there will be more counting bins than transcripts, resulting in lower power to detect differences between samples.&#x201d; Could the authors either explain this statement further or cite a reference that explains it?</p>
                    </list-item>
                    <list-item>
                        <p>I understand the logic behind defining a &#x201c;truth set&#x201d; of genes with DTU in the analysis of the real data. However, the real number of true positives is likely larger and thus the resulting metric is not strictly a true positive rate. Perhaps it would be more accurate to give it a different name (see for example, Norton et al.
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-45466-2">2</xref>
                            </sup>).</p>
                    </list-item>
                </list>
            </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Yes</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-45466-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Isoform prefiltering improves performance of count-based methods for analysis of differential transcript usage.</article-title>
                        <source>
                            <italic>Genome Biol</italic>
                        </source>.<year>2016</year>;<volume>17</volume>:
                        <elocation-id>10.1186/s13059-015-0862-3</elocation-id>
                        <fpage>12</fpage>
                        <pub-id pub-id-type="pmid">26813113</pub-id>
                        <pub-id pub-id-type="doi">10.1186/s13059-015-0862-3</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-45466-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Outlier detection for improved differential splicing quantification from RNA-Seq experiments with replicates</article-title>.
                        <source>
                            <italic>Bioinformatics</italic>
                        </source>.<year>2018</year>;<volume>34</volume>(<issue>9</issue>) :
                        <elocation-id>10.1093/bioinformatics/btx790</elocation-id>
                        <fpage>1488</fpage>-<lpage>1497</lpage>
                        <pub-id pub-id-type="doi">10.1093/bioinformatics/btx790</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment4583-45466">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Oshlack</surname>
                            <given-names>Alicia</given-names>
                        </name>
                        <aff>Peter MacCallum Cancer Centre, Australia</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>19</day>
                    <month>4</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Thank you for taking the time to review our paper and for the helpful suggestions.</p>
                <p>Major comments: 
                    <list list-type="bullet">
                        <list-item>
                            <p>We have created a step-by-step Bioconductor-style vignette to allow users to easily reproduce ECC-based DTU testing with a toy data set. We include instructions for running each step manually, as well as an automated analysis using the ec-dtu-pipe pipeline we have developed. The vignette can be found 
                                <ext-link ext-link-type="uri" xlink:href="http://github.com/Oshlack/ec-dtu-paper/wiki/Vignette">here</ext-link>,&#x00a0;which we note in the paper.</p>
                        </list-item>
                        <list-item>
                            <p>Figure 1 in the original paper shows a highly simplified version of how ECs can be derived from a small set of transcripts and exons. In reality, genes have on average many more transcripts, exons and, consequently, equivalence classes. Furthermore, ECs may be disjoint (not connected by intervening sequence) or require a junction. As ECs are determined by k-mers, creating a direct mapping between ECs and the genome is challenging. Given the complexity of ECs, even a clean mapping between EC and genome position may be difficult to interpret. Given these limitations, we have instead opted to include a simple visualisation option, similar to DEXSeq, plotting EC names and their relative log counts across conditions, per gene. Such a visualisation example can be found in Supplementary Figure 7 (note the large number of ECs present in this gene). The function to create these plots (plot_ec_usage), found in the ec-dtu-paper repository (and referenced in the vignette), will also print all significant ECs of the gene and their associated transcripts. In the example, one of the significant ECs has a single associated transcript, making DTU inference relatively straightforward.</p>
                        </list-item>
                    </list> Minor comments: 
                    <list list-type="bullet">
                        <list-item>
                            <p>We have added explanatory text in Figure 3a to explain the FDR/TPR plots and their respective FDR cutoffs. We have also added (a) and (b) labels for Figures 1-3.</p>
                        </list-item>
                        <list-item>
                            <p>We have further explained the sentence about how the number of counting bins affects power. We have also added Supplementary Table 2 to illustrate this point, which shows the average number of exons and transcripts per gene for the Ensembl human gene reference.</p>
                        </list-item>
                        <list-item>
                            <p>We have renamed the TPR to &#x2018;Fraction recalled&#x2019; (also in Supplementary Figure 3) to indicate that the metric does not strictly measure true positive rate.</p>
                        </list-item>
                    </list>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report45465">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.19992.r45465</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Vitting-Seerup</surname>
                        <given-names>Kristoffer</given-names>
                    </name>
                    <xref ref-type="aff" rid="r45465a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-6450-0608</uri>
                </contrib>
                <aff id="r45465a1">
                    <label>1</label>Department of Biology, University of Copenhagen, Copenhagen, Denmark</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>20</day>
                <month>3</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Vitting-Seerup K</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport45465" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.18276.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>
                <bold>Summary</bold>
            </p>
            <p> In the manuscript &#x201c;Fast and accurate differential transcript usage by testing equivalence class counts&#x201d;, Cmero 
                <italic>et al</italic>. suggest using the ability of modern lightweight RNA-seq aligners to produce transcript compatibility counts (TCC) in combination with standard tools designed for differential transcript usage (DTU). Although the idea, as described in the introduction of the article, has been partly touched on by previous publications from the Pachter Lab, the approach used in this manuscript is novel since it describes a direct DTU analysis, whereas the previous publications only inferred DTU indirectly. In this manuscript Cmero 
                <italic>et al</italic>. compare a TCC-based DTU workflow against a transcript-based and an exon-based workflow using both simulated and real data, reaching the conclusion that a TCC-based workflow is superior &#x2013; a novel and important finding. The manuscript is overall well presented and the analysis approach is state of the art. Unfortunately, the analysis is not quite extensive enough and it suffers from a few major technical problems which, together with a general lack of clarity in the writing, mean the manuscript requires major revisions.</p>
            <p> 
                <bold>Major comments</bold>: 
                <list list-type="bullet">
                    <list-item>
                        <p>The authors should also evaluate on the simulated data from Love 
                            <italic>et al</italic> 2018
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-45465-1">1</xref>
                            </sup>&#x00a0;to test the effect of: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>A different simulation scheme (since the FDRs are so high for Soneson 
                                        <italic>et al</italic> data).</p>
                                </list-item>
                                <list-item>
                                    <p>Different numbers of replicates, to investigate the stability of the results.</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>All analyses presented are performed on unfiltered data, which is problematic. Firstly, it does not reflect typical RNA-seq analysis workflows, which always include a step that filters out lowly expressed features before further analysis. Furthermore, and more problematically, the lack of expression filtering will affect all analyses presented, since many lowly or zero-expressed features will be analyzed, thereby skewing the global comparison due to the difference in the proportion of low/zero features in the different datasets/pipelines
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-45465-2">2</xref>
                            </sup>. Therefore, the authors should include (or replace the current analysis with) an analysis based on datasets that have been pre-filtered for expression. For inspiration on expression filtering, see
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-45465-1">1</xref>
                            </sup>
                            <sup>,</sup>
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-45465-2">2</xref>
                            </sup>, use the edgeR::filterByExpr() function, or apply the classical 1 TPM cutoff. Naturally, the 3 datasets should also be filtered to be comparable (same transcripts/genes tested with all methods).</p>
                    </list-item>
                    <list-item>
                        <p>To ensure correct quantification and to make the genome-based (STAR) and lightweight (Salmon) analyses comparable, the Salmon index should be built from all transcripts and subsequently (after quantification) the data should be reduced to only protein-coding genes. This is necessary to ensure that reads mapping to both protein-coding genes and lncRNAs are correctly quantified (and are quantified in a manner comparable to the genome-based approach).</p>
                    </list-item>
                    <list-item>
                        <p>The manuscript is in general not concise enough. Throughout the manuscript, it is very hard to follow which workflow is being referred to, and the order in which the workflows are presented is not logical (e.g. starting a section by explaining the alternative workflow does not make sense). Figures contain data that are never mentioned or used. The discussion in particular falls short of the mark, as major parts are either repetitive or non-informative.</p>
                    </list-item>
                </list> 
                <bold>Minor comments</bold>: 
                <list list-type="bullet">
                    <list-item>
                        <p>Generally: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>The authors should use &#x201c;transcript compatibility counts&#x201d; (TCC) (i.e. not &#x201c;equivalence class read counts&#x201d; (EC) and derivations thereof) since TCC is the terminology used in the field when ECs are used for quantification
                                        <sup>
                                            <xref ref-type="bibr" rid="rep-ref-45465-3">3</xref>
                                        </sup>.</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>Title: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>The title seems to lack a word such as &#x201c;analysis&#x201d; or &#x201c;testing&#x201d; after &#x201c;differential transcript usage&#x201d;.</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>Abstract: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>In the sentence &#x201c;However, recent evaluations show lower sensitivity in DTU analysis&#x201d; I guess the authors mean compared to exon-level analysis but this needs to be specified.</p>
                                </list-item>
                                <list-item>
                                    <p>The conclusion is too broad. The authors investigate DTU but draw conclusions about &#x201c;many&#x201d; analyses. Such a sentence should probably be saved for a review paper.</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>Introduction: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>In addition to exon- and transcript-based analysis approaches, the authors also need to mention analysis of individual splice events (via tools such as SUPPA2 and rMATS) as well as the types of analysis which group multiple features together (such as Leafcutter and MAJIQ) to make clear that there are 4 different approaches (with the TCC-based approach being a fifth (or a variant of the transcript-based approach)). I do not require the authors to also compare the TCC-based approach to the two omitted workflows &#x2013; but they should be mentioned in the introduction for completeness.</p>
                                </list-item>
                                <list-item>
                                    <p>I think it could be beneficial to refer more to the lower part of Figure 1 in the Introduction, since it very clearly presents the 3 different workflows in question.</p>
                                </list-item>
                                <list-item>
                                    <p>The drawbacks of pseudo/quasi alignment should be mentioned/discussed either in the introduction or discussion.</p>
                                </list-item>
                                <list-item>
                                    <p>In the sentence &#x201c;Depending on its sequence, a read can align to all three transcripts, only two of the transcripts or just one transcript. These different combinations result in four possible equivalence classes, containing read counts, for this gene&#x201d; the last statement is wrong. There are 6 possible (the authors omit uniquely t2 and uniquely t3). This should either be mentioned, or it should be highlighted that the example reads in Figure 1 give rise to 4 possibilities.</p>
                                </list-item>
                                <list-item>
                                    <p>The authors should provide a reference&#x00a0;
                                        <sup>
                                            <xref ref-type="bibr" rid="rep-ref-45465-3">3</xref>
                                        </sup>&#x00a0;for the term &#x201c;transcript compatibility count&#x201d;.</p>
                                </list-item>
                                <list-item>
                                    <p>The authors should also discuss the ideas presented in Ntranos 
                                        <italic>et al</italic> 2019
                                        <sup>
                                            <xref ref-type="bibr" rid="rep-ref-45465-4">4</xref>
                                        </sup>&#x00a0;in the discussion of the Yi et al 2018 paper. Specifically the &#x201c;catch-it-all&#x201d; and &#x201c;any transcript-level phenomena&#x201d; part of the sentence in Cmero 
                                        <italic>et al</italic>: &#x201c;Yi and colleagues have recently introduced direct differential testing on equivalence classes in a catch-all method to identify genes that display any transcript-level phenomena&#x201d; needs to be changed as aggregation of DTE p-values cannot identify isoform switches if the gene expression is also changing (as discussed in detail in Ntranos 
                                        <italic>et al</italic> 2019) &#x2013; hence the need for methods specifically designed for DTU detection and thereby also the need for the workflow presented by the authors Cmero 
                                        <italic>et al.</italic>
                                    </p>
                                </list-item>
                                <list-item>
                                    <p>For Figure 1: Could it be beneficial to divide Figure 1 into A and B referring to respectively TCC and analysis pipelines?</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>Methods: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>It needs to be described in&#x00a0;detail how the fragment length and standard deviation were estimated from the Bottomly data since it is single end data. The actual values should also be reported for reproducibility.</p>
                                </list-item>
                                <list-item>
                                    <p>Is there a particular reason why Salmon/Kallisto were not run with the bias correction algorithms?</p>
                                </list-item>
                                <list-item>
                                    <p>Since the authors have to rerun salmon anyway (see major comments) it might be beneficial to update to Salmon v0.13.1 and also use the &#x201c;--validateMappings&#x201d; option.</p>
                                </list-item>
                                <list-item>
                                    <p>Please state the parameters used with Trimmomatic for reproducibility.</p>
                                </list-item>
                                <list-item>
                                    <p>Please provide information on how the transcript-level counts were obtained (and specify if any scaling was done with e.g. tximport).</p>
                                </list-item>
                                <list-item>
                                    <p>Please also indicate how the exon/transcript level analysis was summarized to gene-level for each of the 3 workflows.</p>
                                </list-item>
                                <list-item>
                                    <p>Please provide the unfiltered salmon quantification results (the &#x201c;quant.sf&#x201d; files) from the Bottomly 
                                        <italic>et al</italic> data as supplementary files to facilitate reproducibility.</p>
                                </list-item>
                                <list-item>
                                    <p>Please provide details of how the STAR-mapped data were converted to DEXSeq-ready counts (currently only implied in the Results section).</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>Results: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>From the first paragraph of the Results it is not clear that you are actually doing all 3 types of analysis and comparing them. Also, starting by mentioning the &#x201c;alternative approach&#x201d; is not reader-friendly.</p>
                                </list-item>
                                <list-item>
                                    <p>For references to previous DTU benchmarking please also cite Love et al 2018
                                        <sup>
                                            <xref ref-type="bibr" rid="rep-ref-45465-1">1</xref>
                                        </sup>.</p>
                                </list-item>
                                <list-item>
                                    <p>For the &#x201c;Fewer equivalence classes are expressed than exons&#x201d; analysis, it is unclear whether what is quantified is the number of exons or the number of disjoint exon bins necessary for a standard DEXSeq workflow (due to alternative 3&#x2019; and 5&#x2019; splice sites).</p>
                                </list-item>
                                <list-item>
                                    <p>Figure 2: Were any pseudocounts or transformations used to calculate the normalized cross-replicate variance (Log2(var / mean))?</p>
                                </list-item>
                                <list-item>
                                    <p>Figure 3: Please add &#x201c;A&#x201d; and &#x201c;B&#x201d; to the figure in accordance with the figure legend.</p>
                                </list-item>
                                <list-item>
                                    <p>Figure 3A: 
                                        <list list-type="bullet">
                                            <list-item>
                                                <p>What is visualized is not explained in the figure legend.</p>
                                            </list-item>
                                            <list-item>
                                                <p>It is currently not possible to distinguish between the different methods on the plot. Please provide zoomed-in versions of the plot to enhance visual comparison.</p>
                                            </list-item>
                                            <list-item>
                                                <p>Given the point of this paper (comparing the 3 workflows depicted in the lower half of Figure 1), it is very strange that multiple exon-based workflows as well as the result of a MISO-based workflow (which is never discussed) are also shown. Would it not make more sense to show only the exon-based workflow used and omit the MISO-based workflow? Furthermore, since the supplementary figures show that Salmon and Kallisto produce the same results, why not show only one of them?</p>
                                            </list-item>
                                        </list> </p>
                                </list-item>
                                <list-item>
                                    <p>Figure 3B: 
                                        <list list-type="bullet">
                                            <list-item>
                                                <p>Please report which FDR cutoff was used to call significance.</p>
                                            </list-item>
                                            <list-item>
                                                <p>Please also report the results of the analysis at the feature level (transcript/exon) and not just at the gene level.</p>
                                            </list-item>
                                            <list-item>
                                                <p>The authors should discuss the much larger variance in FDR for the TCC- and exon-based approaches as well as the generally large FDR values (guessing the target value was 0.05).</p>
                                            </list-item>
                                        </list> </p>
                                </list-item>
                            </list> </p>
                    </list-item>
                    <list-item>
                        <p>Discussion: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>&#x201c;We propose that equivalence classes are the optimal unit for performing count based differential testing&#x201d; is too broad a claim since this article is about DTU analysis. Save it for a review :-).</p>
                                </list-item>
                                <list-item>
                                    <p>Please refer to 
                                        <ext-link ext-link-type="uri" xlink:href="https://miso.readthedocs.io/en/fastmiso/sashimi.html">sashimi plots</ext-link> in addition to superTranscripts &#x2013; they had the visualization idea first.</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                </list>
            </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Partly</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>No</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics with a focus on the analysis of transcripts from RNA-seq data</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-45465-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification.</article-title>
                        <source>
                            <italic>F1000Res</italic>
                        </source>.<year>2018</year>;<volume>7</volume>:
                        <elocation-id>10.12688/f1000research.15398.3</elocation-id>
                        <fpage>952</fpage>
                        <pub-id pub-id-type="pmid">30356428</pub-id>
                        <pub-id pub-id-type="doi">10.12688/f1000research.15398.3</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-45465-2">
                    <label>2</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>A direct comparison of genome alignment and transcriptome pseudoalignment</article-title>.
                        <source>
                            <italic>bioRxiv</italic>
                        </source>.<year>2018</year>;
                        <elocation-id>10.1101/444620</elocation-id>
                        <pub-id pub-id-type="doi">10.1101/444620</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-45465-3">
                    <label>3</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>Fast and accurate single-cell RNA-seq analysis by clustering of transcript-compatibility counts</article-title>.
                        <source>
                            <italic>Genome Biology</italic>
                        </source>.<year>2016</year>;<volume>17</volume>(<issue>1</issue>) :
                        <elocation-id>10.1186/s13059-016-0970-8</elocation-id>
                        <pub-id pub-id-type="doi">10.1186/s13059-016-0970-8</pub-id>
                    </mixed-citation>
                </ref>
                <ref id="rep-ref-45465-4">
                    <label>4</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"/>:
                        <article-title>A discriminative learning approach to differential expression analysis for single-cell RNA-seq.</article-title>
                        <source>
                            <italic>Nat Methods</italic>
                        </source>.<year>2019</year>;<volume>16</volume>(<issue>2</issue>) :
                        <elocation-id>10.1038/s41592-018-0303-9</elocation-id>
                        <fpage>163</fpage>-<lpage>166</lpage>
                        <pub-id pub-id-type="pmid">30664774</pub-id>
                        <pub-id pub-id-type="doi">10.1038/s41592-018-0303-9</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment4582-45465">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Oshlack</surname>
                            <given-names>Alicia</given-names>
                        </name>
                        <aff>Peter MacCallum Cancer Centre, Australia</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>19</day>
                    <month>4</month>
                    <year>2019</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Thank you for taking the time to review our paper and for the helpful suggestions.</p>
                <p>Major comments: 
                    <list list-type="bullet">
                        <list-item>
                            <p>We ran our EC-based method, as well as the transcript and exon-based methods, through DTU analysis on the simulated data from Love 
                                <italic>et al.</italic>[1]. EC-based results can be seen to be on par with transcript-based results (Supplementary Figure 8). As we note in the revised paper, the simulations were based on baseline abundances derived using Salmon, which may have favoured Salmon-derived transcript quantifications in the downstream analysis. Together with the Soneson simulation, this indicates that ECCs perform as well as the best method regardless of the assumptions and biases in the simulated datasets.&#x00a0;</p>
                        </list-item>
                        <list-item>
                            <p>Regarding filtering, we note that DEXSeq automatically filters out zero-count features and low-count data. In order to show the effects of basic filtering on the EC, transcript and exon-based approaches, we present Supplementary Figure 6. This figure shows that, for each approach, filtering slightly improves FDR control. Importantly, ECC-based DTU still outperforms transcript-based DTU in both the Drosophila and human data.</p>
                        </list-item>
                        <list-item>
                            <p>We agree with the reviewer&#x2019;s comment that the pseudo-alignment index should be built on all transcripts and only subsequently filtered. The Soneson simulation data, however, is restricted to protein-coding genes only. All downstream results obtained from the Soneson data were run on references containing protein-coding genes only, therefore we opted to keep references consistent with the EC-based approach for optimal fairness. As we also compared the Bottomly data with the Soneson data in Figure 2, we opted to take the same approach with the Bottomly data. We have noted this decision in the methods. We used the whole transcript index without gene filtering for the Love data.</p>
                        </list-item>
                        <list-item>
                            <p>We have updated the manuscript for conciseness, and updated labels, figures and captions to improve clarity.</p>
                        </list-item>
                    </list> Minor comments: 
                    <list list-type="bullet">
                        <list-item>
                            <p>General, title and abstract: 
                                <list list-type="bullet">
                                    <list-item>
                                        <p>We have opted to retain the use of &#x2018;equivalence class counts&#x2019;, noting that both &#x2018;transcript compatibility counts&#x2019; and &#x2018;equivalence class counts&#x2019; are used in the literature.</p>
                                    </list-item>
                                    <list-item>
                                        <p>We have updated the manuscript title for clarity.</p>
                                    </list-item>
                                    <list-item>
                                        <p>We have addressed the points regarding the abstract.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                        <list-item>
                            <p>Introduction: 
                                <list list-type="bullet">
                                    <list-item>
                                        <p>We have now cited transcript-assembly and spliced-in DTU approaches.</p>
                                    </list-item>
                                    <list-item>
                                        <p>We now discuss some of the limitations of pseudo-alignment in the Discussion.</p>
                                    </list-item>
                                    <list-item>
                                        <p>We opted to show four equivalence classes in Figure 1 for simplicity. We have noted the possibility of ECs containing solely t2 and t3 in the main text.</p>
                                    </list-item>
                                    <list-item>
                                        <p>We have added a reference for the term &#x2018;transcript compatibility counts&#x2019;.</p>
                                    </list-item>
                                    <list-item>
                                        <p>We have corrected the discussion on ideas presented in Ntranos 
                                            <italic>et al</italic>.[2]</p>
                                    </list-item>
                                    <list-item>
                                        <p>We have added (a) and (b) labels for Figures 1-3</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                        <list-item>
                            <p>Methods: 
                                <list list-type="bullet">
                                    <list-item>
                                        <p>Fragment lengths and standard deviations were estimated directly from the read lengths (as these varied between reads due to trimming). The length and standard deviation values have been added to the methods section.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Salmon/Kallisto were run with default arguments (apart from returning equivalence class counts) in order to run the software more-or-less &#x2018;out of the box&#x2019; without parameter tuning, which may take focus away from the conceptual advance of using equivalence classes. Additionally, --validateMappings can be seen as a further optimisation to EC-derivation.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Trimmomatic parameters have been added to the methods section. STAR was run with default parameters, which has also been added to the methods section.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Tximport with &#x2018;scaledTPM&#x2019; scaling was used to obtain transcript abundances from Salmon. This is now reflected in the methods.</p>
                                    </list-item>
                                    <list-item>
                                        <p>DEXSeq&#x2019;s 
                                            <italic>perGeneQValue </italic>function was used to obtain gene-level significance values. This is now reflected in the methods.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Salmon&#x2019;s &#x201c;quant.sf&#x201d; files are available in the ec-dtu-paper github repository. This is now reflected in the methods.</p>
                                    </list-item>
                                    <list-item>
                                        <p>We have clarified how exon counts are obtained from STAR counts.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                        <list-item>
                            <p>Results: 
                                <list list-type="bullet">
                                    <list-item>
                                        <p>We have revised the first paragraph for clarity.</p>
                                    </list-item>
                                    <list-item>
                                        <p>We now cite Love 
                                            <italic>et al.</italic>[1] in reference to DTU method benchmarking.</p>
                                    </list-item>
                                    <list-item>
                                        <p>The &#x201c;Fewer equivalence classes are expressed than exons&#x201d; analysis considers exon counting bins. This has now been clarified in the main text.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Cross-replicate log2(var / mean) calculations were performed on CPM-transformed and lightly filtered data. This is now reflected in the methods section.</p>
                                    </list-item>
                                    <list-item>
                                        <p>Figure 3 
                                            <list list-type="bullet">
                                                <list-item>
                                                    <p>We have described the figure in greater detail in the caption.</p>
                                                </list-item>
                                                <list-item>
                                                    <p>Supplementary Figure 6 has been added, which uses zoomed-in axes and shows results for ECC, exon and transcript counts.</p>
                                                </list-item>
                                                <list-item>
                                                    <p>We show MISO because the way the method was used in Soneson 
                                                        <italic>et al.</italic> is conceptually similar, and we have removed the featureCounts and kallisto results to reduce clutter.</p>
                                                </list-item>
                                                <list-item>
                                                    <p>FDR cutoff is now stated in the figure legend.</p>
                                                </list-item>
                                                <list-item>
                                                    <p>Reporting the results at the feature level is not feasible because truth data are not available at that level. Additionally, equivalence classes do not map cleanly to features, which would make it difficult to assess the truth of features even if exon- and transcript-level truth were available.</p>
                                                </list-item>
                                                <list-item>
                                                    <p>For the Bottomly replication data, we now note the variance in FDR for the ECC and exon-count-based methods and indicate that this may be the result of substructure in the data. Importantly, FDR is lower for ECCs than for transcript counts in all iterations but one.</p>
                                                </list-item>
                                            </list> </p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                        <list-item>
                            <p>Discussion: 
                                <list list-type="bullet">
                                    <list-item>
                                        <p>We have addressed the suggestions for the discussion.</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                    </list> </p>
                <p>References:</p>
                <p>[1] Love, M. I., Soneson, C., &amp; Patro, R. (2018). Swimming downstream: statistical analysis of differential transcript usage following Salmon quantification. 
                    <italic>F1000Research</italic>, 
                    <italic>7</italic>, 952. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.12688/f1000research.15398.1">https://doi.org/10.12688/f1000research.15398.1</ext-link>
                </p>
                <p>[2] Ntranos, V., Yi, L., Melsted, P., &amp; Pachter, L. (2019). A discriminative learning approach to differential expression analysis for single-cell RNA-seq. 
                    <italic>Nature Methods</italic>, 
                    <italic>16</italic>(February), 1. 
                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/s41592-018-0303-9">https://doi.org/10.1038/s41592-018-0303-9</ext-link>
                </p>
            </body>
        </sub-article>
    </sub-article>
</article>
