<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.11982.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                    <subj-group>
                        <subject>Bioinformatics</subject>
                    </subj-group>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>
                    <italic>RNAtor</italic>: an Android-based application for biologists to plan RNA sequencing experiments</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 1 approved, 1 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Kane</surname>
                        <given-names>Shruti</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Garg</surname>
                        <given-names>Himanshu</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Krishnan</surname>
                        <given-names>Neeraja M.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Singh</surname>
                        <given-names>Aditya</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Panda</surname>
                        <given-names>Binay</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-5136-2090</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Ganit Labs, Bio-IT Centre, Institute of Bioinformatics and Applied Biotechnology, Bangalore, India</aff>
                <aff id="a2">
                    <label>2</label>Strand Life Sciences, Bangalore, India</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:binay@ganitlabs.in">binay@ganitlabs.in</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>16</day>
                <month>11</month>
                <year>2017</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2017</year>
            </pub-date>
            <volume>6</volume>
            <elocation-id>997</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>19</day>
                    <month>6</month>
                    <year>2026</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Kane S et al.</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/6-997/pdf"/>
            <abstract>
                <p>RNA sequencing (RNA-seq) is a powerful technology that allows one to assess the RNA levels in a sample. Analysis of these levels can help in identifying novel transcripts (coding, non-coding and splice variants), understanding transcript structures, and estimating gene/allele expression. Biologists face specific challenges while designing RNA-seq experiments. The nature of these challenges lies in determining the total number of sequenced reads and technical replicates required for detecting marginally differentially expressed transcripts. Despite previous attempts to address these challenges, easily-accessible and biologist-friendly mobile applications do not exist. Thus, we developed 
                    <italic toggle="yes">RNAtor</italic>, a mobile application for Android platforms, to aid biologists in correctly designing their RNA-seq experiments. The recommendations from 
                    <italic toggle="yes">RNAtor</italic> are based on simulations and real data.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>RNA-seq</kwd>
                <kwd>Android-based</kwd>
                <kwd>simulations</kwd>
                <kwd>mobile application</kwd>
                <kwd>recommendations</kwd>
                <kwd>experimental design</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>Government of Karnataka, India</funding-source>
                    <award-id>3451-00-090-2-22</award-id>
                </award-group>
                <award-group id="fund-2">
                    <funding-source>Department of Electronics and Information Technology, Government of India</funding-source>
                    <award-id>18(4)/2010-E-Infra.</award-id>
                    <award-id>31-03-2010</award-id>
                </award-group>
                <funding-statement>Research presented in this article is funded by the Department of Electronics and Information Technology, Government of India (Ref No: 18(4)/2010-E-Infra., 31-03-2010) and Department of IT, BT and ST, Government of Karnataka, India (Ref No: 3451-00-090-2-22).</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>Keeping in view the reviewers&#x2019; suggestions, we have made the following changes in the revised version of the manuscript. 
                    <list list-type="order">
                        <list-item>
                            <p>Portions of Abstract, Results and Discussion were re-written to correctly reflect the advantages and limitations of the tool, compared to the other existing web-based tools like EDDA and Scotty.</p>
                        </list-item>
                        <list-item>
                            <p>The legend to Figure 3, which was missing earlier, was added, and the legends for the other figures were revised to correctly reflect the data.</p>
                        </list-item>
                        <list-item>
                            <p>Provided a better description of the data presented in Figure 2.</p>
                        </list-item>
                        <list-item>
                            <p>Described the method by which the DEGs were calculated.</p>
                        </list-item>
                        <list-item>
                            <p>Defined replicates.</p>
                        </list-item>
                        <list-item>
                            <p>Described the use of simulated data.</p>
                        </list-item>
                        <list-item>
                            <p>Defined true and false positives.</p>
                        </list-item>
                        <list-item>
                            <p>Defined transcript recovery.</p>
                        </list-item>
                        <list-item>
                            <p>Supplementary Figure 4 was replaced at a higher resolution.</p>
                        </list-item>
                    </list>
                </p>
            </sec>
        </notes>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>RNA-seq offers several advantages over low-throughput technologies such as quantitative PCR and annotation-dependent methods such as microarrays. Designing RNA-seq experiments accurately, however, poses a challenge for biologists. This is particularly true when prior knowledge of the genome or transcriptome of the organism of choice is not available. It is important to determine the number of technical replicates and the number of sequencing reads, and to choose the right analytical tool, in order to estimate subtle differences between the expression levels of transcripts.</p>
            <p>Web-based tools, Scotty (
                <xref ref-type="bibr" rid="ref-3">Busby 
                    <italic toggle="yes">et al.</italic>, 2013</xref>) and EDDA (
                <xref ref-type="bibr" rid="ref-7">Luo 
                    <italic toggle="yes">et al.</italic>, 2014</xref>), have set a precedent in aiding RNA-seq design. While Scotty relies solely on pilot or prototype data, EDDA relies on either pilot data or a simulate-and-test paradigm to account for variability across experimental conditions. Scotty has a built-in t-test based module, whereas EDDA has been linked to five other DE tools, post mode-normalization of the data. Both can detect DEGs with up to a 2-fold difference.</p>
            <p>In the current manuscript, we describe 
                <italic toggle="yes">RNAtor</italic>, an Android app with a user-friendly graphical user interface (GUI) that helps biologists design RNA-seq experiments. A mobile application offers more flexibility, ease of navigation, user-friendliness, and offline features than a web-based tool, even when the latter can also be accessed or run on a mobile device. RNAtor can be linked to any existing differential expression analysis tool, and can help design experiments to estimate expression differences with as low as 0.8&#x2013;1.2X fold change. 
                <italic toggle="yes">RNAtor&#x2019;s</italic> recommendations are based on an exhaustive combination of discovery with simulated reads for transcriptomes of varying sizes (3 to 100 Mb). These recommendations are subsequently validated with sequenced data from 
                <italic toggle="yes">Saccharomyces cerevisiae</italic>, while comparing expression profiles of wild-type and mutant strains.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>Implementation</title>
                <p>We simulated varying numbers of Illumina-like reads with technical replicates, with fold changes ranging from 1.2&#x2013;5X between the control and treatment samples, in both directions, on a 3 Mb human chr14 (hg19) transcriptome, using Polyester (
                    <xref ref-type="bibr" rid="ref-4">Frazee 
                        <italic toggle="yes">et al.</italic>, 2015</xref>). We detected differentially expressed genes (DEGs) on all the simulations using Tophat v2.1.1-Cufflinks v2.2.1 (
                    <xref ref-type="bibr" rid="ref-11">Trapnell 
                        <italic toggle="yes">et al.</italic>, 2012</xref>) based genome-guided workflow followed by differential expression analyses using five tools: Deseq v1.28.0 (
                    <xref ref-type="bibr" rid="ref-1">Anders &amp; Huber, 2010</xref>); Deseq2 v1.16.1 (
                    <xref ref-type="bibr" rid="ref-6">Love 
                        <italic toggle="yes">et al.</italic>, 2014</xref>); EdgeR v3.18.1 (
                    <xref ref-type="bibr" rid="ref-9">Robinson 
                        <italic toggle="yes">et al.</italic>, 2010</xref>); Cuffdiff-Cufflinks v2.2.1 (
                    <xref ref-type="bibr" rid="ref-11">Trapnell 
                        <italic toggle="yes">et al.</italic>, 2012</xref>); and Kallisto v0.43.1 (
                    <xref ref-type="bibr" rid="ref-2">Bray 
                        <italic toggle="yes">et al.</italic>, 2016</xref>) and a 
                    <italic toggle="yes">de novo</italic> assembly-based tool, Trinity v2.3.2 (
                    <xref ref-type="bibr" rid="ref-5">Grabherr 
                        <italic toggle="yes">et al.</italic>, 2011</xref>) followed by differential expression analyses using Kallisto v0.43.1 (
                    <xref ref-type="bibr" rid="ref-2">Bray 
                        <italic toggle="yes">et al.</italic>, 2016</xref>). Thus, Kallisto was used twice; first, with the genome-guided paradigm and second, with 
                    <italic toggle="yes">de novo</italic> assembly using Trinity. In the first scenario, the Tophat-Cufflinks alignments (.bam) were converted to reads (.fastq) to be used with Kallisto along with the 3 Mb transcriptome as the reference. In the second scenario, the 
                    <italic toggle="yes">de novo</italic> assembled transcriptome was used as the reference with Kallisto, along with the simulated reads. All differential expression analysis tools were run with default cut-offs. We studied the results from these simulations in terms of the number of DEGs detected reliably and the extent of recovery of those DEGs. Transcript recovery refers to the length of the transcript as assembled by Tophat, and found to be differentially expressed by EdgeR, CuffDiff or DESeq2, in relation to its actual length as per the simulations. It is possible to estimate this parameter only for these three tools, since they offer a handle to the actual transcript IDs. Based on these simulations, we arrived at recommendations on the number of reads, number of replicates, and the tool(s) needed to identify DEGs reliably. We validated these recommendations using simulated reads from larger transcriptomes (10Mb, 30Mb and 100Mb), created by combining transcriptomes from more than one hg19 chromosome, and using a real 
                    <italic toggle="yes">Saccharomyces cerevisiae</italic> dataset (ENA accession: 
                    <ext-link ext-link-type="uri" xlink:href="http://www.ebi.ac.uk/ena/data/view/PRJEB5348">ERP004763</ext-link>) comprising 48 biological replicates for two conditions: wild-type (WT) and a 
                    <italic toggle="yes">snf2</italic> knock-out (KO) mutant (
                    <xref ref-type="bibr" rid="ref-10">Schurch 
                        <italic toggle="yes">et al.</italic>, 2016</xref>).</p>
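                <p>As a minimal illustrative sketch (not taken from the RNAtor source code), the transcript recovery measure defined above can be expressed as the assembled length as a percentage of the simulated length. The function name and the capping of values at 100% are assumptions made here for illustration only.</p>
                <code language="python">
```python
def transcript_recovery(assembled_length: float, simulated_length: float) -> float:
    """Return the assembled length as a percentage of the simulated length.

    Lengths are in bases. Values are capped at 100 so that an assembly longer
    than the simulated transcript is not reported as more than full recovery;
    this cap is an assumption of the sketch, not a documented RNAtor rule.
    """
    if not simulated_length > 0:
        raise ValueError("simulated_length must be positive")
    if not assembled_length >= 0:
        raise ValueError("assembled_length must not be negative")
    return min(100.0 * assembled_length / simulated_length, 100.0)


# Hypothetical example: 600 of 1,200 simulated bases were assembled.
print(transcript_recovery(600, 1200))
```
</code>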
            </sec>
            <sec>
                <title>Operation</title>
                <p>The size of the transcriptome (or of the genome, if the transcriptome size is not known), supplied by the user or taken from a backend database, the number of replicates to use and the fold change of DEGs are user-defined parameters in 
                    <italic toggle="yes">RNAtor</italic> (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>). An 
                    <italic toggle="yes">RNAtor</italic> flowchart highlighting simulation conditions and analytical tools used is provided in 
                    <xref ref-type="other" rid="SF1">Supplementary Figure S1</xref>.</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Screenshots of the 
                            <italic toggle="yes">RNAtor</italic> mobile application.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/14320/4e652a73-b05f-47bf-81d7-9eb2e4dcf974_figure1.gif"/>
                </fig>
            </sec>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <p>
                <italic toggle="yes">RNAtor</italic> was evaluated using questions that a biologist would typically ask before starting an experiment, followed by the recommendations provided by 
                <italic toggle="yes">RNAtor</italic>.</p>
            <sec>
                <title>Read requirements for optimal DEG detection</title>
                <p>One, 1.5, 6, 10, 14 and 20 million reads are needed for detection of differential expression of DEGs at 5-fold, 4-fold, 3-fold, 2-fold, 1.5-fold and 1.2-fold change, respectively, for a 3Mb transcriptome with 3 technical replicates.</p>
                <p>We simulated 0.2&#x2013;20 million reads for human chromosome 14 (~3Mb) and observed that the numbers of detected DEGs simulated at a given fold change peaked for a certain coverage before plateauing (
                    <xref ref-type="fig" rid="f2">Figure 2</xref>). This observation remained valid for the real data (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>) and the large simulated transcriptomes (10Mb, 30Mb and 100Mb) (
                    <xref ref-type="other" rid="SF2">Supplementary Figure S2</xref>). Increasing the number of sequencing reads increased the sensitivity of detection. The final recommendations from 
                    <italic toggle="yes">RNAtor</italic> correspond to the read depth at which the number of DEGs peaks, and are therefore a good compromise between sensitivity and keeping the cost of sequencing low. Changing the number of technical replicates changes the recommendation. For example, with more than three replicates, 
                    <italic toggle="yes">RNAtor</italic> suggests producing fewer reads to obtain the same information (
                    <xref ref-type="table" rid="T1">Table 1</xref>).</p>
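                <p>The selection of a read depth at the peak can be sketched as follows. The counts used here are hypothetical and serve only as an illustration, and choosing the smallest depth that reaches the maximum DEG count is one possible reading of the criterion described above, not necessarily the rule implemented in RNAtor.</p>
                <code language="python">
```python
from typing import Mapping


def recommended_depth(deg_counts: Mapping[float, int]) -> float:
    """Return the smallest read depth at which the DEG count reaches its peak.

    deg_counts maps a read depth (millions of reads) to the number of DEGs
    detected at that depth for one fold change and replicate number.
    """
    if not deg_counts:
        raise ValueError("deg_counts must not be empty")
    peak = max(deg_counts.values())
    return min(depth for depth, count in deg_counts.items() if count == peak)


# Hypothetical counts (millions of reads -> DEGs detected), for illustration only.
toy_counts = {0.2: 12, 1: 40, 2: 75, 6: 110, 10: 110, 14: 108, 20: 109}
print(recommended_depth(toy_counts))
```
</code>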
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Number of differentially expressed genes (DEGs) detected for simulated datasets (hg19 chr14) by Deseq, Deseq2, EdgeR, Cuffdiff, Kallisto-Sleuth and Trinity-Kallisto tools.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/14320/4e652a73-b05f-47bf-81d7-9eb2e4dcf974_figure2.gif"/>
                </fig>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Number of differentially expressed genes (DEGs) detected using a real dataset (
                            <italic toggle="yes">Saccharomyces cerevisiae</italic>) with the Kallisto-Sleuth pipeline.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/14320/4e652a73-b05f-47bf-81d7-9eb2e4dcf974_figure3.gif"/>
                </fig>
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>
                            <italic toggle="yes">RNAtor</italic> output on the number of sequencing reads (in millions) to be produced for 2&#x2013;5 technical replicates to detect differentially expressed genes at a given fold change.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th colspan="1" rowspan="1"/>
                                <th align="left" colspan="1" rowspan="1" valign="top">2 replicates</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">3 replicates</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">4 replicates</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">5 replicates</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td colspan="1" rowspan="1">
                                    <bold>5-fold</bold>
                                </td>
                                <td colspan="1" rowspan="1">6</td>
                                <td colspan="1" rowspan="1">2</td>
                                <td colspan="1" rowspan="1">1.5</td>
                                <td colspan="1" rowspan="1">1.5</td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">
                                    <bold>4-fold</bold>
                                </td>
                                <td colspan="1" rowspan="1">10</td>
                                <td colspan="1" rowspan="1">6</td>
                                <td colspan="1" rowspan="1">2</td>
                                <td colspan="1" rowspan="1">1.5</td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">
                                    <bold>3-fold</bold>
                                </td>
                                <td colspan="1" rowspan="1">10</td>
                                <td colspan="1" rowspan="1">6</td>
                                <td colspan="1" rowspan="1">6</td>
                                <td colspan="1" rowspan="1">6</td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">
                                    <bold>2-fold</bold>
                                </td>
                                <td colspan="1" rowspan="1">14</td>
                                <td colspan="1" rowspan="1">10</td>
                                <td colspan="1" rowspan="1">10</td>
                                <td colspan="1" rowspan="1">6</td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">
                                    <bold>1.5-fold</bold>
                                </td>
                                <td colspan="1" rowspan="1">30</td>
                                <td colspan="1" rowspan="1">20</td>
                                <td colspan="1" rowspan="1">20</td>
                                <td colspan="1" rowspan="1">14</td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
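                <p>For readers who wish to script their planning, the entries of Table 1 can be held in a simple lookup structure. The sketch below transcribes the tabulated values as they appear; the names TABLE_1 and reads_needed are hypothetical and are not part of the RNAtor code base.</p>
                <code language="python">
```python
# Reads in millions, transcribed from Table 1: fold change -> replicates -> reads.
TABLE_1 = {
    5.0: {2: 6, 3: 2, 4: 1.5, 5: 1.5},
    4.0: {2: 10, 3: 6, 4: 2, 5: 1.5},
    3.0: {2: 10, 3: 6, 4: 6, 5: 6},
    2.0: {2: 14, 3: 10, 4: 10, 5: 6},
    1.5: {2: 30, 3: 20, 4: 20, 5: 14},
}


def reads_needed(fold_change: float, replicates: int) -> float:
    """Return the tabulated number of reads (in millions) for one design."""
    try:
        return TABLE_1[fold_change][replicates]
    except KeyError:
        # Only the combinations listed in Table 1 are available.
        raise ValueError(
            f"no Table 1 entry for {fold_change}-fold change "
            f"with {replicates} replicates"
        ) from None


print(reads_needed(2.0, 3))
```
</code>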
            </sec>
            <sec>
                <title>Detection sensitivity of DE tools</title>
                <p>Kallisto detected the optimal number of DEGs with the highest sensitivity. Focusing purely on the number of DEGs detected between WT and KO, Kallisto outperformed the other tools tested (
                    <xref ref-type="fig" rid="f2">Figure 2</xref> and 
                    <xref ref-type="other" rid="SF3">Supplementary Figure S3</xref>).</p>
            </sec>
            <sec>
                <title>Detection specificity and transcript recovery by DE tools</title>
                <p>Cuffdiff can be used for high specificity, and DeSeq2 and EdgeR for high transcript recovery. Although Kallisto-Sleuth was fast and produced results with high sensitivity, we observed that this was at the expense of specificity of detection (
                    <xref ref-type="other" rid="SF3">Supplementary Figure S3</xref>). Cuffdiff produced results with high specificity albeit with a loss of sensitivity (
                    <xref ref-type="other" rid="SF3">Supplementary Figure S3</xref>). The transcript recovery was best for EdgeR for shorter (&lt;742 bases) and medium-sized (742&#x2013;1456 bases) transcripts, and best for CuffDiff for longer transcripts (&gt;1456 bases), among the 3 tools tested (CuffDiff, DeSeq and EdgeR, 
                    <xref ref-type="other" rid="SF4">Supplementary Figure S4</xref>).</p>
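                <p>Sensitivity and specificity in a simulation setting can be computed from the set of genes simulated as differentially expressed and the set reported by a tool. The sketch below uses the standard definitions (true positives over all simulated DEGs, and true negatives over all non-DEGs); it is an illustration under those assumptions, with hypothetical gene identifiers, and may differ in detail from how the supplementary curves were produced.</p>
                <code language="python">
```python
from typing import Iterable, Tuple


def sensitivity_specificity(
    simulated_degs: Iterable[str],
    detected_degs: Iterable[str],
    all_genes: Iterable[str],
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one tool on one simulation.

    simulated_degs: genes simulated as differentially expressed (ground truth).
    detected_degs:  genes the tool reported as differentially expressed.
    all_genes:      every gene in the simulated transcriptome.
    """
    universe = set(all_genes)
    truth = set(simulated_degs).intersection(universe)
    called = set(detected_degs).intersection(universe)

    true_pos = len(truth.intersection(called))
    false_pos = len(called.difference(truth))
    false_neg = len(truth.difference(called))
    true_neg = len(universe) - true_pos - false_pos - false_neg

    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one simulated DEG and one non-DEG")

    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical gene identifiers, for illustration only.
genes = [f"g{i}" for i in range(1, 11)]
sens, spec = sensitivity_specificity(
    simulated_degs=["g1", "g2", "g3", "g4"],
    detected_degs=["g1", "g2", "g3", "g5"],
    all_genes=genes,
)
print(round(sens, 2), round(spec, 2))
```
</code>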
            </sec>
            <sec>
                <title>Performance of assembly-based pipeline over that of genome-guided tools</title>
                <p>The assembly-based pipeline yields more DEGs with higher sensitivity and specificity. Using Trinity (
                    <xref ref-type="bibr" rid="ref-5">Grabherr 
                        <italic toggle="yes">et al.</italic>, 2011</xref>) as an assembly pipeline along with Kallisto enhanced the number of DEGs detected when compared with the genome-guided Kallisto-Sleuth pipeline (
                    <xref ref-type="fig" rid="f2">Figure 2</xref>). While the sensitivity of Trinity-Kallisto was marginally better, its specificity was visibly better when compared to the Kallisto-Sleuth pipeline (
                    <xref ref-type="other" rid="SF3">Supplementary Figure S3</xref>).</p>
            </sec>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <p>Although some of the challenges with RNA-seq experiments have been addressed previously (
                <xref ref-type="bibr" rid="ref-3">Busby 
                    <italic toggle="yes">et al.</italic>, 2013</xref>; 
                <xref ref-type="bibr" rid="ref-7">Luo 
                    <italic toggle="yes">et al.</italic>, 2014</xref>), there is currently no easy-to-use, biologist-friendly mobile phone-based app. Scotty, a previously reported, useful, interactive web-based tool, aids RNA-seq experimental design. However, it depends on pilot or prototype data that closely match the actual experimental conditions (
                <xref ref-type="bibr" rid="ref-3">Busby 
                    <italic toggle="yes">et al.</italic>, 2013</xref>). EDDA, another interactive web-based tool that aids RNA-seq experimental design, offers more flexibility: the user can either provide pilot data or use a simulate-and-test paradigm as per the desired experimental conditions (
                <xref ref-type="bibr" rid="ref-7">Luo 
                    <italic toggle="yes">et al.</italic>, 2014</xref>). Both can detect genes or transcripts of only up to 2X fold change in the test condition relative to the control. RNAtor addresses some of these gaps as a user-friendly mobile app. However, it has certain limitations. For example, it does not take into account the dynamic nature of any transcriptome (where the exact size of the transcriptome is not known and cannot simply be derived from the genome size), the throughput of different sequencing instruments, the presence of splice variants, or the relative abundance of transcripts, for example in relation to a control gene or any other gene of interest. We also recognize that RNAtor v1.0 is based on simple assumptions that can affect the recommendations. Nevertheless, validation of the recommendations, which result from training on simulated RNA-seq data that have not yet incorporated various biological biases, with real data from 
                <italic toggle="yes">Saccharomyces cerevisiae</italic> provides strong evidence that our assumptions do not significantly impact RNAtor's guidance to users. That said, there is a prevailing need for a simple tool for biologists who have simple questions. RNA-seq is not always used to answer complex questions; it is often used as a superior substitute for qPCR. We intend to expand the scope of the tool in future releases by introducing biases that mimic various experimental conditions into the simulation phase.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <p>The Android version of RNAtor is available on Google Play Store.</p>
            <p>Latest source code: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/binaypanda/RNAtor">https://github.com/binaypanda/RNAtor</ext-link>.</p>
            <p>Archived source code as at the time of publication: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.814905">https://doi.org/10.5281/zenodo.814905</ext-link> (
                <xref ref-type="bibr" rid="ref-8">Panda, 2017</xref>).</p>
            <p>License: RNAtor v1.0 is distributed under the GNU GPLv3 licence.</p>
        </sec>
    </body>
    <back>
        <sec sec-type="supplementary-material">
            <title>Supplementary material</title>
            <p id="SF1">
                <bold>Supplementary Figure S1:</bold> 
                <italic toggle="yes">RNAtor</italic> flowchart highlighting simulation conditions (reads, technical replicates, and fold change of differential expression) and analytical tools used.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/11982/f6e36b38-b63b-4721-964b-7ee447f923e6.pdf">Click here to access the data</ext-link>.</p>
            <p id="SF2">
                <bold>Supplementary Figure S2:</bold> Number of differentially expressed genes (DEGs) detected for various simulated datasets on 10Mb, 30Mb and 100Mb transcriptomes using the Kallisto-Sleuth pipeline.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/11982/0de02352-a3a5-4f5d-83db-dacd4d4024c0.pdf">Click here to access the data</ext-link>.</p>
            <p id="SF3">
                <bold>Supplementary Figure S3:</bold> True/false positive curves for differentially expressed genes (DEGs) recovered under various simulation conditions, created by combining reads (0.1M&#x2013;20M), technical replicates (2&#x2013;5) and fold change of differential expression (1.2&#x2013;5X), with the Cuffdiff, DESeq2, edgeR, Kallisto and Trinity-Kallisto tools.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/11982/2f069619-8989-4dc3-9eb1-f45b3f7915e3.pdf">Click here to access the data</ext-link>.</p>
            <p id="SF4">
                <bold>Supplementary Figure S4:</bold> Percentage recovery of transcripts under various simulation conditions, created by combining reads (0.1M&#x2013;20M), technical replicates (0&#x2013;5) and fold change of differential expression (1.2&#x2013;5X), with Cuffdiff, DESeq and edgeR. The size of the bubble represents the extent of transcript recovery.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/11982/8c32f369-3c8d-47cd-bfb6-ba23f6cd0958.pdf">Click here to access the data</ext-link>.</p>
        </sec>
        <ref-list>
            <ref id="ref-1">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Anders</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>
</person-group>:
                    <article-title>Differential expression analysis for sequence count data.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2010</year>;<volume>11</volume>(<issue>10</issue>):<fpage>R106</fpage>.
                    <pub-id pub-id-type="pmid">20979621</pub-id>
                    <pub-id pub-id-type="doi">10.1186/gb-2010-11-10-r106</pub-id>
                    <pub-id pub-id-type="pmcid">3218662</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bray</surname>
                            <given-names>NL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pimentel</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Melsted</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Near-optimal probabilistic RNA-seq quantification.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2016</year>;<volume>34</volume>(<issue>5</issue>):<fpage>525</fpage>&#x2013;<lpage>527</lpage>.
                    <pub-id pub-id-type="pmid">27043002</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3519</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Busby</surname>
                            <given-names>MA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Stewart</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Miller</surname>
                            <given-names>CA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2013</year>;<volume>29</volume>(<issue>5</issue>):<fpage>656</fpage>&#x2013;<lpage>657</lpage>.
                    <pub-id pub-id-type="pmid">23314327</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btt015</pub-id>
                    <pub-id pub-id-type="pmcid">3582267</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Frazee</surname>
                            <given-names>AC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Jaffe</surname>
                            <given-names>AE</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Langmead</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>
                        <italic toggle="yes">Polyester</italic>: simulating RNA-seq datasets with differential transcript expression.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2015</year>;<volume>31</volume>(<issue>17</issue>):<fpage>2778</fpage>&#x2013;<lpage>2784</lpage>.
                    <pub-id pub-id-type="pmid">25926345</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btv272</pub-id>
                    <pub-id pub-id-type="pmcid">4635655</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Grabherr</surname>
                            <given-names>MG</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Haas</surname>
                            <given-names>BJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yassour</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Full-length transcriptome assembly from RNA-Seq data without a reference genome.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2011</year>;<volume>29</volume>(<issue>7</issue>):<fpage>644</fpage>&#x2013;<lpage>652</lpage>.
                    <pub-id pub-id-type="pmid">21572440</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.1883</pub-id>
                    <pub-id pub-id-type="pmcid">3571712</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Love</surname>
                            <given-names>MI</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Anders</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2014</year>;<volume>15</volume>(<issue>12</issue>):<fpage>550</fpage>.
                    <pub-id pub-id-type="pmid">25516281</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-014-0550-8</pub-id>
                    <pub-id pub-id-type="pmcid">4302049</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Luo</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Chia</surname>
                            <given-names>BK</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The importance of study design for detecting differentially abundant features in high-throughput experiments.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2014</year>;<volume>15</volume>(<issue>12</issue>):<fpage>527</fpage>.
                    <pub-id pub-id-type="pmid">25517037</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-014-0527-7</pub-id>
                    <pub-id pub-id-type="pmcid">4253014</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Panda</surname>
                            <given-names>B</given-names>
                        </name>
</person-group>:
                    <article-title>binaypanda/RNAtor: RNAtor.</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2017</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.814905">Data Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Robinson</surname>
                            <given-names>MD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>McCarthy</surname>
                            <given-names>DJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Smyth</surname>
                            <given-names>GK</given-names>
                        </name>
</person-group>:
                    <article-title>edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2010</year>;<volume>26</volume>(<issue>1</issue>):<fpage>139</fpage>&#x2013;<lpage>140</lpage>.
                    <pub-id pub-id-type="pmid">19910308</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btp616</pub-id>
                    <pub-id pub-id-type="pmcid">2796818</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Schurch</surname>
                            <given-names>NJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schofield</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gierli&#x0144;ski</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?</article-title>
                    <source>

                        <italic toggle="yes">RNA.</italic>
</source>
                    <year>2016</year>;<volume>22</volume>(<issue>6</issue>):<fpage>839</fpage>&#x2013;<lpage>851</lpage>.
                    <pub-id pub-id-type="pmid">27022035</pub-id>
                    <pub-id pub-id-type="doi">10.1261/rna.053959.115</pub-id>
                    <pub-id pub-id-type="pmcid">4878611</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Trapnell</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Roberts</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Goff</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.</article-title>
                    <source>

                        <italic toggle="yes">Nat Protoc.</italic>
</source>
                    <year>2012</year>;<volume>7</volume>(<issue>3</issue>):<fpage>562</fpage>&#x2013;<lpage>578</lpage>.
                    <pub-id pub-id-type="pmid">22383036</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nprot.2012.016</pub-id>
                    <pub-id pub-id-type="pmcid">3334321</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report28051">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.14320.r28051</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Komura</surname>
                        <given-names>Daisuke</given-names>
                    </name>
                    <xref ref-type="aff" rid="r28051a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r28051a1">
                    <label>1</label>Department of Genomic Pathology, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>27</day>
                <month>12</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Komura D</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport28051" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.11982.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The quality of the revised manuscript has greatly improved. The authors addressed most of the remarks mentioned during the first review. I have one additional minor comment.</p>
            <p> </p>
            <p> 1) Supplementary Figure 2 : #reads=0 data points in the figure should be removed as in Figure 3.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report28052">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.14320.r28052</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Nagarajan</surname>
                        <given-names>Niranjan</given-names>
                    </name>
                    <xref ref-type="aff" rid="r28052a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r28052a1">
                    <label>1</label>Genome Institute of Singapore, Singapore, Singapore</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>My lab developed the software EDDA (http://edda.gis.a-star.edu.sg/; https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0527-7) which has partly overlapping functionality.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>29</day>
                <month>11</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Nagarajan N</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport28052" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.11982.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>I thank the authors for carefully considering my comments and revising accordingly. I have a few major comments that still remain to be addressed:</p>
            <p> </p>
            <p> 1) I believe the manuscript needs at least a few convincing examples to show that it does the right thing in terms of providing recommendations to users. I don't see how the yeast datasets used currently serve this purpose.</p>
            <p> </p>
            <p> 2) I do not see how the simulation used is going to be appropriate for the diversity of users that this app hopes to cater to. How will a user know that the recommendations from the app are not appropriate for their system?</p>
            <p> </p>
            <p> 3) The relative abundance of a transcript is indeed a critical parameter that determines how easily it can be picked up as being differentially abundant. Ignoring this aspect is likely to give a misleading impression to a user.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>No</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>No</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>No</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Genomics, Computational Biology</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report23782">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.12955.r23782</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Nagarajan</surname>
                        <given-names>Niranjan</given-names>
                    </name>
                    <xref ref-type="aff" rid="r23782a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r23782a1">
                    <label>1</label>Genome Institute of Singapore, Singapore, Singapore</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>My lab developed the software EDDA (http://edda.gis.a-star.edu.sg/;&#x00a0;https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0527-7) which has partly overlapping functionality.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>30</day>
                <month>8</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Nagarajan N</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport23782" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.11982.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>reject</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>RNAtor looks like a simple and easy-to-use application. However, this could also be its drawback in that it may be too simplistic. The authors should show some evidence that the parts they omit do not significantly impact RNAtor's guidance to users.</p>
            <p> </p>
            <p> Abstract:</p>
            <p> </p>
            <p> 1) Perhaps the first sentence of the abstract should be reworded as it is a bit odd to say that "RNA Sequencing ... understands transcript structures" etc.</p>
            <p> </p>
            <p> 2) It is unnecessary to mention the "number of lanes" when the "number of reads" has been highlighted.</p>
            <p> </p>
            <p> 3) It is not clear why a mobile application is necessary. Is a web-based tool like EDDA not sufficient?</p>
            <p> </p>
            <p> Introduction:</p>
            <p> </p>
            <p> 4) The last sentence is not very clear in terms of what is being done here.</p>
            <p> </p>
            <p> 5) As it is typical to present and discuss prior work in the introduction, I believe it is important to add this component. Some of this could perhaps come from the current discussion section.</p>
            <p> </p>
            <p> Implementation:</p>
            <p> </p>
            <p> 6) What were the parameters used for running Polyester? In particular, different experimental conditions will likely have different expression profiles, biological variability dictating the dispersion parameters, and perhaps other factors such as splicing complexity and gene families, all of which affect RNA-seq analysis. How does RNAtor account for these? How were the various differential analysis software tools run? They usually have cutoffs that can be chosen to make their results more or less stringent.</p>
            <p> </p>
            <p> Results:</p>
            <p> </p>
            <p> 7) The detection of differentially abundant RNAs using RNA-seq is impacted by the relative abundance of the transcript. How is this taken into account in table 1?</p>
            <p> </p>
            <p> 8) In figure 2, are these all true positives? What about false positives? Usually there is a tradeoff and a user needs to be aware of this.</p>
            <p> </p>
            <p> 9) Figure 3 does not have a legend.</p>
            <p> </p>
            <p> 10) It is not clear to me what transcript recovery is referring to and why that is relevant here.</p>
            <p> </p>
            <p> 11) Supplementary figure 4 needs better resolution and font sizes.</p>
            <p> </p>
            <p> 12) The claims made in the section "Detection specificity and transcript recovery by DE tools" are not obvious from the figures shown.</p>
            <p> </p>
            <p> Discussion:</p>
            <p> </p>
            <p> 13) It seems to me that there are many more limitations in RNAtor than are discussed here. Also, it is appropriate to discuss the strengths and weaknesses of EDDA as well and perhaps some of this should be in the introduction, as noted earlier.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>No</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>No</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>No</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Genomics, Computational Biology</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment3174-23782">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Panda</surname>
                            <given-names>Binay</given-names>
                        </name>
                        <aff>Ganit Labs, Bio-IT Centre, Institute of Bioinformatics and Applied Biotechnology, India</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>10</day>
                    <month>11</month>
                    <year>2017</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We thank Dr. Niranjan Nagarajan for reviewing the manuscript and providing his comments. Following his suggestions, we have revised the manuscript and have responded to all his queries below. We believe this has improved the manuscript.</p>
                <p> </p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> RNAtor looks like a simple and easy to use application. However this could also be its drawback in that it may be too simplistic. The authors should show some evidence that the parts they omit do not significantly impact RNAtor's guidance to users.</p>
                <p> </p>
                <p> Authors&#x2019; response:</p>
                <p> We agree with the reviewer that RNAtor v1.0 is simplistic and that some of its assumptions can affect the recommendations. However, the tool is intended for biologists who use RNA-seq as a replacement for existing technologies such as microarrays and qPCR, to ask simpler biological questions. Many small biology labs, we believe, are still in this group. The tool is not intended to serve advanced users. We are well aware of its limitations and intend to expand the scope of the tool in future versions by introducing biases that mimic various experimental conditions into the simulation phase. Nevertheless, the recommendations were derived from simulated RNA-seq data that does not yet incorporate biological biases, and we believe that validating them against real data from 
                    <italic>Saccharomyces cerevisiae</italic> provides strong evidence that our assumptions do not significantly impact RNAtor's guidance to users.</p>
                <p> </p>
                <p> Abstract:</p>
                <p> </p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> </p>
                <p> 1) Perhaps the first sentence of the abstract should be reworded as it is a bit odd to say that "RNA Sequencing ... understands transcript structures" etc.</p>
                <p> Authors&#x2019; response:</p>
                <p> We thank the reviewer for pointing this out. We have reworded this sentence in the revised manuscript.</p>
                <p> </p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> </p>
                <p> 2) It is unnecessary to mention the "number of lanes" when the "number of reads" has been highlighted.</p>
                <p> Authors&#x2019; response:</p>
                <p> Following the reviewer&#x2019;s suggestion, we have removed the use of the term in the revised manuscript.</p>
                <p> </p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> </p>
                <p> 3) It is not clear why a mobile application is necessary. Is a web-based tool like EDDA not sufficient?</p>
                <p> Authors&#x2019; response:</p>
                <p> We agree with the reviewer that there are web-based tools, such as EDDA and Scotty, that can do a similar job; we now discuss them in the revised manuscript. However, we strongly believe that a mobile application offers more flexibility, ease of navigation, and user-friendliness than a web-based tool. Besides, many features of the app are available offline, so the user can work with it at the bench or anywhere convenient. Additionally, apps use a different framework to run code and hence run faster than mobile websites, which typically use JavaScript. Apps also store data locally on the mobile device, unlike websites, which store data on web servers.</p>
                <p> </p>
                <p> Introduction:</p>
                <p> </p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> </p>
                <p> 4) The last sentence is not very clear in terms of what is being done here.</p>
                <p> Authors&#x2019; response:</p>
                <p> Following the reviewer&#x2019;s suggestion, we have modified the sentence in the revised manuscript.</p>
                <p> </p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> </p>
                <p> 5) As it is typical to present and discuss prior work in the introduction, I believe it is important to add this component. Some of this could perhaps come from the current discussion section.</p>
                <p> Authors&#x2019; response:</p>
                <p> We have now modified the introduction to include references on tools aiding RNA-seq experimental design.</p>
                <p> </p>
                <p> </p>
                <p> Implementation:</p>
                <p> </p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> </p>
                <p> 6) What were the parameters used for running Polyester? In particular, different experimental conditions will likely have different expression profiles, biological variability dictating the dispersion parameters and perhaps other factors like splicing complexity, gene families etc. that all affect RNA-seq analysis. How does RNAtor account for these? How were the various differential analysis software tools run? They usually have cutoffs that can be chosen to make their results more or less stringent.</p>
                <p> Authors&#x2019; response:</p>
                <p> The reads_per_transcript, num_reps, and fold_changes parameters, in addition to the input &#x2018;fasta&#x2019; and output &#x2018;outdir&#x2019; parameters, were set in the Polyester simulations. All differential expression analysis tools were run with default cut-offs.</p>
                <p> We have now mentioned these in the revised manuscript.</p>
                <p> </p>
                <p> Results:</p>
                <p> </p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> 7) The detection of differentially abundant RNAs using RNA-seq is impacted by the relative abundance of the transcript. How is this taken into account in table 1?</p>
                <p> </p>
                <p> Authors&#x2019; response:</p>
                <p> In the current implementation of the tool, the simulated data do not take into account the relative abundance of a transcript, for example in relation to a control gene or any other gene of interest. Hence, our recommendations (Table 1) are not affected by it. However, differences in the relative abundance of true control genes between treatment and control would lead to over- or under-estimation of DEGs, and therefore the baseline assumption of log
                    <sub>2</sub>FC=0 should be adjusted accordingly.</p>
                <p> We have acknowledged this limitation in the revised manuscript.</p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> 8) In figure 2, are these all true positives? What about false positives? Usually there is a tradeoff and a user needs to be aware of this.</p>
                <p> Authors&#x2019; response:</p>
                <p> Figure 2 shows all detected DEGs, 
                    <italic>i.e.,</italic> both true and false positives. Supplementary Figure S3 gives the true positive and false positive trends separately. Yes, there is a tradeoff between sensitivity (true positives) and specificity (absence of false positives). Kallisto achieves high sensitivity at the cost of specificity, whereas CuffDiff achieves high specificity at the cost of sensitivity. We have elaborated on this in the revised manuscript.</p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> 9) Figure 3 does not have a legend.</p>
                <p> Authors&#x2019; response:</p>
                <p> We thank the reviewer for pointing this out and have now added a legend to Figure 3.</p>
                <p> </p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> </p>
                <p> 10) It is not clear to me what transcript recovery is referring to and why that is relevant here.</p>
                <p> </p>
                <p> Authors&#x2019; response:</p>
                <p> Transcript recovery refers to the length of a transcript as assembled by Tophat and detected as differentially expressed by EdgeR, CuffDiff, or DESeq2, relative to its actual length in the simulations. This parameter can be estimated only for these three tools, since they offer a handle to the actual transcript IDs. This has been clarified in the revised manuscript.</p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> 11) Supplementary figure 4 needs better resolution and font sizes.</p>
                <p> Authors&#x2019; response:</p>
                <p> Supplementary Figure 4 with improved resolution has now been added.</p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> </p>
                <p> 12) The claims made in the section "Detection specificity and transcript recovery by DE tools" are not obvious from the figures shown.</p>
                <p> </p>
                <p> Authors&#x2019; response:</p>
                <p> The results in this section have been elaborated further with reference to Supplementary Figure S4 in the revised manuscript. Regarding the claim about detection specificity, which cites Supplementary Figure S3, we found that CuffDiff achieves high specificity at the cost of sensitivity, with the opposite being true for Kallisto-Sleuth.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report24054">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.12955.r24054</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Komura</surname>
                        <given-names>Daisuke</given-names>
                    </name>
                    <xref ref-type="aff" rid="r24054a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r24054a1">
                    <label>1</label>Department of Genomic Pathology, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>6</day>
                <month>7</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Komura D</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport24054" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.11982.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors developed a new mobile application named RNAtor to assist in the designing of RNA-seq experiments. It provides users with the number of reads that is required for optimal detection of differentially expressed genes at a given fold-change threshold based on simulations and real data.</p>
            <p> </p>
            <p> This is a useful tool for NGS users. However, there are some errors that need to be fixed and some suggestions that should be addressed.</p>
            <p> Major: 
                <list list-type="order">
                    <list-item>
                        <p>In general, some reads in RNA-seq analysis are removed due to their low sequencing quality or cannot be mapped probably due to library contamination of rRNA or other species&#x2019; RNA. Does this software consider the effect? If not, it could underestimate the number of required reads.</p>
                    </list-item>
                    <list-item>
                        <p>How does RNAtor calculate the number of reads required for DEG detection? The authors claim that it is calculated based on the number of DEGs at its peak in both real and simulated data but the exact algorithm is not shown. Specifically, how were the peak values calculated and how were the peak values of real and simulated data merged?&#x00a0;</p>
                    </list-item>
                </list> &#x00a0;</p>
            <p> Minor:&#x00a0; 
                <list list-type="order">
                    <list-item>
                        <p>What does the term &#x2018;replicates&#x2019; mean in this manuscript: technical replicates or biological replicates?</p>
                    </list-item>
                    <list-item>
                        <p>Implementation: &#x201c;&#x2026; workflow followed by differential expression analysis using five tools:&#x2026;&#x201d; I think four instead of five is correct because Kallisto does not use outputs of Tophat-Cufflinks pipeline (Supplementary Figure 1).</p>
                    </list-item>
                    <list-item>
                        <p>Figure 2: How many DEGs were included in total in the simulated datasets? Sensitivity (%) is preferable to the number of DEGs in this case. The number of replicates is a discrete value but the slope in this figure is smooth. What do the numbers in fold change (1.2 0.83-5, 0.2) mean?</p>
                    </list-item>
                    <list-item>
                        <p>Figure 3: Why were the results of 5x and 4x merged? What does each line indicate? # reads = 0 means nothing and thus should be removed.</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment3175-24054">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Panda</surname>
                            <given-names>Binay</given-names>
                        </name>
                        <aff>Ganit Labs, Bio-IT Centre, Institute of Bioinformatics and Applied Biotechnology, India</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>10</day>
                    <month>11</month>
                    <year>2017</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We thank Dr. Daisuke Komura for his time and timely review. We have addressed all his queries below and have incorporated his suggestions in the revised version of the manuscript, which we believe is much improved now.</p>
                <p> </p>
                <p> Response to Reviewers&#x2019; comments</p>
                <p> Major:</p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> 1) In general, some reads in RNA-seq analysis are removed due to their low sequencing quality or cannot be mapped probably due to library contamination of rRNA or other species&#x2019; RNA. Does this software consider the effect? If not, it could underestimate the number of required reads.&#x00a0;</p>
                <p> Authors&#x2019; response:</p>
                <p> No, the software does not consider this effect. The simulated reads correspond to error-free or error-corrected reads. The add_error and add_platform_error features were added in a more recent release of Polyester and were not available when we ran our simulations.</p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> 2) How does RNAtor calculate the number of reads required for DEG detection? The authors claim that it is calculated based on the number of DEGs at its peak in both real and simulated data but the exact algorithm is not shown. Specifically, how were the peak values calculated and how were the peak values of real and simulated data merged?&#x00a0;</p>
                <p> Authors&#x2019; response:</p>
                <p> The DEGs detected by the various tools were plotted as a function of simulated reads and replicates (Figure 2). The saturation point of the DEG curves, i.e. the peak, was used to recommend a number of reads, given the number of replicates, for optimal DEG detection. This recommendation was validated by extrapolating the read numbers according to the <italic>Saccharomyces</italic> transcriptome size (Figure 3).</p>
                <p> Minor:&#x00a0;</p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> 3) What does the term &#x2018;replicates&#x2019; mean in this manuscript: technical replicates or biological replicates?</p>
                <p> Authors&#x2019; response:</p>
                <p> We thank the reviewer for pointing this out. The replicates used were technical. We have clarified this in the revised manuscript.</p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> 4) Implementation: &#x201c;&#x2026; workflow followed by differential expression analysis using five tools:&#x2026;&#x201d; I think four instead of five is correct because Kallisto does not use outputs of Tophat-Cufflinks pipeline (Supplementary Figure 1).</p>
                <p> Authors&#x2019; response:</p>
                <p> Kallisto was used twice: first with the genome-guided paradigm, and second with 
                    <italic>de novo</italic> assembly using Trinity. In the first scenario, the Tophat-Cufflinks alignments (.bam) were converted to reads (.fastq) and used with Kallisto, with the 3MB transcriptome as the reference. In the second scenario, the simulated reads were used with Kallisto, with the 
                    <italic>de novo</italic> assembled transcriptome as the reference. This has been clarified in the revised manuscript.</p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> 5) Figure 2: How many DEGs were included in total in the simulated datasets? Sensitivity (%) is preferable to the number of DEGs in this case. The number of replicates is a discrete value but the slope in this figure is smooth. What do the numbers in fold change (1.2 0.83-5, 0.2) mean? Figure 2: I think "fold change (1.2,0.83 - 5,0.2)" is confusing so it might be better to change it to another one such as ((&gt;1.2 or &lt;1/1.2) to (&gt;5 or &lt;1/5)).</p>
                <p> Authors&#x2019; response:</p>
                <p> </p>
                <p> The total number of simulated DEGs is 363. TP and FP curves are shown in Supplementary Figure S3. We have changed the representation of replicate numbers. The simulated fold changes ranged from 1.2X to 5X in both directions, i.e. from 1.2 (or 1/1.2 = 0.83) to 5 (or 1/5 = 0.2). "Fold change (1.2,0.83 - 5,0.2)" has been changed to ((&gt;1.2 or &lt;1/1.2) to (&gt;5 or &lt;1/5)).</p>
                <p> Reviewer&#x2019;s comments:</p>
                <p> 6) Figure 3: Why were the results of 5x and 4x merged? What does each line indicate? # reads = 0 means nothing and thus should be removed. Figure 3: What do the lines indicate in each subplot?</p>
                <p> Authors&#x2019; response:</p>
                <p> </p>
                <p> The numbers of DEGs obtained separately at 4X and 5X were not sufficient to draw any conclusions. Hence, they were merged as 4X+5X or &gt;4X. We have removed the #reads=0 data point and updated the Figure 3 legend.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
