<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.11355.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                    <subj-group>
                        <subject>Bioinformatics</subject>
                    </subj-group>
                    <subj-group>
                        <subject>Genomics</subject>
                    </subj-group>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>Arkas: Rapid reproducible RNAseq analysis</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 2 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Colombo</surname>
                        <given-names>Anthony R.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Triche</surname>
                        <given-names>Timothy J.</given-names>
                        <suffix>Jr</suffix>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-5665-946X</uri>
                    <xref ref-type="corresp" rid="c2">b</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Ramsingh</surname>
                        <given-names>Giridharan</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0706-3584</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Jane Anne Nohl Division of Hematology and Center for the Study of Blood Diseases, Keck School of Medicine of the University of Southern California, Los Angeles, CA, 90033, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:anthonycolombo60@gmail.com">anthonycolombo60@gmail.com</email>
                </corresp>
                <corresp id="c2">
                    <label>b</label>
                    <email xlink:href="mailto:tim.triche@gmail.com">tim.triche@gmail.com</email>
                </corresp>
                <fn fn-type="con">
                    <p>AC wrote the manuscript and developed the web application and related software. TJ developed software and helped with the project design. GR wrote the manuscript and contributed to the development of software.</p>
                </fn>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>21</day>
                <month>6</month>
                <year>2017</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2017</year>
            </pub-date>
            <volume>6</volume>
            <elocation-id>586</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>13</day>
                    <month>6</month>
                    <year>2017</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Colombo AR et al.</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/6-586/pdf"/>
            <abstract>
                <p>The recently introduced Kallisto pseudoaligner has radically simplified the quantification of transcripts in RNA-sequencing experiments. We offer two cloud-scale RNAseq pipelines, 
                    <italic toggle="yes">Arkas-Quantification</italic> and 
                    <italic toggle="yes">Arkas-Analysis</italic>, available within Illumina&#x2019;s BaseSpace cloud application platform, which expedite Kallisto preparatory routines, reliably calculate differential expression, and perform gene-set enrichment of REACTOME pathways. Due to inherent inefficiencies of scale, Illumina's BaseSpace computing platform offers a massively parallel distributive environment that improves data management services and data importing. 
                    <italic toggle="yes">Arkas-Quantification</italic> deploys Kallisto for parallel cloud computations and is conveniently integrated downstream from the BaseSpace 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/">Sequence Read Archive</ext-link> (SRA) import/conversion application titled 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://blog.basespace.illumina.com/2014/12/12/import-data-from-sra-into-basespace/">SRA Import</ext-link>
                    </italic>. 
                    <italic toggle="yes">Arkas-Analysis</italic> annotates the Kallisto results by extracting structured information directly from source FASTA files with per-contig metadata, and performs differential expression and gene-set enrichment analysis on both coding genes and transcripts. The 
                    <italic toggle="yes">Arkas</italic> cloud pipeline supports ENSEMBL transcriptomes and can be used downstream from the 
                    <italic toggle="yes">SRA Import</italic> application, facilitating raw sequencing data import, SRA-to-FASTQ conversion, RNA quantification, and analysis steps.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>transcriptome</kwd>
                <kwd>sequencing</kwd>
                <kwd>RNAseq</kwd>
                <kwd>automation</kwd>
                <kwd>cloud computing</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>Leukemia and Lymphoma Society</funding-source>
                    <award-id>0863-15</award-id>
                </award-group>
                <award-group id="fund-2">
                    <funding-source>Stop Cancer</funding-source>
                </award-group>
                <award-group id="fund-3">
                    <funding-source>Tower Cancer Research Foundation</funding-source>
                </award-group>
                <award-group id="fund-4">
                    <funding-source>Illumina</funding-source>
                </award-group>
                <funding-statement>This project was funded by grants from Leukemia Lymphoma Society-Quest for Cures (0863-15), Illumina (San Diego), STOP Cancer and Tower Cancer Research Foundation.  </funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>This revised manuscript removes discussion points that were unrelated to the main topic, such as an in-depth examination of Docker and its role in data sharing. Version 2 is more concise and explicitly states the motivations for developing the Docker software. It includes an updated Figure 1, in which the order of the images has been swapped relative to the previous version. In addition, Supplementary Figures S1 and S2 were added to show the application interface. Further, this revised manuscript discusses more relevant topics, such as a comparison of cloud computing platforms.</p>
            </sec>
        </notes>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Bioinformatic workflows based on high-performance computing fall into three main subfamilies: in-house computational packages, virtual machines (VMs), and cloud-based computational environments. In-house approaches are substantially less expensive when raw hardware is in constant use and dedicated support is available, but internal dependencies can limit the reproducibility of computational experiments. Specifically, the &#x201c;superuser&#x201d; access needed to deploy container-based, succinct code encapsulations (often referred to as "microservices" elsewhere) can run afoul of normal permissions, and maintaining broadly usable sets of libraries across nodes can lead to shared code dynamically linking to different libraries under different user environments. By contrast, modern cloud-based approaches and parallel computing are forced by necessity to offer a user-friendly platform with high availability to the broadest audience. Platform-as-a-service approaches take this one step further, offering controlled deployment and fault tolerance across potentially unreliable instances provided by third parties such as 
                <ext-link ext-link-type="uri" xlink:href="https://aws.amazon.com/ec2/">Amazon Web Services Elastic Compute Cloud</ext-link> (AWS EC2) and enforcing a standard for encapsulation of developers' services such as 
                <ext-link ext-link-type="uri" xlink:href="https://www.docker.com/">Docker</ext-link>. Within this framework, the user or developer cedes some control of the platform and interface in exchange for the platform provider handling the details of workflow distribution and execution. This has provided the best compromise between usability and reproducibility when dealing with general audiences. In this regard, the lightweight-container approach exemplified by Docker leads to more rapid development and deployment than VMs. Combined with versioning of deployments, it becomes feasible for users to reconstruct results from an earlier point in time, while simultaneously re-evaluating the generated data under state-of-the-art implementations.</p>
            <p>Docker offers advantages for reproducible research practices, and is also the principal infrastructure underlying leading platforms such as 
                <ext-link ext-link-type="uri" xlink:href="https://blog.basespace.illumina.com/">Illumina's BaseSpace</ext-link> platform, 
                <ext-link ext-link-type="uri" xlink:href="https://cloud.google.com/genomics/">Google Genomics</ext-link>, 
                <ext-link ext-link-type="uri" xlink:href="https://galaxyproject.org/">Galaxy</ext-link> and 
                <ext-link ext-link-type="uri" xlink:href="https://www.sevenbridges.com/">SevenBridges</ext-link>. Cloud computational ecosystems preserve development environments using the Docker containerization framework, which improves bioinformatic validation. Containerized cloud applications form part of the global distributive effort and are favorable over local in-house computational pipelines because they offer rapid access to numerous public workflows, provide easy interfacing to archived read databases, and accelerate the uploading of raw data.</p>
            <p>A major bottleneck in RNAseq analysis is the processing steps for importing raw data. The majority of RNAseq analysis pipelines consist of read preparation steps, followed by computationally expensive alignment against a reference. Software for calculating transcript abundance and assembly can surpass 30 hours of computational time
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>. If known or putative transcripts of defined sequence are the primary interest, then near-optimal RNAseq transcript quantification by pseudoalignment is achievable in minutes on a standard laptop using the Kallisto software
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>
                </sup>. 
                <italic toggle="yes">Arkas</italic> was developed as a simple yet massively parallel framework for RNAseq transcript quantification that allows users to expedite pseudoalignment on arbitrary datasets and significantly reduces the number of required preparatory routines.</p>
            <p>In collaboration with Illumina (San Diego, USA), we found that the available 
                <ext-link ext-link-type="uri" xlink:href="https://basespace.illumina.com/apps/">BaseSpace</ext-link> platform was already well-suited for parallel transformation of raw sequencing data into analytical results. BaseSpace has an available application 
                <italic toggle="yes">SRA Import</italic> which automates SRA importing and FASTQ conversion pre-processing steps. The application 
                <italic toggle="yes">SRA Import</italic> is simple to use, requiring only the SRA accession number, and limits imports to 25 GB per application call. 
                <italic toggle="yes">Arkas</italic> can ingest successfully imported samples, avoiding all raw data handling. For example, if one were interested in re-analyzing an experiment from SRA with reads totaling 141.3 GB, 
                <italic toggle="yes">Arkas</italic> facilitates SRA processing and state-of-the-art pseudoalignment by reducing the raw sequencing data to summary quantifications totaling 1.63 GB, and includes an extensive analysis report of less than 10 MB. The total data reduction exceeds 4 orders of magnitude with little or no loss of user-visible information. Moreover, the untouched original data is never discarded unless the user explicitly demands it. The appropriate placement of 
                <italic toggle="yes">Arkas</italic> applications adjacent to the origin of sequencing data removes cumbersome data relocation costs and greatly facilitates sequencing archive re-analysis using state-of-the-art pseudoalignment.</p>
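            <p>As a rough check of the figures quoted above, the following Python sketch computes the fold reduction and the corresponding orders of magnitude. It assumes decimal units (1 GB = 10<sup>9</sup> bytes) and treats the report as exactly 10 MB, which is an upper bound; the function name <monospace>reduction</monospace> is purely illustrative. Under these assumptions, the reduction of more than four orders of magnitude corresponds to the raw reads versus the analysis report, whereas the raw reads versus the summary quantifications is closer to an 87-fold reduction.</p>
            <code language="python"><![CDATA[
```python
import math

GB = 1e9  # decimal gigabyte (assumption for this illustration)
MB = 1e6  # decimal megabyte (assumption for this illustration)


def reduction(raw_bytes, reduced_bytes):
    """Return (fold reduction, orders of magnitude) between two sizes."""
    fold = raw_bytes / reduced_bytes
    return fold, math.log10(fold)


raw = 141.3 * GB    # raw reads imported from SRA
quant = 1.63 * GB   # summary quantifications
report = 10 * MB    # upper bound: the report is "less than 10 MB"

quant_fold, quant_orders = reduction(raw, quant)
report_fold, report_orders = reduction(raw, report)

print(f"raw -> quantifications: {quant_fold:.1f}-fold ({quant_orders:.2f} orders)")
print(f"raw -> report: at least {report_fold:.0f}-fold ({report_orders:.2f} orders)")
```
]]></code>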
            <p>
                <italic toggle="yes">Arkas</italic> encapsulates Kallisto, automates the construction of composite transcriptomes, quantifies transcript abundances, and implements rapid, reproducible differential expression analysis coupled with gene set enrichment analysis. The 
                <italic toggle="yes">Arkas</italic> workflow is versioned in Docker containers and publicly deployed within Illumina's BaseSpace platform, where it ingests raw RNA sequencing data and completes a full analysis in approximately 2 hours.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <p>The first step in the 
                <italic toggle="yes">Arkas</italic> pipeline requires 
                <italic toggle="yes">Arkas-Quantification</italic> to transform all the raw RNA sequencing data of an entire experiment into Kallisto pseudoaligned quantification output data. The second step, 
                <italic toggle="yes">Arkas-Analysis</italic>, takes the pseudoaligned data, assigned to a comparison group and a control group, and returns a comprehensive analysis including differential expression and gene-set enrichment.</p>
            <p>If the user selects the defaults, 
                <italic toggle="yes">Arkas-Quantification</italic> will complete pseudoalignment in approximately 43&#x2013;60 minutes. 
                <italic toggle="yes">Arkas-Quantification</italic> completion time is independent of the number of input samples, but is limited by node availability (AWS EC2 node availability is fairly high). 
                <italic toggle="yes">Arkas-Analysis</italic> will complete using a single node in approximately 1&#x2013;1.5 hours for moderate sample group sizes (N &#x2264; 20), and within 2&#x2013;2.5 hours for much larger designs.</p>
            <sec>
                <title>Arkas-Quantification implementation</title>
                <p>Arkas is a two-step cloud pipeline. 
                    <italic toggle="yes">Arkas-Quantification</italic> is the first step, which reduces the computational steps required to quantify and annotate large numbers of samples against large catalogs of transcriptomes. 
                    <italic toggle="yes">Arkas-Quantification</italic> calls Kallisto for on-the-fly transcriptome indexing and quantification recursively for numerous sample directories. Kallisto quantifies transcript abundance from input RNAseq reads by using pseudoalignment, which identifies the read-transcript compatibility matrix
                    <sup>
                        <xref ref-type="bibr" rid="ref-1">1</xref>
                    </sup>. The compatibility matrix is formed by counting the number of reads that are compatible with each set of transcripts (an equivalence class); this equivalence-class matrix has a much smaller dimension than matrices formed by transcripts and read coverage. Computational speed is gained by running the Expectation Maximization (EM) algorithm over this smaller matrix.</p>
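                <p>The idea can be illustrated with a toy Python sketch, which is not Kallisto's implementation. Reads are first collapsed into equivalence classes (sets of compatible transcripts) with counts, and EM then iterates over those classes rather than over individual reads. The function names, the toy data, and the weighting of each transcript by its abundance divided by its effective length are assumptions made for this illustration.</p>
                <code language="python"><![CDATA[
```python
from collections import Counter


def equivalence_class_counts(read_compat):
    """Collapse per-read compatibility sets into equivalence-class counts."""
    return Counter(frozenset(c) for c in read_compat if c)


def em_abundances(ec_counts, eff_lengths, n_iter=200):
    """Estimate relative transcript abundances by EM over equivalence classes."""
    transcripts = sorted(eff_lengths)
    alpha = {t: 1.0 / len(transcripts) for t in transcripts}
    total = sum(ec_counts.values())
    for _ in range(n_iter):
        assigned = {t: 0.0 for t in transcripts}
        for ec, count in ec_counts.items():
            # E-step: split the class count among its transcripts in
            # proportion to abundance / effective length (assumed model).
            weights = {t: alpha[t] / eff_lengths[t] for t in ec}
            norm = sum(weights.values())
            if norm == 0:
                continue
            for t, w in weights.items():
                assigned[t] += count * w / norm
        # M-step: abundances become the fraction of reads assigned.
        alpha = {t: assigned[t] / total for t in transcripts}
    return alpha


# Six toy reads collapse into only three equivalence classes.
reads = [{"t1"}, {"t1"}, {"t1", "t2"}, {"t1", "t2"}, {"t2"}, {"t1", "t2"}]
ec_counts = equivalence_class_counts(reads)
alpha = em_abundances(ec_counts, {"t1": 1000.0, "t2": 1000.0})
print(dict(ec_counts))
print(alpha)
```
]]></code>
                <p>In this sketch the work per EM iteration scales with the number of distinct equivalence classes rather than the number of reads, which is the source of the speed-up described above.</p>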
                <p>For RNAseq projects with many sequenced samples, 
                    <italic toggle="yes">Arkas-Quantification</italic> encapsulates expensive preparatory routines for transcript quantification, while uniformly preparing Kallisto execution commands within a versioned environment, which encourages reproducible protocols. The quantification step automates the index caching, annotation, and quantification associated with running the Kallisto pseudoaligner integrated within the BaseSpace environment. For users interested in quality control checks, BaseSpace offers an independent application 
                    <italic toggle="yes">FastQC</italic> which performs 
                    <ext-link ext-link-type="uri" xlink:href="http://www.bioinformatics.babraham.ac.uk/projects/fastqc/">FastQC</ext-link> on sequencing data. The first step in the pipeline can process raw reads into transcript and pathway collection results within Illumina&#x2019;s BaseSpace cloud platform, quantifying against default transcriptomes such as ERCC spike-ins, ENSEMBL non-coding RNA, or cDNA build 88 for both 
                    <italic toggle="yes">Homo sapiens</italic> and 
                    <italic toggle="yes">Mus musculus</italic>; further, the first step supports user-uploaded FASTA files for customized analyses. 
                    <italic toggle="yes">Arkas-Quantification</italic> can support microRNAs (miRNAs); however, we encourage users to analyze miRNAs separately, because pseudoalignment requires reducing the k-mer size in the target de Bruijn graph (TDBG) to miRNA sequence lengths (ranging from 16&#x2013;22 nucleotides), which can increase path ambiguities. 
                    <italic toggle="yes">Arkas-Quantification</italic> is packaged into a Docker container and is publicly available as a cloud application within BaseSpace.</p>
            </sec>
            <sec>
                <title>Arkas-Analysis implementation</title>
                <p>Previous work
                    <sup>
                        <xref ref-type="bibr" rid="ref-2">2</xref>
                    </sup> has revealed that filtering transcriptomes to exclude lowly-expressed isoforms can improve statistical power, while more-complete transcriptome assemblies improve sensitivity in detecting differential transcript usage. Based on earlier work by Bourgon 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-3">3</xref>
                    </sup>, we included this type of filtering for both gene- and transcript-level analyses within 
                    <italic toggle="yes">Arkas-Analysis</italic>. The analysis pipeline automates the annotation of quantification results; this just-in-time annotation and visualization supports more accurate interpretation of coding and transcript sequences in both basic and clinical studies.</p>
                <p>
                    <italic toggle="yes">Arkas-Analysis</italic> integrates quality control analysis for experiments that include Ambion spike-in controls, multiple normalization selections for both coding gene and transcript differential expression analysis, and differential gene-set analysis. If ERCC spike-ins, defined by the External RNA Control Consortium
                    <sup>
                        <xref ref-type="bibr" rid="ref-4">4</xref>
                    </sup>, are detected then 
                    <italic toggle="yes">Arkas-Analysis</italic> will calculate Receiver Operating Characteristic (ROC) plots using 'erccdashboard'
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>. The ERCC analysis reports the average ERCC spike-in amount, comparison plots of ERCC amounts, and normalized ERCC counts (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>).</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>
                            <italic toggle="yes">Arkas-Analysis</italic> ERCC spike-in Controls Report.</title>
                        <p>
                            <bold>A</bold>) The Receiver Operating Characteristic plot of (detected) ERCC ratios in gene expression experiments. The X-axis shows the False Positive Rate; the Y-axis shows the True Positive Rate. 
                            <bold>B</bold>) and 
                            <bold>C</bold>) show the dynamic range of abundances of ERCC RNA amounts with a linear model fit, and ERCC RNA counts. 
                            <bold>D</bold>) shows a dispersion of mean transcript abundances and the estimated dispersion.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/12854/558d2033-7c3a-4f0b-96b9-14c8a58b8a3f_figure1.gif"/>
                </fig>
                <p>Subsequent analyses import the data structure from SummarizedExperiment and create a sub-class titled KallistoExperiment that preserves the S4 structure and is convenient for handling assays, phenotypic data, and genomic data. KallistoExperiment includes GenomicRanges
                    <sup>
                        <xref ref-type="bibr" rid="ref-6">6</xref>
                    </sup>, preserving the ability to handle genomic annotations and alignments and supporting efficient methods for analyzing high-throughput sequencing data. The KallistoExperiment sub-class serves as a general-purpose container for storing feature genomic intervals and the pseudoalignment quantification results produced by Kallisto against a reference. By default, KallistoExperiment couples assay data such as the estimated counts, effective length, estimated median absolute deviation, and transcripts per million (TPM), where each assay is generated by a Kallisto run; the stored feature data is a GenomicRanges object
                    <sup>
                        <xref ref-type="bibr" rid="ref-6">6</xref>
                    </sup>, storing transcript length, GC content, and genomic intervals.</p>
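                <p>To make the relationship between these assays concrete, the Python sketch below derives transcripts per million (TPM) from estimated counts and effective lengths using the standard TPM definition. The <monospace>QuantRecord</monospace> class and the toy values are hypothetical and are not the KallistoExperiment API, which is an R/S4 class; the sketch only illustrates how a container can couple per-transcript assays.</p>
                <code language="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class QuantRecord:
    """Hypothetical per-sample container; not the KallistoExperiment API."""
    est_counts: dict   # transcript id -> estimated read count
    eff_length: dict   # transcript id -> effective length

    def tpm(self):
        """Transcripts per million from counts and effective lengths."""
        # Reads per unit of effective length for each transcript.
        rate = {t: self.est_counts[t] / self.eff_length[t]
                for t in self.est_counts}
        total = sum(rate.values())
        if total == 0:
            raise ValueError("no reads assigned to any transcript")
        # Rescale so that the values sum to one million.
        return {t: 1e6 * r / total for t, r in rate.items()}


sample = QuantRecord(
    est_counts={"tx1": 100.0, "tx2": 300.0},
    eff_length={"tx1": 1000.0, "tx2": 1500.0},
)
tpm = sample.tpm()
print(tpm)
```
]]></code>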
                <p>Given a KallistoExperiment containing the Kallisto sample abundances, principal component analysis (PCA) is performed
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup> on trimmed mean of M-value (TMM) normalized counts
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup> (
                    <xref ref-type="fig" rid="f2">Figure 2A</xref>). Differential expression (DE) is calculated on the library-normalized transcript expression values, and on the aggregated transcript bundles of the corresponding coding genes, using the limma/voom linear model
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup> (
                    <xref ref-type="fig" rid="f3">Figure 3A</xref>). In addition to library normalization, we added an optional data-driven normalization. Within the analysis pipeline, an unsupervised normalization method is favorable for its simplicity, because it requires nothing more than a two-group experimental design. By contrast, supervised data-driven normalization is a specialized task that requires users to define batch groups and/or additional experimental groups. Further, the adjusted data must be evaluated in the context of the experiment. In-silico normalization using factor analysis effectively removes unwanted variation in a manner driven entirely by the data
                    <sup>
                        <xref ref-type="bibr" rid="ref-10">10</xref>
                    </sup>.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>
                            <italic toggle="yes">Arkas-Analysis</italic> Normalization Report: Normalization Analysis Using TMM and RUV.</title>
                        <p>
                            <bold>A</bold>) TMM normalization is performed on the sample data; the panels depict the sample quantiles of normalized sample expression, a PCA plot, and a histogram of the adjusted p-values calculated from the DE analysis. Orange is the comparison group and green is the control group. 
                            <bold>B</bold>) A similar analysis is performed with RUV in-silico normalization.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/12854/558d2033-7c3a-4f0b-96b9-14c8a58b8a3f_figure2.gif"/>
                </fig>
                <p>The analysis report returns comprehensive visualization results. PCA and DE analyses of both transcripts and coding genes are presented as easily interpretable images (
                    <xref ref-type="fig" rid="f2">Figure 2B</xref>, 
                    <xref ref-type="fig" rid="f3">Figure 3B</xref>, 
                    <xref ref-type="fig" rid="f3">Figure 3C</xref>). In each DE analysis, the FDR filtering method defaults to 'Benjamini-Hochberg'; if there are no resultant DE genes/transcripts, the FDR method is switched to 'none'. 
                    <italic toggle="yes">Arkas-Analysis</italic> consumes the Kallisto data output from 
                    <italic toggle="yes">Arkas-Quantification</italic>, and automates DE analysis using TMM normalization and in-silico normalization on both transcript and coding gene expression in a default two-group experimental design, while allowing customized selections. One must examine and compare the PCA sample clustering and the sample boxplots between the two methods to determine whether in-silico normalization is an improvement (
                    <xref ref-type="fig" rid="f2">Figure 2</xref>). If RUV improves the PCA clustering within the context of an experiment and reduces the number of outliers observed in the boxplots, then it is likely that the normalization weights are useful.</p>
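                <p>The FDR fallback described above can be sketched in Python as follows. This is a minimal illustration, not the Arkas-Analysis code: the function names and the 0.05 cutoff are hypothetical, and 'none' is interpreted here as filtering on unadjusted p-values, which is an assumption.</p>
                <code language="python"><![CDATA[
```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de(pvalues, cutoff=0.05):
    """Return (method, indices of hits); fall back to 'none' if BH is empty."""
    adjusted = bh_adjust(pvalues)
    hits = [i for i, p in enumerate(adjusted) if p <= cutoff]
    if hits:
        return "BH", hits
    # Assumed meaning of 'none': filter on the unadjusted p-values.
    return "none", [i for i, p in enumerate(pvalues) if p <= cutoff]


method_a, hits_a = select_de([0.01, 0.04, 0.03, 0.005])
method_b, hits_b = select_de([0.04, 0.5, 0.6, 0.9])
print(method_a, hits_a)
print(method_b, hits_b)
```
]]></code>
                <p>Results obtained through the 'none' fallback carry no multiple-testing correction, so they are best read as exploratory.</p>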
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>
                            <italic toggle="yes">Arkas-Analysis</italic> Differential Expression Report: DE using TMM and RUV.</title>
                        <p>
                            <bold>A</bold>) DE analysis using TMM normalization. The X-axis shows the sample names (test data); the Y-axis shows gene symbols (HUGO). Expression values are plotted as log
                            <sub>10</sub>(1 + TPM). 
                            <bold>B</bold>) Similar analysis using RUV normalization. 
                            <bold>C</bold>) The design matrix with the RUV adjusted weights. The sample names are test data used in demonstrating the general analysis report output.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/12854/558d2033-7c3a-4f0b-96b9-14c8a58b8a3f_figure3.gif"/>
                </fig>
                <p>Gene set differential expression, which includes gene-gene correlation inflation corrections, is calculated using Qusage
                    <sup>
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup>. Qusage calculates a variance inflation factor that corrects for the inter-gene correlation that would otherwise inflate type 1 error rates, using pooled or non-pooled variances between experimental groups. The gene set enrichment is conducted using 
                    <ext-link ext-link-type="uri" xlink:href="http://www.reactome.org/">Reactome</ext-link> pathways constructed using ENSEMBL transcript/gene identifiers (
                    <xref ref-type="fig" rid="f4">Figure 4</xref> and 
                    <xref ref-type="table" rid="T1">Table 1</xref>); Reactome gene sets are less extensive than those of other databases, so 
                    <italic toggle="yes">Arkas-Analysis</italic> outputs DE analysis in formats compatible with more exhaustive databases such as 
                    <ext-link ext-link-type="uri" xlink:href="http://www.advaitabio.com/">Advaita</ext-link>. The DE files can be uploaded as custom input to Advaita iPathwayGuide, which offers an extensive Gene Ontology (GO) pathway analysis. Pathway enrichment analysis can be performed from the BaseSpace cloud system downstream of the parallel differential expression analysis and can integrate with other pathway analysis software tools.</p>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>
                            <italic toggle="yes">Arkas-Analysis</italic> Gene-Set Enrichment Plot.</title>
                        <p>Gene-set enrichment output report. Each point represents the differential mean activity of a gene set with its 95% confidence interval. The X-axis shows the individual gene sets. The Y-axis is the log
                            <sub>2</sub> fold change.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/12854/558d2033-7c3a-4f0b-96b9-14c8a58b8a3f_figure4.gif"/>
                </fig>
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>
                            <italic toggle="yes">Arkas-Analysis</italic> Gene-Set Enrichment Statistics.</title>
                        <p>The columns represent the Reactome pathway name corresponding to the depicted pathways in 
                            <xref ref-type="fig" rid="f4">Figure 4</xref>, the log
                            <sub>2</sub> fold change, p-value, adjusted FDR, and an active link to the Reactome website with visual depictions of the gene/transcript pathway. 
                            <italic toggle="yes">Arkas-Analysis</italic> will output a similar report testing transcript-level sets.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1">Pathway name</th>
                                <th align="left" colspan="1" rowspan="1">Log fold
                                    <break/>change</th>
                                <th align="left" colspan="1" rowspan="1">P.value</th>
                                <th align="left" colspan="1" rowspan="1">FDR</th>
                                <th align="left" colspan="1" rowspan="1">Gene URL</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">R-HSA-1989781</td>
                                <td align="right" colspan="1" rowspan="1">-0.87</td>
                                <td align="right" colspan="1" rowspan="1">0.0008</td>
                                <td align="right" colspan="1" rowspan="1">0.06</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://www.reactome.org/PathwayBrowser/#/R-HSA-1989781">http://www.reactome.org/PathwayBrowser/#/R-HSA-1989781</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">R-HSA-2173796</td>
                                <td align="right" colspan="1" rowspan="1">-0.51</td>
                                <td align="right" colspan="1" rowspan="1">0.007</td>
                                <td align="right" colspan="1" rowspan="1">0.217</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://www.reactome.org/PathwayBrowser/#/R-HSA-2173796">http://www.reactome.org/PathwayBrowser/#/R-HSA-2173796</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">R-HSA-6804759</td>
                                <td align="right" colspan="1" rowspan="1">-1.62</td>
                                <td align="right" colspan="1" rowspan="1">0.009</td>
                                <td align="right" colspan="1" rowspan="1">0.217</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://www.reactome.org/PathwayBrowser/#/R-HSA-6804759">http://www.reactome.org/PathwayBrowser/#/R-HSA-6804759</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">R-HSA-381038</td>
                                <td align="right" colspan="1" rowspan="1">-0.43</td>
                                <td align="right" colspan="1" rowspan="1">0.013</td>
                                <td align="right" colspan="1" rowspan="1">0.226</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://www.reactome.org/PathwayBrowser/#/R-HSA-381038">http://www.reactome.org/PathwayBrowser/#/R-HSA-381038</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">R-HSA-2559585</td>
                                <td align="right" colspan="1" rowspan="1">-0.4</td>
                                <td align="right" colspan="1" rowspan="1">0.032</td>
                                <td align="right" colspan="1" rowspan="1">0.341</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://www.reactome.org/PathwayBrowser/#/R-HSA-2559585">http://www.reactome.org/PathwayBrowser/#/R-HSA-2559585</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">R-HSA-4086398</td>
                                <td align="right" colspan="1" rowspan="1">-0.95</td>
                                <td align="right" colspan="1" rowspan="1">0.033</td>
                                <td align="right" colspan="1" rowspan="1">0.341</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://www.reactome.org/PathwayBrowser/#/R-HSA-4086398">http://www.reactome.org/PathwayBrowser/#/R-HSA-4086398</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">R-HSA-4641265</td>
                                <td align="right" colspan="1" rowspan="1">-0.95</td>
                                <td align="right" colspan="1" rowspan="1">0.033</td>
                                <td align="right" colspan="1" rowspan="1">0.341</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://www.reactome.org/PathwayBrowser/#/R-HSA-4641265">http://www.reactome.org/PathwayBrowser/#/R-HSA-4641265</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">R-HSA-422085</td>
                                <td align="right" colspan="1" rowspan="1">-1.17</td>
                                <td align="right" colspan="1" rowspan="1">0.04</td>
                                <td align="right" colspan="1" rowspan="1">0.361</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://www.reactome.org/PathwayBrowser/#/R-HSA-422085">http://www.reactome.org/PathwayBrowser/#/R-HSA-422085</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">R-HSA-5467345</td>
                                <td align="right" colspan="1" rowspan="1">-0.56</td>
                                <td align="right" colspan="1" rowspan="1">0.069</td>
                                <td align="right" colspan="1" rowspan="1">0.389</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://www.reactome.org/PathwayBrowser/#/R-HSA-5467345">http://www.reactome.org/PathwayBrowser/#/R-HSA-5467345</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">R-HSA-6804754</td>
                                <td align="right" colspan="1" rowspan="1">-0.57</td>
                                <td align="right" colspan="1" rowspan="1">0.07</td>
                                <td align="right" colspan="1" rowspan="1">0.389</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://www.reactome.org/PathwayBrowser/#/R-HSA-6804754">http://www.reactome.org/PathwayBrowser/#/R-HSA-6804754</ext-link>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1">R-HSA-6803204</td>
                                <td align="right" colspan="1" rowspan="1">-1.19</td>
                                <td align="right" colspan="1" rowspan="1">0.081</td>
                                <td align="right" colspan="1" rowspan="1">0.389</td>
                                <td align="left" colspan="1" rowspan="1">
                                    <ext-link ext-link-type="uri" xlink:href="http://www.reactome.org/PathwayBrowser/#/R-HSA-6803204">http://www.reactome.org/PathwayBrowser/#/R-HSA-6803204</ext-link>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
            </sec>
            <sec>
                <title>Data variance between software versions</title>
                <p>We sought to show why matching Kallisto versions must be enforced when quantifying transcripts. Because Kallisto is updated and improved between releases, the data it produces vary between versions (
                    <xref ref-type="fig" rid="f5">Figure 5</xref>, 
                    <xref ref-type="other" rid="ST1">Supplementary Table 1</xref>, 
                    <xref ref-type="other" rid="ST2">Supplementary Table 2</xref>). We calculated the standardized mean differences, and the variation of those differences, for the same 5 samples quantified with Kallisto (setting bootstraps = 42) versions 0.43.0 and 0.43.1 (
                    <xref ref-type="other" rid="ST2">Supplementary Table 2</xref>), and found large variation in the differences between raw values generated by the two Kallisto versions, underscoring the importance of tracking the Kallisto version behind any set of results.</p>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>Figure 5. </label>
                    <caption>
                        <title>Quantile-Quantile Plots of Data Variation Comparing Differences in Kallisto Data from Versions 0.43.1 and 0.43.0.</title>
                        <p>The X-axis depicts the theoretical quantiles of the standardized mean differences. The Y-axis represents the observed quantiles of standardized mean differences.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/12854/558d2033-7c3a-4f0b-96b9-14c8a58b8a3f_figure5.gif"/>
                </fig>
                <p>
                    <xref ref-type="other" rid="ST1">Supplementary Table 1</xref> shows the variation of the errors of raw values such as estimated counts, effective length, and estimated median absolute deviation between repeated runs of the same Kallisto version (0.43.0). As expected, repeated runs of version 0.43.0 showed very low variation of errors for every transcript across all samples. However, upon comparing Kallisto version 0.43.1 to version 0.43.0 using raw data such as estimated abundance counts, effective length, estimated median absolute deviation, and transcripts per million values, we found, as expected, large variation of data.</p>
                <p>
                    <xref ref-type="other" rid="ST2">Supplementary Table 2</xref> shows that there is large variation of the differences of Kallisto data calculated between differing versions. 
                    <xref ref-type="fig" rid="f5">Figure 5</xref> depicts the standardized mean differences, i.e. errors, between Kallisto versions fitted to a theoretical normal distribution. The quantile-quantile plots show that the errors are marginally normal, following a consistent line centered near 0 but with large outliers (
                    <xref ref-type="fig" rid="f5">Figure 5</xref>). As expected, containerizing analysis pipelines enforces fixed software versions, which benefits reproducible analyses.</p>
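                <p>The comparison behind the quantile-quantile plots can be sketched as follows. The exact formula is not spelled out here, so the sketch assumes that the standardized mean differences are per-transcript differences between two runs, centred on their mean and scaled by their standard deviation; the function names and the (i + 0.5)/n plotting position are likewise choices made for illustration.</p>
                <code language="python"><![CDATA[
```python
from statistics import NormalDist, mean, stdev


def standardized_differences(run_a, run_b):
    """Z-score the per-transcript differences between two quantification runs."""
    diffs = [a - b for a, b in zip(run_a, run_b)]
    centre, spread = mean(diffs), stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical; nothing to standardize")
    return [(d - centre) / spread for d in diffs]


def qq_points(values):
    """Pair standard-normal theoretical quantiles with the sorted observations."""
    n = len(values)
    normal = NormalDist()
    # (i + 0.5) / n is one common plotting position; others exist.
    theoretical = [normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sorted(values)))
```
]]></code>
                <p>Plotting the pairs returned by qq_points gives the theoretical quantiles on the X-axis and the observed quantiles on the Y-axis; points far from the diagonal correspond to the outlying transcripts.</p>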
                <p>The Dockerization of Arkas BaseSpace applications versions the Kallisto reference index to enforce that the Kallisto software versions are identical, and further documents the Kallisto version used in every cloud analysis. Enforcing both the reference version and the Kallisto software version prevents errors when comparing experiments.</p>
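                <p>Outside a container, the same safeguard can be approximated by checking recorded run metadata before results are merged. The sketch below is an assumption-laden illustration and not part of Arkas: check_versions is a hypothetical helper, and the key names kallisto_version and index_version are assumed to match the metadata that Kallisto records for each run.</p>
                <code language="python"><![CDATA[
```python
def check_versions(run_infos, keys=("kallisto_version", "index_version")):
    """Raise ValueError unless every run reports the same value for each key.

    run_infos is a list of dicts, one per quantification run; the default
    key names are assumptions about how the metadata is labelled.
    """
    for key in keys:
        seen = {info.get(key) for info in run_infos}
        if None in seen:
            raise ValueError(f"missing '{key}' in at least one run")
        if len(seen) > 1:
            raise ValueError(f"mismatched {key}: {sorted(map(str, seen))}")
    return True
```
]]></code>
                <p>Failing loudly on a mismatch, rather than warning, mirrors the enforcement described above: runs quantified with different software or index versions are never silently compared.</p>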
            </sec>
            <sec>
                <title>Operation</title>
                <p>
				
                    <italic toggle="yes">Arkas-Quantification</italic> instructions are provided within BaseSpace (details for new users can be found 
                    <ext-link ext-link-type="uri" xlink:href="https://blog.basespace.illumina.com/2015/09/30/newfeatures/">here</ext-link>). 
                    <italic toggle="yes">Arkas</italic> has a web-style interface, but can also be launched from the command line using the 
                    <ext-link ext-link-type="uri" xlink:href="https://help.basespace.illumina.com/articles/descriptive/basespace-cli/">BaseSpace Command Line Interface</ext-link>. The inputs are RNA sequencing samples, which may include SRA imported reads, and the outputs include the Kallisto data, .tar.gz files of the Kallisto sample data, and a report summary (
                    <xref ref-type="other" rid="SF1">Supplementary Figure 1</xref> and 
                    <xref ref-type="other" rid="SF2">Supplementary Figure 2</xref>). Users may select the species (
                    <italic toggle="yes">Homo sapiens</italic> or 
                    <italic toggle="yes">Mus musculus</italic>), optionally correct for read length bias, and optionally generate pseudoBAMs. More significantly, users may use the default transcriptome (ENSEMBL build 88) or upload a custom FASTA of their choosing. Users who prefer to work locally can download the sample .tar.gz Kallisto files for local analysis.</p>
                <p>The 
                    <italic toggle="yes">Arkas-Analysis</italic> instructions are provided within the BaseSpace environment. The input for the analysis app is the 
                    <italic toggle="yes">Arkas-Quantification</italic> sample data, and the output files are separated into corresponding folders. The analysis report also includes figures for each respective analysis (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>&#x2013;
                    <xref ref-type="fig" rid="f4">Figure 4</xref>), and the images can be downloaded in HTML format.</p>
            </sec>
            <sec>
                <title>Annotation of coding genes and transcripts</title>
                <p>Genomic and functional annotations are extracted rapidly and directly from FASTA contig comments, avoiding sometimes-unreliable dependencies on services such as 
                    <ext-link ext-link-type="uri" xlink:href="http://www.biomart.org/">BioMart</ext-link>. Annotating the merged Kallisto data from 5 samples took 2.336 seconds (
                    <xref ref-type="other" rid="ST3">Supplementary Table 3</xref>) and produced a KallistoExperiment object whose feature data contain a GenomicRanges
                    <sup>
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup> object with 213,782 ranges and 9 metadata columns. The system runtime for creating a merged KallistoExperiment class for 5 samples was 23.551 seconds (
                    <xref ref-type="other" rid="ST4">Supplementary Table 4</xref>).</p>
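                <p>To make the idea of reading annotations from FASTA comments concrete, the following Python sketch parses a single ENSEMBL-style header line. It is not the Arkas code: the function name, the field layout (identifier, sequence type, location, then key:value pairs and a trailing description), and the example header with placeholder identifiers are all assumptions made for illustration.</p>
                <code language="python"><![CDATA[
```python
def parse_ensembl_header(header):
    """Turn one ENSEMBL-style FASTA header line into an annotation record."""
    # The free-text description comes last and may contain spaces,
    # so it is split off before tokenizing on whitespace.
    head, _, description = header.lstrip(">").partition(" description:")
    fields = head.split()
    # Third field is assumed to be coord_system:assembly:seqname:start:end:strand.
    _, assembly, seqname, start, end, strand = fields[2].split(":")
    record = {
        "transcript_id": fields[0],
        "seq_type": fields[1],
        "assembly": assembly,
        "seqname": seqname,
        "start": int(start),
        "end": int(end),
        "strand": int(strand),
    }
    # Remaining fields are key:value pairs such as gene:... or gene_biotype:...
    for token in fields[3:]:
        key, _, value = token.partition(":")
        record[key] = value
    if description:
        record["description"] = description
    return record


# Made-up header in the ENSEMBL cDNA style; all identifiers are placeholders.
EXAMPLE = (
    ">ENST00000000001.1 cdna chromosome:GRCh38:1:1000:2000:-1 "
    "gene:ENSG00000000001.1 gene_biotype:protein_coding "
    "transcript_biotype:protein_coding gene_symbol:EXAMPLE1 "
    "description:hypothetical example gene"
)
```
]]></code>
                <p>Because every field needed for a genomic range (sequence name, start, end, strand) is present in the header itself, no network lookup is required, which is consistent with the short annotation runtime reported above.</p>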
            </sec>
        </sec>
        <sec sec-type="discussion">
            <title>Discussion</title>
            <sec>
                <title>Complete transcriptomes enrich annotation information, improving downstream analyses</title>
                <p>The choice of catalog, and the type of quantification performed, influence the results of sequencing analysis. ENSEMBL reference genomes are provided to GENCODE as a merged database that combines Havana's manually curated annotations with ENSEMBL's automatically curated coordinates. AceView, UCSC, RefSeq, and GENCODE each have approximately twenty thousand protein coding genes; however, AceView and GENCODE have a greater number of protein coding transcripts in their databases. RefSeq and UCSC references have fewer than 60,000 protein coding transcripts, whereas GENCODE has 140,066 protein coding loci. AceView has 160,000 protein coding transcripts, but this database is not manually curated. GENCODE is annotated with special attention given to long non-coding RNAs (lncRNAs) and pseudogenes, improving annotations by coupling automated labeling with manual curation. The database selected for protein coding transcripts can influence the amount of annotation information returned when querying gene/transcript level databases.</p>
                <p>Although previously overlooked, lncRNAs have been shown to share features and alternative splice variants with mRNA, and to play a central role in metastasis, cell growth and cell invasion
                    <sup>
                        <xref ref-type="bibr" rid="ref-12">12</xref>
                    </sup>. LncRNA transcripts have been shown to be functional and are associated with cancer prognosis. 
                    <italic toggle="yes">Arkas&#x2019;</italic> default transcriptomes include ENSEMBL (build 88) cDNA and non-coding RNA reference sequences.</p>
            </sec>
            <sec>
                <title>Arkas cloud pipeline: modern and simple</title>
                <p>Recent developments in operating-system virtualization, such as Docker, allow local software environments to be preserved, and cloud platforms can then deploy the preserved software. Docker allows users to build layers of read/write access files, creating a portable operating system which exhaustively controls software versions and data, while systematically preserving the pipeline software. Currently, Docker is the principal infrastructure for cloud bioinformatic computational software platforms such as Illumina's BaseSpace platform, 
                    <ext-link ext-link-type="uri" xlink:href="https://cloud.google.com/genomics/">Google Genomics</ext-link>, 
                    <ext-link ext-link-type="uri" xlink:href="https://www.sevenbridges.com/">SevenBridges</ext-link>, and 
                    <ext-link ext-link-type="uri" xlink:href="https://galaxyproject.org/">Galaxy</ext-link>. </p>
                <p>The Google Cloud Platform supports popular languages such as Python, Node, and Ruby, with services for computing and storage. The Google Genomics platform has a steeper learning curve, and familiarity with services such as 
                    <ext-link ext-link-type="uri" xlink:href="https://cloud.google.com/compute/docs/">Compute Engine</ext-link> and 
                    <ext-link ext-link-type="uri" xlink:href="https://cloud.google.com/storage/docs/">Cloud Storage</ext-link> is recommended. Google Genomics hosts cloud storage transfer services for importing source data to storage buckets from HTTP/HTTPS locations. Data management services outside the Google Genomics platform, such as 
                    <ext-link ext-link-type="uri" xlink:href="https://www.globus.org">Globus</ext-link>, serve the SRA database and can interact with Google Genomics applications, reducing the bottleneck of SRA downloads.</p>
                <p>Google offers a very cost-effective means of analyzing data, but requires expensive preparatory routines. Tatlow 
                    <italic toggle="yes">et al.</italic> employed Kallisto to pseudoalign 12,307 RNA-sequencing samples by renting preemptible VMs from Google Cloud Platform for as little as $0.09 per sample. Tatlow 
                    <italic toggle="yes">et al.</italic> pseudoaligned 1,811 breast carcinoma samples, completing in 101 minutes on average, and 934 Cancer Cell Line Encyclopedia (CCLE) BAM files, completing in 84.7 minutes on average. However, large-scale efforts require specialized knowledge for controlling containers (e.g. 
                    <ext-link ext-link-type="uri" xlink:href="https://kubernetes.io/">Kubernetes</ext-link>), managing resources, and handling queues. Although Tatlow 
                    <italic toggle="yes">et al.</italic> skillfully employed a cost-effective implementation of RNA-Seq analysis of massive databases, they mention a critical need for reducing the preprocessing routines involved in cloud computing
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>.</p>
                <p>Galaxy offers shared workflows and analytical pipelines, along with many usable tools and a wide range of visualization pipelines, but its storage services are limited by its use of public servers. Private storage platforms can store experimental data more flexibly, although their range of analysis tools is narrower than that of open-source platforms. In contrast, BaseSpace offers tools for specific tasks, trading breadth for a gentler learning curve, which may be attractive for researchers interested in immediate, and verifiable, results.</p>
                <p>BaseSpace offers other RNAseq tools, including another analysis pipeline, 
                    <italic toggle="yes">RNAExpress</italic>, which reduces preparatory routines. 
                    <italic toggle="yes">RNAExpress</italic> runs DESeq2 and can be used to cross validate 
                    <italic toggle="yes">Arkas-Analysis</italic>. DESeq2 uses a negative binomial distribution to model differential expression, whereas 
                    <italic toggle="yes">Arkas</italic> implements the limma/voom empirical Bayes analysis pipeline. 
                    <italic toggle="yes">RNAExpress</italic> completed in 109 minutes comparing 4 controls and 4 comparison samples. Using the same samples, 
                    <italic toggle="yes">Arkas-Quantification</italic> completed in 42 minutes, and 
                    <italic toggle="yes">Arkas-Analysis</italic> completed in 54 minutes. Illumina&#x2019;s BaseSpace catalog of modern, yet simple, tools is attractive for users wishing to share sessions and to rapidly (re)analyze entire experiment(s).</p>
            </sec>
        </sec>
        <sec sec-type="conclusions">
            <title>Conclusion</title>
            <p>
                <italic toggle="yes">Arkas</italic> integrates the Kallisto pseudoalignment algorithm into the BaseSpace cloud computation ecosystem, enabling large-scale, parallel, ultra-fast transcript abundance quantification. We reduce one computational bottleneck by using rapid transcript abundance calculations and connecting accelerated quantification software to the Sequence Read Archive. We reduce a second bottleneck by lessening the need to download databases; instead we encourage users to download aggregated analysis results. We also expand the range of common sequencing protocols to include an improved gene-set enrichment algorithm, Qusage, and allow for exporting into an exhaustive pathway analysis platform, Advaita, over AWS EC2 in parallel.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <sec>
                <title>Data used in testing variation between versions</title>
                <p>
                    <bold>Controls:</bold> SRR1544480 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/SRX675930%5Bbaccn%5D">Immortal-1</ext-link>
                </p>
                <p>SRR1544481 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/SRX675931%5Bbaccn%5D">Immortal-2</ext-link>
                </p>
                <p>SRR1544482 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/SRX675932%5Bbaccn%5D">Immortal-3</ext-link>
                </p>
                <p>
                    <bold>Comparison:</bold> SRR1544501 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/SRX675951%5Bbaccn%5D">Qui-1</ext-link>
                </p>
                <p>SRR1544502 
                    <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/sra/SRX675952%5Bbaccn%5D">Qui-2</ext-link>
                </p>
            </sec>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>Latest source code:</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://github.com/RamsinghLab/Arkas-RNASeq">https://github.com/RamsinghLab/Arkas-RNASeq</ext-link>
            </p>
            <p>Archived source code as at the time of publication:</p>
            <p>DOI: 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.545654">10.5281/zenodo.545654</ext-link>
                <sup>21</sup>
			</p>
            <p>License:</p>
            <p>MIT license</p>
            <sec>
                <title>Reference FASTA annotation files</title>
                <p>ENSEMBL release 88 FASTA files for 
                    <italic toggle="yes">Homo sapiens</italic> and 
                    <italic toggle="yes">Mus musculus</italic> were downloaded 
                    <ext-link ext-link-type="uri" xlink:href="http://www.ensembl.org/index.html">here</ext-link>.</p>
            </sec>
            <sec>
                <title>ERCC sequences</title>
                <p>The ERCC sequences are provided in a SQL database format located 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/RamsinghLab/Arkas-RNASeq/tree/master/RLibraries/ErccDbLite.ERCC.97">here</ext-link>
                </p>
            </sec>
        </sec>
    </body>
    <back>
        <sec sec-type="supplementary-material">
            <title>Supplementary material</title>
            <p id="ST1">
                <bold>Supplementary Table 1: Data variation with matching Kallisto versions.</bold> This shows the variation of mean differences between data using the matching Kallisto version 0.43.0. The rows represent the samples from the first run using version 0.43.0. The columns represent the samples from an additional run with version 0.43.0.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/11355/50fcb0bf-5010-495e-8979-78aa77f5017c.csv">Click here to access the data</ext-link>.</p>
            <p id="ST2">
                <bold>Supplementary Table 2: Data variation with non-matching Kallisto versions.</bold> Variation of mean differences between non-matching Kallisto versions and a randomly selected run previously generated (
                <xref ref-type="other" rid="ST1">Supplementary Table 1</xref>). The rows are samples run using version 0.43.0, the columns are runs using version 0.43.1.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/11355/ba96d3f8-811b-40d7-af1b-e6d1bc6cb237.csv">Click here to access the data</ext-link>.</p>
            <p id="ST3">
                <bold>Supplementary Table 3: Annotation runtime.</bold> System runtime for full annotation of a merged KallistoExperiment (seconds). The columns represent system runtime; the Elapsed Time column is the total runtime.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/11355/0f170781-3b0d-4448-b22b-267b7f9bd9be.csv">Click here to access the data</ext-link>.</p>
            <p id="ST4">
                <bold>Supplementary Table 4: KallistoExperiment formation runtime.</bold> System runtime for the creation of a merged KallistoExperiment (seconds). The columns are similar to 
                <xref ref-type="other" rid="ST3">Supplementary Table 3</xref>.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/11355/3db5ad94-7d32-4e64-a208-f33b7a735c83.csv">Click here to access the data</ext-link>.</p>
            <p id="SF1">
                <bold>Supplementary Figure S1: 
                    <italic toggle="yes">Arkas-Quantification</italic> Web-Style user interface.</bold> The input field for 
                <bold>A</bold>) 
                <italic toggle="yes">Arkas-Quantification</italic> and 
                <bold>B</bold>) 
                <italic toggle="yes">Arkas-Analysis</italic> demonstrating SRA re-quantification. The control and comparison samples were obtained using the BaseSpace 
                <italic toggle="yes">SRA Import</italic> application, and were input into the 
                <italic toggle="yes">Arkas</italic> pipeline.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/11355/30b449ba-29a9-46c3-8b32-9e39452552d5.pdf">Click here to access the data</ext-link>.</p>
            <p id="SF2">
                <bold>Supplementary Figure S2: 
                    <italic toggle="yes">Arkas-Quantification</italic> output directory. A</bold>) Depicts the 
                <italic toggle="yes">Arkas-Quantification</italic> output directory, which includes sample folders containing Kallisto data. 
                <bold>B</bold>) Depicts the contents of a specific folder output by 
                <italic toggle="yes">Arkas-Quantification</italic>.</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/11355/d88cfb22-3bd3-4781-a7d7-c24ce6f6250b.pdf">Click here to access the data</ext-link>.</p>
        </sec>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Bray</surname>
                            <given-names>NL</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Pimentel</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Melsted</surname>
                            <given-names>P</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Near-optimal probabilistic RNA-seq quantification.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Biotechnol.</italic>
					</source>
                    <year>2016</year>;<volume>34</volume>(<issue>5</issue>):<fpage>525</fpage>&#x2013;<lpage>527</lpage>.
                    <pub-id pub-id-type="pmid">27043002</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3519</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Soneson</surname>
                            <given-names>C</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Matthes</surname>
                            <given-names>KL</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Nowicka</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Isoform prefiltering improves performance of count-based methods for analysis of differential transcript usage.</article-title>
                    <source>
						
                        <italic toggle="yes">Genome Biol.</italic>
					</source>
                    <year>2016</year>;<volume>17</volume>:<fpage>12</fpage>.
                    <pub-id pub-id-type="pmid">26813113</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-015-0862-3</pub-id>
                    <pub-id pub-id-type="pmcid">4729156</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Bourgon</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Gentleman</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>
					</person-group>:
                    <article-title>Independent filtering increases detection power for high-throughput experiments.</article-title>
                    <source>
						
                        <italic toggle="yes">Proc Natl Acad Sci U S A.</italic>
					</source>
                    <year>2010</year>;<volume>107</volume>(<issue>21</issue>):<fpage>9546</fpage>&#x2013;<lpage>9551</lpage>.
                    <pub-id pub-id-type="pmid">20460310</pub-id>
                    <pub-id pub-id-type="doi">10.1073/pnas.0914005107</pub-id>
                    <pub-id pub-id-type="pmcid">2906865</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Baker</surname>
                            <given-names>SC</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Bauer</surname>
                            <given-names>SR</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Beyer</surname>
                            <given-names>RP</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>The External RNA Controls Consortium: a progress report.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Methods.</italic>
					</source>
                    <year>2005</year>;<volume>2</volume>(<issue>10</issue>):<fpage>731</fpage>&#x2013;<lpage>734</lpage>.
                    <pub-id pub-id-type="pmid">16179916</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth1005-731</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Munro</surname>
                            <given-names>SA</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Lund</surname>
                            <given-names>SP</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Pine</surname>
                            <given-names>PS</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Commun.</italic>
					</source>
                    <year>2014</year>;<volume>5</volume>:<fpage>5125</fpage>.
                    <pub-id pub-id-type="pmid">25254650</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ncomms6125</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Lawrence</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Pag&#x00e8;s</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Software for computing and annotating genomic ranges.</article-title>
                    <source>
						
                        <italic toggle="yes">PLoS Comput Biol.</italic>
					</source>
                    <year>2013</year>;<volume>9</volume>(<issue>8</issue>):<fpage>e1003118</fpage>.
                    <pub-id pub-id-type="pmid">23950696</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003118</pub-id>
                    <pub-id pub-id-type="pmcid">3738458</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Risso</surname>
                            <given-names>D</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Schwartz</surname>
                            <given-names>K</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Sherlock</surname>
                            <given-names>G</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>GC-content normalization for RNA-Seq data.</article-title>
                    <source>
						
                        <italic toggle="yes">BMC Bioinformatics.</italic>
					</source>
                    <year>2011</year>;<volume>12</volume>:<fpage>480</fpage>.
                    <pub-id pub-id-type="pmid">22177264</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-12-480</pub-id>
                    <pub-id pub-id-type="pmcid">3315510</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Robinson</surname>
                            <given-names>MD</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>McCarthy</surname>
                            <given-names>DJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Smyth</surname>
                            <given-names>GK</given-names>
                        </name>
					</person-group>:
                    <article-title>edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.</article-title>
                    <source>
						
                        <italic toggle="yes">Bioinformatics.</italic>
					</source>
                    <year>2010</year>;<volume>26</volume>(<issue>1</issue>):<fpage>139</fpage>&#x2013;<lpage>140</lpage>.
                    <pub-id pub-id-type="pmid">19910308</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btp616</pub-id>
                    <pub-id pub-id-type="pmcid">2796818</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Ritchie</surname>
                            <given-names>ME</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Phipson</surname>
                            <given-names>B</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Wu</surname>
                            <given-names>D</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>
                        <italic toggle="yes">limma</italic> powers differential expression analyses for RNA-sequencing and microarray studies.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2015</year>;<volume>43</volume>(<issue>7</issue>):<fpage>e47</fpage>.
                    <pub-id pub-id-type="pmid">25605792</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkv007</pub-id>
                    <pub-id pub-id-type="pmcid">4402510</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Risso</surname>
                            <given-names>D</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Ngai</surname>
                            <given-names>J</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Speed</surname>
                            <given-names>TP</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Normalization of RNA-seq data using factor analysis of control genes or samples.</article-title>
                    <source>
						
                        <italic toggle="yes">Nat Biotechnol.</italic>
					</source>
                    <year>2014</year>;<volume>32</volume>(<issue>9</issue>):<fpage>896</fpage>&#x2013;<lpage>902</lpage>.
                    <pub-id pub-id-type="pmid">25150836</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.2931</pub-id>
                    <pub-id pub-id-type="pmcid">4404308</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Yaari</surname>
                            <given-names>G</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Bolen</surname>
                            <given-names>CR</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Thakar</surname>
                            <given-names>J</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Quantitative set analysis for gene expression: a method to quantify gene set differential expression including gene-gene correlations.</article-title>
                    <source>
						
                        <italic toggle="yes">Nucleic Acids Res.</italic>
					</source>
                    <year>2013</year>;<volume>41</volume>(<issue>18</issue>):<fpage>e170</fpage>.
                    <pub-id pub-id-type="pmid">23921631</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkt660</pub-id>
                    <pub-id pub-id-type="pmcid">3794608</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Mitra</surname>
                            <given-names>SA</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Mitra</surname>
                            <given-names>AP</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Triche</surname>
                            <given-names>TJ</given-names>
                        </name>
					</person-group>:
                    <article-title>A central role for long non-coding RNA in cancer.</article-title>
                    <source>
						
                        <italic toggle="yes">Front Genet.</italic>
					</source>
                    <year>2012</year>;<volume>3</volume>:<fpage>17</fpage>.
                    <pub-id pub-id-type="pmid">22363342</pub-id>
                    <pub-id pub-id-type="doi">10.3389/fgene.2012.00017</pub-id>
                    <pub-id pub-id-type="pmcid">3279698</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Tatlow</surname>
                            <given-names>PJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Piccolo</surname>
                            <given-names>SR</given-names>
                        </name>
						</person-group>:
                    <article-title>A cloud-based workflow to quantify transcript-expression levels in public cancer compendia.</article-title>
                    <source>
						
                        <italic toggle="yes">Sci Rep.</italic>
					</source>
                    <year>2016</year>;<volume>6</volume>:<fpage>39259</fpage>.
                    <pub-id pub-id-type="pmid">27982081</pub-id>
                    <pub-id pub-id-type="doi">10.1038/srep39259</pub-id>
                    <pub-id pub-id-type="pmcid">5159871</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report23689">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.12854.r23689</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Pimentel</surname>
                        <given-names>Harold</given-names>
                    </name>
                    <xref ref-type="aff" rid="r23689a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8556-2499</uri>
                </contrib>
                <aff id="r23689a1">
                    <label>1</label>Department of Genetics, Stanford University, Stanford, CA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>7</day>
                <month>8</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Pimentel H</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport23689" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.11355.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Hello Colombo 
                <italic>et al.</italic>,</p>
            <p>Firstly: I am so very sorry for such a late review.</p>
            <p>Anyway, the new manuscript looks much better. Thanks for the revisions.</p>
            <p>I just have one nitpick: in Figure 2 you show the p-value distribution over the range (0, 0.05). Perhaps I missed something, but I'm not sure I completely understand the value of showing over this interval rather than the whole interval (0, 1). Does it have to do with the normalization adjusting p-values specifically in this range?</p>
            <p>Regardless -- nice work and congrats!</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>RNA-Seq analysis methods and data analysis</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report22616">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.12258.r22616</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Abel</surname>
                        <given-names>Ted</given-names>
                    </name>
                    <xref ref-type="aff" rid="r22616a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-2423-4592</uri>
                </contrib>
                <contrib contrib-type="author">
                    <name>
                        <surname>Gaine</surname>
                        <given-names>Marie</given-names>
                    </name>
                    <xref ref-type="aff" rid="r22616a1">1</xref>
                    <role>Co-referee</role>
                </contrib>
                <aff id="r22616a1">
                    <label>1</label>Iowa Neuroscience Institute, University of Iowa, Iowa, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>24</day>
                <month>5</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Abel T and Gaine M</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport22616" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.11355.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>This paper introduces an RNA-Seq analysis pipeline, Arkas, which combines currently available tools typically used in RNA-Seq studies. The novelty of this pipeline is the encapsulation of tools needed to prepare the data, run quality control checks, analyze the data and perform secondary analyses. This is especially beneficial for investigators new to RNA-Seq analysis with little experience navigating through computational tools. The authors take care to outline the rationale behind creating an easy-to-use interface and how this will increase reproducibility and consistency across RNA-Seq studies.&#x00a0; They emphasize the importance of consistency with versions by showing differing results between two Kallisto versions.</p>
            <p>However, there are some minor limitations also found in this study:</p>
            <p>It would be beneficial to include quality control checks at the beginning of the pipeline to generate data regarding the inputted sequencing files.</p>
            <p>It would be interesting to see more processing time information to show the benefit of using this pipeline compared to similar methods.</p>
            <p>As is discussed, the inclusion of lncRNAs increases the amount of potentially interesting results from this pipeline. However, the authors have chosen to ignore microRNAs, an important regulator of cellular function. The inclusion of microRNAs as a default option in this pipeline would provide even more potentially interesting results.</p>
            <p>The normalization steps and Figure 2 should be discussed in more detail. Specifically, expand on the reasons for choosing these two methods and the differences between the methods and their outputs. In addition, a note about how a user should select a normalization type would help new users.</p>
            <p>Whilst the authors suggest that the integration of Docker will help produce reproducible research methods, the in-depth look into Docker is unnecessary, as no data has been provided to show its benefit above other options.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Molecular neuroscience</p>
            <p>We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
        <sub-article article-type="response" id="comment2767-22616">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Ramsingh</surname>
                            <given-names>Giridharan</given-names>
                        </name>
                        <aff>University of Southern California, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>None.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>8</day>
                    <month>6</month>
                    <year>2017</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Thank you very much, Dr. Abel, for your insightful review.&#x00a0;</p>
                <p>In the revised manuscript, we removed the in-depth discussion of Docker because it was too broad.&#x00a0; The revised version includes a discussion section that compares processing times between Google Genomics and another BaseSpace application.&#x00a0;</p>
                <p>Your comments helped us address the analysis of microRNAs.&#x00a0; For example, Kallisto can process smaller FASTA sequences; however, this imposes limitations on the construction of the target de Bruijn graph by increasing the path ambiguity of longer read sequences.&#x00a0; The revised manuscript now addresses this limitation and suggests that users analyze microRNAs separately.&#x00a0; This analysis feature is not yet a default, but would be a great future addition.&#x00a0; We also provide further details on the motivation for, and selection of, the normalization methods.</p>
                <p>As suggested by the first reviewer, Dr. Pimentel, we have significantly reduced the broad discussion section and explicitly described the motivation for the development of 
                    <italic>Arkas</italic>.&#x00a0; We have additionally revised the 'Methods' section to provide a brief overview of the applications and a clearer description of the interface style, including Supplementary Figures depicting both interfaces.</p>
                <p>
                    <italic>"This paper introduces a RNA-Seq analysis pipeline, Arkas, which combines currently available tools typically used in RNA-Seq studies. The novelty of this pipeline is the encapsulation of tools needed to prepare the data, run quality control checks, analyze the data and perform secondary analyses. This is especially beneficial for investigators new to RNA-Seq analysis with little experience navigating through computational tools. The authors take care to outline the rationale behind creating an easy-to-use interface and how this will increase reproducibility and consistency across RNA-Seq studies.&#x00a0; They emphasize the importance of consistency with versions by showing differing results between two Kallisto versions.</italic>
                </p>
                <p>
                    <italic>However, there are some minor limitations also found in this study:</italic>
                </p>
                <p>
                    <italic>It would be beneficial to include quality control checks at the beginning of the pipeline to generate data regarding the inputted sequencing files."</italic>
                </p>
                <p>Thank you for this suggestion.&#x00a0; Analyzing read quality guides users in the important decision of whether to filter low-quality reads; however, 
                    <italic>Arkas </italic>was not designed to address this.&#x00a0; In the revised manuscript, we now mention another independent BaseSpace application, 
                    <italic>FastQC</italic>, which can assess read quality.&#x00a0; For users interested in manually uploading sequencing data to BaseSpace, each read must pass a quality filter.&#x00a0; This quality filter automatically rejects poor-quality reads, and for this reason we designed 
                    <italic>Arkas </italic>on the assumption that the input reads are of good quality.</p>
                <p>
                    <italic>"It would be interesting to see more processing time information to show the benefit of using this pipeline compared to similar methods."</italic>
                </p>
                <p>Thank you very much for raising the issue of processing times. In the revised manuscript, the discussion section has been significantly shortened and now focuses on comparisons of processing times.&#x00a0; Your remarks prompted us to add the processing times of 
                    <italic>Arkas</italic>.&#x00a0; We have included further information comparing its processing time with that of another BaseSpace application, 
                    <italic>RNAExpress</italic>. Further, we added processing times for a different Kallisto analysis pipeline implemented on the Google Genomics Platform.&#x00a0; The discussion section is now far more concise and more relevant to the functionality of our software.</p>
                <p>
                    <italic>"As is discussed, the inclusion of lncRNAs increases the amount of potentially interesting results from this pipeline. However, the authors have chosen to ignore microRNAs, an important regulator of cellular function. The inclusion of microRNAs as a default option in this pipeline would provide even more potentially interesting results."</italic>
                </p>
                <p>Including microRNAs is a great idea.&#x00a0; 
                    <italic>Arkas </italic>can quantify microRNAs, but we decided not to include microRNAs by default yet.&#x00a0; In the revised manuscript, we explain that small sequence sizes are a potential limitation on the quantification of cDNAs/ncRNAs because they may increase path ambiguities during the construction of the target de Bruijn graphs.&#x00a0; Hence, we suggest that users analyze microRNAs separately and locally.&#x00a0; This would be a great additional feature for the next version of 
                    <italic>Arkas.</italic>
                </p>
                <p>
                    <italic>"The normalization steps and Figure 2 should be discussed in more detail. Specifically, expand on the reasons for choosing these two methods and the differences between the methods and their outputs. In addition, a note about how a user should select a normalization type would help new users."</italic>
                </p>
                <p>Thank you for raising this.&#x00a0; The revised manuscript now explicitly states how end users may select the normalization type.&#x00a0; We further provide a brief explanation of why unsupervised normalization was selected.</p>
                <p>
                    <italic>"Whilst the authors suggest that the integration of Docker will help produce reproducible research methods, the in-depth look into Docker is unnecessary, as no data has been provided to show its benefit above other options."</italic>
                </p>
                <p>We agree that the discussion of Docker was too broad, and the revised discussion focuses on performance comparisons with other cloud platforms.</p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report22282">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.12258.r22282</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Pimentel</surname>
                        <given-names>Harold</given-names>
                    </name>
                    <xref ref-type="aff" rid="r22282a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0001-8556-2499</uri>
                </contrib>
                <aff id="r22282a1">
                    <label>1</label>Department of Genetics, Stanford University, Stanford, CA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>I am a co-author of the kallisto tool, one of the tools that is used in this pipeline.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>18</day>
                <month>5</month>
                <year>2017</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2017 Pimentel H</copyright-statement>
                <copyright-year>2017</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport22282" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.11355.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Note: I am a co-author of the kallisto tool, one of the tools that is used in this pipeline.</p>
            <p> </p>
            <p> Colombo et al. describe Arkas, a tool that takes raw RNA-Seq data and produces several different types of downstream analyses. Arkas leverages existing analysis tools (e.g. kallisto and limma) and platforms (Illumina BaseSpace) to create an easy to use, fast, and reproducible pipeline. A very useful (unique?) feature is that it documents software versions and enforces consistent software versions allowing users to see the potential differences with different software versions. This is made explicit in the "Results" section.</p>
            <p> </p>
            <p> Having all of these tools together greatly reduces the time to setup analyses and also reduces the complexity for RNA-Seq novices who might have no idea where to start. Arkas makes all of the typical figures one might make in a standard RNA-Seq analysis. It also provides gene-set analyses which are often excluded from other pipelines. In my experience, gluing together analyses from differential expression to gene-set analyses can often be an annoyance due to inconsistencies and annotations and versions of these annotations. Arkas nicely solves this problem.</p>
            <p> </p>
            <p> While I think the idea is very good and the tool seems comprehensive, I feel the manuscript needs a bit of work. Here are a few points:</p>
            <p> </p>
            <p> - There are a few areas where the scope seems too broad. In general, I feel that the manuscript can be shortened to be more clear as well as more precise. In particular, the Docker section in the discussion is too broad and the role of Arkas seems lost. I strongly recommend shortening this section and discussing the role of Docker in Arkas more clearly.</p>
            <p> - While the abstract and introduction provide a description of Arkas in RNA-Seq analysis, they do not provide a motivation. It is sort of hinted in several sections in the paper, but it is not explicit. The motivation of building another pipeline should be explicit.</p>
            <p> - How does this pipeline compare to other pipelines such as Galaxy, DNANexus, etc.? Should probably be noted in the introduction/discussion.</p>
            <p> - Perhaps I missed it, but the interface of Arkas does not appear to be described. There is a short subsection "Operation" that doesn't describe the type of interface. It appears to be available on Illumina BaseSpace, but does this make it a commandline tool or an online web form style tool? A short description of this interface and possibly supplementary figures (if it is a web form style) should be provided. This is unclear to folks who are not familiar with BaseSpace.</p>
            <p> - It should be emphasized more strongly how this tool can be used to reanalyze existing SRA data with relative ease. In my opinion this is a very strong argument as to why one might want a tool like this.</p>
            <p> </p>
            <p> Areas that can be shortened:</p>
            <p> </p>
            <p> - "Data variance between software versions" can be shortened as some of this is repeated in "Results."</p>
            <p> - "Complete transcriptomes enrich annotation information..." Specifics of annotations can probably be removed/condensed. It is probably sufficient to say that some are 3x larger, which can change results drastically.</p>
            <p> - "Docker as a cornerstone of reproducible research" The role of Docker in general can probably be shortened and how Arkas leverages it should be made more clear.</p>
            <p> </p>
            <p> More minor points:</p>
            <p> </p>
            <p> - A short sentence at the beginning of "Methods" should give an overview of the two-step process.</p>
            <p> - The Galaxy Project (https://usegalaxy.org/) should probably be cited even though the scope is a bit different.</p>
            <p> - Figure 1a: "Receiver Operator Characteristic plot" of what? This is stated in the main text, but should also be stated in the figure caption.</p>
            <p> - Swap Figure 1d and 1c.</p>
            <p> - It seems like BaseSpace sessions can easily be shared? If so, this is an additional strong point of using BaseSpace in Arkas.</p>
            <p> </p>
            <p> Overall, I'm very excited to see this comprehensive tool exist and be described in this paper.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Partly</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>RNA-Seq analysis methods and data analysis</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment2766-22282">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Ramsingh</surname>
                            <given-names>Giridharan</given-names>
                        </name>
                        <aff>University of Southern California, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>None</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>8</day>
                    <month>6</month>
                    <year>2017</year>
                </pub-date>
            </front-stub>
            <body>
                <p>Thank you very much, Dr. Pimentel, for your thorough review.&#x00a0; We have significantly reduced the broad discussion section and narrowed the manuscript to the most important features.&#x00a0; The 'Abstract' and 'Introduction' sections were shortened and now explicitly state the motivations for the design of 
                    <italic>Arkas</italic>.&#x00a0; In the revised manuscript, the 'Methods' section provides a brief overview of the applications, and the 'Operation' section describes the interface style and includes Supplementary Figures depicting both apps.</p>
                <p>The second reviewer, Dr. Abel, also suggested that the in-depth discussion of Docker was too broad.&#x00a0; The revised version includes a discussion section that compares processing times between Google Genomics and another BaseSpace application.&#x00a0; We have also now included brief points regarding Galaxy.</p>
                <p>Your comments helped make the manuscript much more concise. In addition to your remarks, we have addressed the second reviewer's points regarding microRNAs.&#x00a0; Kallisto can process smaller FASTA sequences, and the manuscript now notes that users can analyze microRNAs, although we suggest doing so in a separate analysis.</p>
                <p>We thank you very much for your suggested revisions and appreciate your thoughtful remarks.&#x00a0; We believe that addressing your remarks has greatly improved the manuscript. Below are point-by-point responses to your questions.</p>
                <p>
                    <italic>"Note: I am a co-author of the kallisto tool, one of the tools that is used in this pipeline.</italic>
                </p>
                <p>
                    <italic>Colombo et al. describe Arkas, a tool that takes raw RNA-Seq data and produces several different types of downstream analyses. Arkas leverages existing analysis tools (e.g. kallisto and limma) and platforms (Illumina BaseSpace) to create an easy to use, fast, and reproducible pipeline. A very useful (unique?) feature is that it documents software versions and enforces consistent software versions allowing users to see the potential differences with different software versions. This is made explicit in the "Results" section.</italic>
                </p>
                <p>
                    <italic>Having all of these tools together greatly reduces the time to setup analyses and also reduces the complexity for RNA-Seq novices who might have no idea where to start. Arkas makes all of the typical figures one might make in a standard RNA-Seq analysis. It also provides gene-set analyses which are often excluded from other pipelines. In my experience, gluing together analyses from differential expression to gene-set analyses can often be an annoyance due to inconsistencies and annotations and versions of these annotations. Arkas nicely solves this problem.</italic>
                </p>
                <p>
                    <italic>While I think the idea is very good and the tool seems comprehensive, I feel the manuscript needs a bit of work. Here are a few points:</italic>
                </p>
                <p>
                    <italic>- There are a few areas where the scope seems too broad. In general, I feel that the manuscript can be shortened to be more clear as well as more precise. In particular, the Docker section in the discussion is too broad and the role of Arkas seems lost. I strongly recommend shortening this section and discussing the role of Docker in Arkas more clearly.</italic>"</p>
                <p>Thank you very much for your input. In the revised manuscript, we have narrowed the Docker discussion section to the scope of the BaseSpace platform, and have described 
                    <italic>Arkas</italic>' relationship to Docker as an applied infrastructure on this platform.&#x00a0; The previous version of the manuscript detailed the role of Docker in the broad concept of reproducible research.&#x00a0; We have omitted these details. The revised manuscript describes the interdependent relationship between Arkas and Docker in the context of BaseSpace.&#x00a0; For example, Arkas containerizes Node.js and R to parse the BaseSpace JSON input information relating to BaseSpace&#x2019;s input fields.&#x00a0; The new manuscript explains that Docker and Arkas are not independent entities, and that their relationship pertains specifically to BaseSpace.</p>
                <p>
                    <italic>"- While the abstract and introduction provide a description of Arkas in RNA-Seq analysis, they do not provide a motivation. It is sort of hinted in several sections in the paper, but it is not explicit. The motivation of building another pipeline should be explicit.</italic>"</p>
                <p>Thank you for this suggestion.&#x00a0; We have now explicitly provided the motivation for 
                    <italic>Arkas</italic>&#x2019; development by describing bottlenecks in RNA-sequencing analysis, such as sequence importing and pre-processing steps, and how 
                    <italic>Arkas </italic>alleviates those bottlenecks.&#x00a0; In the revised version, we illustrate how 
                    <italic>Arkas </italic>was developed downstream of the BaseSpace application 
                    <ext-link ext-link-type="uri" xlink:href="https://blog.basespace.illumina.com/2014/12/12/import-data-from-sra-into-basespace/">
                        <italic>SRA Import</italic>
                    </ext-link> to greatly reduce importing and conversion steps.&#x00a0; We also now explicitly state the motivation for 
                    <italic>Arkas-Quantification</italic>: Kallisto is run in parallel, so that quantification speed scales with the availability of Amazon AWS EC2 cluster nodes.&#x00a0; In addition, the revised manuscript explicitly states the motivation for 
                    <italic>Arkas-Analysis</italic>, which provides a comprehensive analysis.</p>
                <p>
                    <italic>"- How does this pipeline compare to other pipelines such as Galaxy, DNANexus, etc.? Should probably be noted in the introduction/discussion.</italic>"</p>
                <p>Thank you for this suggestion.&#x00a0; In the revised discussion section, we now compare features of other cloud platforms and other BaseSpace RNA-Seq applications.&#x00a0; The revised discussion now includes processing times of a large-scale RNA-Seq analysis that implemented Kallisto using the Google Genomics Platform.&#x00a0; In addition to Google Genomics, the revised manuscript briefly compares features offered by Galaxy with those of BaseSpace.&#x00a0; Further, we compare 
                    <italic>Arkas </italic>to other BaseSpace RNA-Seq applications.</p>
                <p>
                    <italic>"- Perhaps I missed it, but the interface of Arkas does not appear to be described. There is a short subsection "Operation" that doesn't describe the type of interface. It appears to be available on Illumina BaseSpace, but does this make it a commandline tool or an online web form style tool? A short description of this interface and possibly supplementary figures (if it is a web form style) should be provided. This is unclear to folks who are not familiar with BaseSpace.</italic>"</p>
                <p>Thank you again for this suggestion.&#x00a0; We have included a description explicitly stating that Arkas uses a web-form-style interface.&#x00a0; In addition, we included two Supplementary Figures to illustrate the web input forms.&#x00a0; Supplementary Figure 1 shows the input form for both web-style apps, and Supplementary Figure 2 shows the output folder directory of 
                    <italic>Arkas-Quantification</italic>.</p>
                <p>
                    <italic>"- It should be emphasized more strongly how this tool can be used to reanalyze existing SRA data with relative ease. In my opinion this is a very strong argument as to why one might want a tool like this.</italic>"</p>
                <p>Thank you for raising the reanalysis of SRA data.&#x00a0; In the updated manuscript, we now mention that 
                    <italic>Arkas' </italic>design was motivated by the BaseSpace application 
                    <italic>SRA Import</italic>.&#x00a0; The revised introduction now explicitly states that 
                    <italic>Arkas </italic>is SRA compatible, and we have provided citations for readers interested in utilizing this SRA application.</p>
                <p>
                    <italic>"Areas that can be shortened:</italic>
                </p>
                <p>
                    <italic>- "Data variance between software versions" can be shortened as some of this is repeated in "Results.""</italic>
                </p>
                <p>We combined the &#x201c;Data variance between software versions&#x201d; and &#x201c;Results&#x201d; sections into a single concise section.</p>
                <p>
                    <italic>- "Complete transcriptomes enrich annotation information..." Specifics of annotations can probably be removed/condensed. It is probably sufficient to say that some are 3x larger, which can change results drastically.</italic>"</p>
                <p>We reduced this discussion to brief specifics of database sizes.&#x00a0; While obvious, we believe that a brief overview provides motivation for the default transcriptomes chosen by 
                    <italic>Arkas.&#x00a0; </italic>In the revised manuscript, we provide a very concise explanation behind the selection of default transcriptomes.</p>
                <p>
                    <italic>- "Docker as a cornerstone of reproducible research" The role of Docker in general can probably be shortened and how Arkas leverages it should be made more clear.</italic>"</p>
                <p>Thank you again for this comment.&#x00a0; We agree that this broad discussion went off topic and may distract future readers.&#x00a0; The manuscript is greatly improved by the removal of the discussion about the democratization of research efforts and biotechnology. We significantly revised the discussion so that it now compares different cloud platforms and the corresponding processing times of other cloud applications.</p>
                <p>
                    <italic>"More minor points:</italic>
                </p>
                <p>
                    <italic>- A short sentence at the beginning of "Methods" should give an overview of the two-step process.</italic>"</p>
                <p>We provided an overview of 
                    <italic>Arkas </italic>in the section described.</p>
                <p>
                    <italic>"- The Galaxy Project (https://usegalaxy.org/) should probably be cited even though the scope is a bit different.</italic>"</p>
                <p>Galaxy is briefly mentioned in the discussion.&#x00a0; The revised manuscript reviews and compares processing times of the Google Genomics Platform and another RNA-Seq application within BaseSpace.</p>
                <p>
                    <italic>"- Figure 1a: "Receiver Operator Characteristic plot" of what? This is stated in the main text, but should also be stated in the figure caption.</italic>
                </p>
                <p>
                    <italic>- Swap Figure 1d and 1c.</italic>"</p>
                <p>Thank you for pointing this out.&#x00a0; The revised Figure 1a caption now states that the Receiver Operator Characteristic plot is for ratios of detected and actual spiked ERCC sequences.&#x00a0; We have swapped Figure 1d and Figure 1c.</p>
                <p>
                    <italic>"- It seems like BaseSpace sessions can easily be shared? If so, this is an additional strong point of using BaseSpace in Arkas."</italic>
                </p>
                <p>We now mention this point briefly in the discussion.</p>
                <p>
                    <italic>"Overall, I'm very excited to see this comprehensive tool exist and be described in this paper.</italic>"</p>
                <p>Thank you very much, Dr. Pimentel.</p>
            </body>
        </sub-article>
    </sub-article>
</article>
