<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.17518.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>restfulSE: A semantically rich interface for cloud-scale genomics with Bioconductor</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Gopaulakrishnan</surname>
                        <given-names>Shweta</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Pollack</surname>
                        <given-names>Samuela</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Stubbs</surname>
                        <given-names>BJ</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Pag&#x00e8;s</surname>
                        <given-names>Herv&#x00e9;</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Readey</surname>
                        <given-names>John</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Davis</surname>
                        <given-names>Sean</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Waldron</surname>
                        <given-names>Levi</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a6">6</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Morgan</surname>
                        <given-names>Martin</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a7">7</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Carey</surname>
                        <given-names>Vincent</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Project Administration</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Validation</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-4046-0063</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Channing Division of Network Medicine, Harvard Medical School, Boston, Massachusetts, 02115, USA</aff>
                <aff id="a2">
                    <label>2</label>Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, 02115, USA</aff>
                <aff id="a3">
                    <label>3</label>Fred Hutchinson Cancer Research Center, Seattle, Washington, 98109, USA</aff>
                <aff id="a4">
                    <label>4</label>Tools and Cloud Technology, HDF Group, Seattle, WA, 98109, USA</aff>
                <aff id="a5">
                    <label>5</label>Center for Cancer Research, National Cancer Institute, Bethesda, Maryland, 20892, USA</aff>
                <aff id="a6">
                    <label>6</label>Epidemiology and Biostatistics, CUNY School of Public Health, New York, New York, 10027, USA</aff>
                <aff id="a7">
                    <label>7</label>Biostatistics and Bioinformatics, Roswell Park Cancer Institute, Buffalo, New York, 14203, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:stvjc@channing.harvard.edu">stvjc@channing.harvard.edu</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>7</day>
                <month>1</month>
                <year>2019</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2019</year>
            </pub-date>
            <volume>8</volume>
            <elocation-id>21</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>19</day>
                    <month>12</month>
                    <year>2018</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Gopaulakrishnan S et al.</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/8-21/pdf"/>
            <abstract>
                <p>Bioconductor&#x2019;s 
                    <monospace>SummarizedExperiment</monospace> class unites numerical assay quantifications with sample- and experiment-level metadata. 
                    <monospace>SummarizedExperiment</monospace> is the standard Bioconductor class for assays that produce matrix-like data, used by over 200 packages. We describe the 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/restfulSE">restfulSE</ext-link>
                    </italic> package, a deployment of this data model that supports remote storage. We illustrate use of 
                    <monospace>SummarizedExperiment</monospace> with remote HDF5 and Google BigQuery back ends, with two applications in cancer genomics. Our intent is to allow the use of familiar and semantically meaningful programmatic idioms to query genomic data, while abstracting the remote interface from end users and developers.</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Bioinformatics</kwd>
                <kwd>REST APIs</kwd>
                <kwd>HDF5</kwd>
                <kwd>BigQuery</kwd>
                <kwd>Bioconductor</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>Chan Zuckerberg Initiative</funding-source>
                    <award-id>DAF2018-183436</award-id>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/100000002">
                    <funding-source>National Institutes of Health</funding-source>
                    <award-id>NCIU01CA214846</award-id>
                    <award-id>NCIU24CA180996</award-id>
                    <award-id>NHGRI1U24HG010263-01</award-id>
                </award-group>
                <funding-statement>Support for the development of this software was provided by NIH grants NCI U01 CA214846 (Carey, PI), NCI U24 CA180996 (Morgan, PI), and NHGRI 1U24HG010263-01 (J Taylor, PI), and by Chan Zuckerberg Initiative DAF 2018-183436 (Carey, PI).</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Analyses of multiomic archives like 
                <ext-link ext-link-type="uri" xlink:href="https://cancergenome.nih.gov/">The Cancer Genome Atlas (TCGA)</ext-link> and single-cell transcriptomic experiments such as the 
                <ext-link ext-link-type="uri" xlink:href="https://support.10xgenomics.com/single-cell-gene-expression/datasets">10x 1.3 million mouse neuron dataset</ext-link> typically begin with downloads of large files and conversion of file contents into formats based on local preferences. In this paper we consider how targeted queries of large remote genomic data resources can be conducted using methods available for Bioconductor&#x2019;s 
                <italic toggle="yes">
                    <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/SummarizedExperiment">SummarizedExperiment</ext-link>
                </italic> class. For large data archives that have been centralized in cloud storage, this approach can reduce the effort required to manage local storage and can facilitate interactive analysis of data subsets in familiar programming idioms, without downloading entire datasets. Clients for 
                <ext-link ext-link-type="uri" xlink:href="https://www.hdfgroup.org/">HDF5</ext-link> or 
                <ext-link ext-link-type="uri" xlink:href="https://cloud.google.com/bigquery">Google BigQuery</ext-link> are available in numerous languages; our Bioconductor interface permits access to remote archives of genomic data with familiar and semantically meaningful programmatic idioms, while abstracting the remote interface from end users and developers.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods: Data structures and remote back ends</title>
            <sec>
                <title>The 
                    <monospace>SummarizedExperiment</monospace> class and related methods</title>
                <p>Let 
                    <italic toggle="yes">Q</italic> denote a matrix of quantifications arising from a genome-scale assay with 
                    <italic toggle="yes">G</italic> assay features measured on 
                    <italic toggle="yes">N</italic> experimental samples. The elements of 
                    <italic toggle="yes">Q</italic> are the numbers 
                    <italic toggle="yes">q
                        <sub>ij</sub>
                    </italic>, 
                    <italic toggle="yes">i</italic> = 1, &#x2026; , 
                    <italic toggle="yes">G</italic>, 
                    <italic toggle="yes">j</italic> = 1, &#x2026;, 
                    <italic toggle="yes">N</italic>. Bioconductor&#x2019;s SummarizedExperiment structure manages feature quantifications with associated metadata about assay features and samples.</p>
                <p>In the 10x mouse neuron dataset, 
                    <italic toggle="yes">G</italic> = 27998 and 
                    <italic toggle="yes">N</italic> = 1.3 million. Each of the 
                    <italic toggle="yes">G</italic> features is a gene, and it is useful to have a number of feature annotations at hand, such as gene name, location, and functional role; suppose each gene has 
                    <italic toggle="yes">F</italic> such annotations recorded. When these quantifications and associated annotations are managed in a Bioconductor 
                    <monospace>SummarizedExperiment X</monospace>, the matrix 
                    <italic toggle="yes">Q</italic> is programmatically bound to a 
                    <italic toggle="yes">G</italic> &#x00d7; 
                    <italic toggle="yes">F</italic> table of feature-level metadata accessible by the 
                    <monospace>rowData</monospace> method, and to an 
                    <italic toggle="yes">N &#x00d7; R</italic> table of sample-level metadata accessible by 
                    <monospace>colData</monospace>, where 
                    <italic toggle="yes">R</italic> denotes the number of sample-level metadata features recorded (Huber 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-1">1</xref>
                    </sup>). See 
                    <xref ref-type="fig" rid="f1">Figure 1</xref>.</p>
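                <p>The binding of 
                    <italic toggle="yes">Q</italic> to its two metadata tables can be made concrete with a minimal sketch. The dimensions, identifiers, annotation columns and simulated counts below are illustrative assumptions and are not taken from the 10x dataset; only the constructor and accessors of the SummarizedExperiment package are used.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;">library(SummarizedExperiment)
set.seed(1)
G = 6; N = 4                                   # toy dimensions (assumed)
Q = matrix(rpois(G * N, lambda = 10), nrow = G, # simulated quantifications
    dimnames = list(paste0("gene", 1:G), paste0("cell", 1:N)))
featureInfo = DataFrame(symbol = paste0("SYM", 1:G),   # G x F table, F = 1
    row.names = rownames(Q))
sampleInfo = DataFrame(celltype = c("a", "a", "b", "b"), # N x R table, R = 1
    row.names = colnames(Q))
X = SummarizedExperiment(assays = list(counts = Q),
    rowData = featureInfo, colData = sampleInfo)
dim(X)            # G features by N samples
dim(rowData(X))   # G by F
dim(colData(X))   # N by R</styled-content>
                    </preformat>
                </p>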
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Schematic of SummarizedExperiment class structure.</title>
                        <p>Colored regions of panels within the schematic are linked with command examples in colored text beneath the panels. For example, the purple command 
                            <monospace>subsetByOverlaps(se, roi)</monospace> would produce a restricted 
                            <monospace>RangedSummarizedExperiment</monospace> instance with features limited to those colored purple. The 
                            <monospace>sizeFactors</monospace> component is specific to a subclass for single cell data.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/19158/eedc82a4-df66-4228-9d52-90b482b3df47_figure1.gif"/>
                </fig>
                <p>In the context of R programming, let 
                    <monospace>K</monospace> denote a vector of feature identifiers, and 
                    <monospace>S</monospace> denote a vector of sample identifiers. The standard subsetting idiom 
                    <monospace>X[K,S]</monospace> expresses filtering of all the information in 
                    <italic toggle="yes">Q</italic> and the associated metadata to features 
                    <monospace>K</monospace> and samples 
                    <monospace>S</monospace>. A 
                    <monospace>GRanges</monospace> instance (Lawrence 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-2">2</xref>
                    </sup>) defining genomic coordinates for features may be bound to 
                    <monospace>X</monospace>, facilitating queries defined by genomic location (using, for example, 
                    <monospace>subsetByOverlaps</monospace>) to isolate features coincident with or near the elements of a set of query genomic ranges (e.g., binding peaks). This outline of genomic data representation and analysis is characteristic of Bioconductor.</p>
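                <p>As a hedged illustration of both idioms, the sketch below builds a small in-memory object and filters it by identifier and by genomic location. The feature names, coordinates, sample labels and the region of interest 
                    <monospace>roi</monospace> are invented for the example.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;">library(SummarizedExperiment)
Q = matrix(1:12, nrow = 4,
    dimnames = list(paste0("g", 1:4), paste0("s", 1:3)))
rr = GRanges("chr1",                      # toy coordinates (assumed)
    IRanges(start = c(100, 500, 900, 1500), width = 50))
names(rr) = rownames(Q)
X = SummarizedExperiment(assays = list(counts = Q), rowRanges = rr,
    colData = DataFrame(group = c("ctl", "trt", "trt"),
        row.names = colnames(Q)))
K = c("g2", "g4"); S = c("s1", "s3")
sub = X[K, S]                             # assay and metadata filtered together
roi = GRanges("chr1", IRanges(450, 1000)) # query range, e.g. a binding peak
near = subsetByOverlaps(X, roi)           # features overlapping the query
rownames(near)</styled-content>
                    </preformat>
                </p>
                <p>With these toy coordinates, only the features starting at positions 500 and 900 overlap the query range, so 
                    <monospace>near</monospace> retains 
                    <monospace>g2</monospace> and 
                    <monospace>g3</monospace> together with all three samples.</p>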
            </sec>
            <sec>
                <title>Examples of remote back ends</title>
                <p>
                    <bold>
                        <italic toggle="yes">Google BigQuery.</italic>
                    </bold> The Institute for Systems Biology Cancer Genomics Cloud project (ISB-CGC) (ISB
                    <sup>
                        <xref ref-type="bibr" rid="ref-3">3</xref>
                    </sup>) uses Google BigQuery to provide access to various public cancer genomics resources including TCGA and the PanCancer Atlas (Hoadley 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-4">4</xref>
                    </sup>). The 
                    <monospace>pancan_SE</monospace> function of 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/restfulSE">restfulSE</ext-link>
                    </italic> constructs queries that derive 
                    <monospace>SummarizedExperiment</monospace> instances using quantifications and annotations for PanCancer Atlas experiments managed in BigQuery tables.</p>
                <p>
                    <bold>
                        <italic toggle="yes">HDF Scalable Data Service (HSDS)</italic>.</bold> An AWS S3-based distributed data object model for HDF5 datasets, including a RESTful API to structure, populate, and query HDF5 archives, has been implemented by the HDF Group. A number of datasets of interest in bioinformatics are served through 
                    <ext-link ext-link-type="uri" xlink:href="https://www.hdfgroup.org/solutions/hdf-kita/">HDF Kita Lab</ext-link> in the 
                    <monospace>/shared/bioconductor</monospace> folder.</p>
            </sec>
            <sec>
                <title>Lazy data retrieval via DelayedArray</title>
                <p>The 
                    <italic toggle="yes">restfulSE</italic> package provides interfaces to BigQuery and HSDS so that the numerical content housed in these services satisfies the API of the Bioconductor 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/DelayedArray">DelayedArray</ext-link>
                    </italic> package (Pag&#x00e8;s and Hickey
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>). Any 
                    <monospace>DelayedArray</monospace> instance can serve as the 
                    <monospace>assay</monospace> component of a 
                    <monospace>SummarizedExperiment</monospace> instance. Thus the capacities of 
                    <monospace>SummarizedExperiment</monospace> to bind semantically rich metadata to genome-scale assays are extended implicitly to data resources for which no standards exist for associating substantive metadata.</p>
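                <p>The following sketch illustrates the delayed-evaluation idea with a local stand-in: an ordinary matrix wrapped as a 
                    <monospace>DelayedArray</monospace> takes the place of a remote back end, which is an assumption made so that the example runs without network access. Subsetting and arithmetic are recorded rather than executed, and numbers are materialized only on explicit realization.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;">library(SummarizedExperiment)
library(DelayedArray)
m = matrix(as.numeric(1:20), nrow = 5,     # toy values (assumed)
    dimnames = list(paste0("g", 1:5), paste0("s", 1:4)))
D = DelayedArray(m)                        # local stand-in for a remote array
X = SummarizedExperiment(assays = list(counts = D))
lazy = log2(assay(X["g2", ]) + 1)          # recorded as delayed operations
class(lazy)                                # still a DelayedMatrix
vals = as.numeric(as.matrix(lazy))         # realization: values computed here</styled-content>
                    </preformat>
                </p>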
                <p>In conjunction with the 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/rhdf5client">rhdf5client</ext-link>
                    </italic> and 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/package%3Dbigrquery">bigrquery</ext-link>
                    </italic> packages, 
                    <italic toggle="yes">restfulSE</italic> functions translate filtering and selection operations that are readily defined using 
                    <monospace>rowData</monospace>, 
                    <monospace>rowRanges</monospace>, and 
                    <monospace>colData</monospace> into formal queries resolvable by the HDF5 and BigQuery services. Numerical results are transmitted from server to client only when needed.</p>
            </sec>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <p>The RESTful 
                <monospace>SummarizedExperiment</monospace> representation allows complicated research queries to be expressed concisely and conveniently, and resolved quickly and robustly, as illustrated by the following examples.</p>
            <sec>
                <title>Hybrid data/annotation strategy for integrative analysis</title>
                <p>The following code chunk, which generates 
                    <xref ref-type="fig" rid="f2">Figure 2</xref>, illustrates the use of the 
                    <italic toggle="yes">restfulSE</italic> protocol with the ISB-CGC BigQuery back end.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;">library(SummarizedExperiment)
library(BiocOncoTK)       # uses restfulSE for cancer bioinformatics
bq = pancan_BQ()          # need CGC_BILLING to authenticate
seCOAD = buildPancanSE(bq, acronym="COAD", assay="RNASeqv2")
seCOAD = bindMSI(seCOAD)  # update to include MSIsensor scores
par(mfrow=c(1,2))         # figure layout
amap = c("29126"="PD-L1", "925"="CD8A") # entrez:symbol mapping
bxs &lt;- lapply( c("29126", "925"),       # for genes of interest
  function(x) boxplot(split(log2(as.numeric(assay( seCOAD[x,]))+1),
      seCOAD$msiTest &gt;= 4), names = c("&lt;4", "&gt;=4"), ylab=amap[x],
      xlab="MSIsensor score")
  )</styled-content>
                    </preformat>
                </p>
                <p>Our interest is in replicating part of Figure 5C of Bailey 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-6">6</xref>
                    </sup>. In that paper, it is shown that microsatellite instability (MSI) is associated with different expression signatures of immune cell infiltration for adenocarcinomas of colon (COAD) and stomach (STAD), and uterine corpus endometrial carcinoma (UCEC). The MSI scores developed using MSIsensor are found in Table S5 of Ding 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-7">7</xref>
                    </sup>. These scores are not available in BigQuery, but can be combined with the assay data using standard R programming, leading to a hybrid data/annotation strategy.</p>
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Association of MSIsensor scores with distributions of PD-L1 and CD8A expression in TCGA colon adenocarcinoma (COAD) samples.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/19158/eedc82a4-df66-4228-9d52-90b482b3df47_figure2.gif"/>
                </fig>
                <p>Functions in the 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/BiocOncoTK">BiocOncoTK</ext-link>
                    </italic> package (Carey
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>) build on 
                    <italic toggle="yes">restfulSE</italic> functionality to a) authenticate the user to the BigQuery platform, b) select a tumor type (COAD) and assay for 
                    <italic toggle="yes">
                        <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/SummarizedExperiment">SummarizedExperiment</ext-link>
                    </italic> construction, c) bind Ding 
                    <italic toggle="yes">et al</italic>.&#x2019;s MSI values as sample-level data variable 
                    <monospace>msiTest</monospace>, d) acquire and transform the PD-L1 and CD8A (Entrez IDs 29126 and 925) expression values, and e) form the stratified boxplot. The basic findings of Bailey 
                    <italic toggle="yes">et al.</italic> are replicated. An extension of the code that produces a display covering more genes and tumor types is demonstrated in the BiocOncoTK package vignette. Note that in this example, expression values are downloaded only for the genes requested, without altering the end-user programming paradigm of working with a SummarizedExperiment instance.</p>
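                <p>The binding step in c) amounts to matching an external table to sample identifiers with standard R programming. The sketch below shows one way this could be done on a toy object; the sample identifiers, expression values and scores are made up for illustration, and the code is not the implementation of 
                    <monospace>bindMSI</monospace>.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;">library(SummarizedExperiment)
X = SummarizedExperiment(assays = list(expr =   # one gene, four toy samples
    matrix(c(5, 9, 2, 7), nrow = 1,
        dimnames = list("29126", paste0("pt", 1:4)))))
msi = data.frame(id = c("pt3", "pt1", "pt4"),   # external scores (made up)
    score = c(6.2, 0.4, 11.0))
X$msiTest = msi$score[match(colnames(X), msi$id)]  # align by sample identifier
X = X[, !is.na(X$msiTest)]                      # drop samples lacking a score
groups = split(as.numeric(assay(X["29126", ])), X$msiTest >= 4)
groups                                          # expression by score stratum</styled-content>
                    </preformat>
                </p>
                <p>Aligning with 
                    <monospace>match</monospace> on identifiers, rather than relying on row order, keeps the external annotation correct even when the external table is incomplete or ordered differently from the assay columns.</p>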
            </sec>
            <sec>
                <title>HDF Scalable Data Service</title>
                <p>
                    <xref ref-type="fig" rid="f3">Figure 3</xref> demonstrates use of a RESTful 
                    <monospace>SummarizedExperiment</monospace>, with assay data provided in the object 
                    <monospace>/shared/bioconductor/darmgcls.h5</monospace> at 
                    <monospace>hsdshdflab.hdfgroup.org</monospace>. Briefly, as a prelude to single-cell RNA-sequencing of glioblastoma (GBM) tumors from four patients, Darmanis 
                    <italic toggle="yes">et al.</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup> used immunopanning to increase the proportion of non-neoplastic cells that constitute the &#x201c;migrating front&#x201d; of glioblastoma progression. An antibody to CD45 was used to capture microglial cells. 
                    <xref ref-type="fig" rid="f3">Figure 3</xref> provides code to compare the distribution of CD45 expression among the classes of cells as labeled in the metadata of GSE84465, the NCBI GEO archive from which the quantifications were derived. In this example, data on one gene from all cells is retrieved when the statement defining vector 
                    <monospace>vals</monospace> is executed. The display can be recapitulated for other genes by substituting different symbols in the statement computing 
                    <monospace>ind</monospace>. The 
                    <monospace>DelayedArray</monospace> framework leveraged here enables basic computations of this kind without loading the entire matrix into memory.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;color:#214A87;">library</styled-content>
                        <styled-content style="font-size:15px;">(rhdf5client)</styled-content>

                        <styled-content style="font-size:15px;color:#214A87;">library</styled-content>
                        <styled-content style="font-size:15px;">(SummarizedExperiment)</styled-content>

                        <styled-content style="font-size:15px;color:#214A87;">library</styled-content>
                        <styled-content style="font-size:15px;">(ggplot2)
cdar = BiocOncoTK</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00">::</styled-content>
                        <styled-content style="font-size:15px;">darmGBMcls
ind =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">match</styled-content>
                        <styled-content style="font-size:15px;">(</styled-content>
                        <styled-content style="font-size:15px;color:#4F9905">"PTPRC"</styled-content>
                        <styled-content style="font-size:15px;">,</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">rowData</styled-content>
                        <styled-content style="font-size:15px;">(cdar)</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00">$</styled-content>
                        <styled-content style="font-size:15px;">symbol)
var =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">gsub</styled-content>
                        <styled-content style="font-size:15px;">(</styled-content>
                        <styled-content style="font-size:15px;color:#4F9905">"selection: "</styled-content>
                        <styled-content style="font-size:15px;">,</styled-content> 
                        <styled-content style="font-size:15px;color:#4F9905">""</styled-content>
                        <styled-content style="font-size:15px;">,
       cdar</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00">$</styled-content>
                        <styled-content style="font-size:15px;">characteristics_ch1.</styled-content>
                        <styled-content style="font-size:15px;color:#0000CF">8</styled-content>
                        <styled-content style="font-size:15px;">)
vals =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">log10</styled-content>
                        <styled-content style="font-size:15px;">(</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">assay</styled-content>
                        <styled-content style="font-size:15px;">(cdar[ind,])</styled-content>
                        <styled-content style="font-size:15px;color:#CF5C00">+</styled-content>
                        <styled-content style="font-size:15px;color:#0000CF">1</styled-content>
                        <styled-content style="font-size:15px;">)
ddd =</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">data.frame</styled-content>
                        <styled-content style="font-size:15px;">(</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">log10norm=</styled-content>
                        <styled-content style="font-size:15px;">vals,</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">pan=</styled-content>
                        <styled-content style="font-size:15px;">var)</styled-content>

                        <styled-content style="font-size:15px;color:#214A87;">ggplot</styled-content>
                        <styled-content style="font-size:15px;">(ddd,</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">aes</styled-content>
                        <styled-content style="font-size:15px;">(</styled-content>
                        <styled-content style="font-size:15px;color:#214A87;">x=</styled-content>
                        <styled-content style="font-size:15px;">log10norm,</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">colour=</styled-content>
                        <styled-content style="font-size:15px;">pan))</styled-content> 
                        <styled-content style="font-size:15px;color:#CF5C00">+</styled-content>
  
                        <styled-content style="font-size:15px;color:#214A87;">geom_density</styled-content>
                        <styled-content style="font-size:15px;">()</styled-content> 
                        <styled-content style="font-size:15px;color:#CF5C00">+</styled-content> 
                        <styled-content style="font-size:15px;color:#214A87;">ylim</styled-content>
                        <styled-content style="font-size:15px;">(</styled-content>
                        <styled-content style="font-size:15px;color:#0000CF">0</styled-content>
                        <styled-content style="font-size:15px;">,</styled-content>
                        <styled-content style="font-size:15px;color:#0000CF">1</styled-content>
                        <styled-content style="font-size:15px;">)</styled-content> 
                        <styled-content style="font-size:15px;color:#CF5C00">+</styled-content>
  
                        <styled-content style="font-size:15px;color:#214A87;">xlab</styled-content>
                        <styled-content style="font-size:15px;">(</styled-content>
                        <styled-content style="font-size:15px;color:#4F9905">"log10 CD45+1"</styled-content>
                        <styled-content style="font-size:15px;">)</styled-content>
                    </preformat>
                </p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>Density estimates for log10 CD45 expression in single-cell RNA-seq studies of glioblastoma.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/19158/eedc82a4-df66-4228-9d52-90b482b3df47_figure3.gif"/>
                </fig>
            </sec>
        </sec>
        <sec>
            <title>Performance</title>
            <p>We consider three aspects of performance with 
                <italic toggle="yes">restfulSE</italic>: reliability, expressivity, and scalability.</p>
            <p>
                <bold>Reliability:</bold> The 
                <italic toggle="yes">restfulSE</italic>, 
                <italic toggle="yes">rhdf5client</italic> and 
                <italic toggle="yes">BiocOncoTK</italic> packages are accompanied by detailed unit tests that compare retrievals to known values. In the case of BigQuery table queries, the test suite composes random queries in both BigQuery SQL and in the 
                <monospace>SummarizedExperiment</monospace> idiom. Results are checked for elementwise equality.</p>
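            <p>The comparison strategy can be sketched in a few lines of base R. This is a minimal illustration and not the packages' actual test suite: a local long-format data frame stands in for a BigQuery table, a dense matrix stands in for the assay component of a 
                <monospace>SummarizedExperiment</monospace>, and the names 
                <monospace>longTable</monospace>, 
                <monospace>assayMat</monospace>, 
                <monospace>queryLong</monospace> and 
                <monospace>queryMatrix</monospace> are hypothetical.</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">set.seed(1)
genes = paste0("g", 1:20)
samples = paste0("s", 1:10)

# Dense matrix: stand-in for the assay of a SummarizedExperiment
assayMat = matrix(rnorm(200), nrow = 20, dimnames = list(genes, samples))

# Long-format table: stand-in for a BigQuery table,
# one row per gene/sample pair (column-major order matches as.vector)
longTable = data.frame(
  gene = rep(genes, times = length(samples)),
  sample = rep(samples, each = length(genes)),
  value = as.vector(assayMat),
  stringsAsFactors = FALSE
)

# Table idiom: filter rows, then reshape to a gene-by-sample matrix
queryLong = function(tab, g, s) {
  sub = tab[tab$gene %in% g, ]
  sub = sub[sub$sample %in% s, ]
  out = matrix(NA_real_, nrow = length(g), ncol = length(s),
               dimnames = list(g, s))
  out[cbind(sub$gene, sub$sample)] = sub$value
  out
}

# Matrix idiom: two-dimensional subsetting by row and column names
queryMatrix = function(mat, g, s) mat[g, s, drop = FALSE]

# Compose random queries and check the two routes for elementwise equality
agree = vapply(1:25, function(i) {
  g = sample(genes, sample(1:5, 1))
  s = sample(samples, sample(1:5, 1))
  isTRUE(all(queryLong(longTable, g, s) == queryMatrix(assayMat, g, s)))
}, logical(1))
stopifnot(all(agree))</preformat>
            </p>
            <p>Because both routes draw on the same stored values, every comparison in this sketch should succeed; a mismatch would point to an error in the filtering or reshaping step.</p>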
            <p>
                <bold>Expressivity:</bold> The code segments for 
                <xref ref-type="fig" rid="f2">Figure 2</xref> and 
                <xref ref-type="fig" rid="f3">Figure 3</xref> are complex but easy to break down. The joining and reshaping of pancan-atlas tables in BigQuery corresponding to the code in 
                <xref ref-type="fig" rid="f2">Figure 2</xref> can be checked through the query history in the BigQuery interface. The acquisition of expression values employed five nested SELECT statements; the query for assay quantifications was 6000 characters in length. The R code is less than 500 characters including comments.</p>
            <p>
                <bold>Scalability:</bold> BigQuery is intrinsically auto-scaling, but charges accrue with the amount of data scanned, so query design can affect both throughput and cost. We rely on the 
                <italic toggle="yes">bigrquery</italic> (Wickham
                <sup>
                    <xref ref-type="bibr" rid="ref-10">10</xref>
                </sup>) and 
                <italic toggle="yes">dbplyr</italic> (Wickham and Ruiz
                <sup>
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup>) packages for efficient translation of R-oriented data manipulations to BigQuery SQL. Throughput with the HDF Scalable Data Service depends on the configuration of the object server, the relationship of numerical data layout to prevalent access patterns, and the degree to which queries capitalize on API efficiencies like chunk-based retrieval. For both back ends, proper design and deployment of the querying client can lead to throughput that scales with client-side resources.</p>
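            <p>The interaction between data layout and access pattern can be made concrete with a small base R sketch that counts the storage chunks overlapped by a rectangular selection. The function 
                <monospace>chunksTouched</monospace>, the matrix dimensions, and the two chunk shapes are hypothetical and are not taken from the server configuration used in this article; the sketch assumes that every chunk overlapping the selection is read in full.</p>
            <p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve"># Number of chunks overlapped by a selection of rows and columns,
# for a matrix stored in chunks of shape
# chunkDim = c(rowsPerChunk, colsPerChunk)
chunksTouched = function(rows, cols, chunkDim) {
  rowChunks = unique((rows - 1) %/% chunkDim[1])
  colChunks = unique((cols - 1) %/% chunkDim[2])
  length(rowChunks) * length(colChunks)
}

nCells = 4000   # hypothetical matrix: genes in rows, cells in columns

# Access pattern of the single-cell example: one gene, all cells
oneGene = list(rows = 5000, cols = seq_len(nCells))

# Row-oriented chunks (1 gene x 4000 cells)
rowLayout = chunksTouched(oneGene$rows, oneGene$cols, c(1, 4000))

# Square chunks (200 genes x 200 cells)
squareLayout = chunksTouched(oneGene$rows, oneGene$cols, c(200, 200))

c(rowLayout = rowLayout, squareLayout = squareLayout)</preformat>
            </p>
            <p>Under these assumptions the one-gene request is served from a single chunk in the row-oriented layout and from 20 chunks in the square layout, so the square layout transfers 800,000 values to deliver 4,000.</p>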
        </sec>
        <sec sec-type="conclusions">
            <title>Conclusions</title>
            <p>Cloud-scale storage and retrieval strategies are of significant interest for genome science. The 
                <monospace>SummarizedExperiment</monospace> class unifies assay data with substantive sample- and experiment-level metadata, and its API for managing and interrogating genome-scale experiment archives is used in numerous analytic packages. The 
                <italic toggle="yes">restfulSE</italic> package exposes high-performance cloud-resident data stores to users and algorithms as 
                <monospace>SummarizedExperiment</monospace>s. Continued improvements in efficiency of representation and query resolution for assay data and metadata will help to achieve the potential of a federated data ecosystem for enhanced discovery in biology through interactive genome-scale analysis.</p>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>
                <italic toggle="yes">restfulSE</italic> package available from: 
                <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/3.9/restfulSE">https://bioconductor.org/packages/3.9/restfulSE</ext-link>
            </p>
            <p>Source code available from: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/shwetagopaul92/restfulSE">https://github.com/shwetagopaul92/restfulSE</ext-link>
            </p>
            <p>Archived source code as at time of publication: DOI: 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.18129/B9.bioc.restfulSE">10.18129/B9.bioc.restfulSE</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>
            </p>
            <p>License: Artistic-2.0</p>
        </sec>
    </body>
    <back>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Carey</surname>
                            <given-names>VJ</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Gentleman</surname>
                            <given-names>R</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Orchestrating high-throughput genomic analysis with Bioconductor.</article-title>
                    <source>
					
                        <italic toggle="yes">Nat Methods.</italic>
					</source>
                    <italic toggle="yes">Nature Publishing Group</italic>.<year>2015</year>;<volume>12</volume>(<issue>2</issue>):<fpage>115</fpage>&#x2013;<lpage>121</lpage>.
                    <pub-id pub-id-type="pmid">25633503</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.3252</pub-id>
                    <pub-id pub-id-type="pmcid">4509590</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Lawrence</surname>
                            <given-names>M</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Huber</surname>
                            <given-names>W</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Pag&#x00e8;s</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Software for computing and annotating genomic ranges.</article-title>
                    <source>
						
                        <italic toggle="yes">PLoS Comput Biol.</italic>
					</source>
                    <year>2013</year>;<volume>9</volume>(<issue>8</issue>):<fpage>e1003118</fpage>.
                    <pub-id pub-id-type="pmid">23950696</pub-id>
                    <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003118</pub-id>
                    <pub-id pub-id-type="pmcid">3738458</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <collab>ISB</collab>:
                    <article-title>ISB Cancer Genomics Cloud 1.0.0 Documentation</article-title>.<year>2018</year>; Accessed: 2018-08-17.
                    <ext-link ext-link-type="uri" xlink:href="https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/About-ISB-CGC.html">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Hoadley</surname>
                            <given-names>KA</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Yau</surname>
                            <given-names>C</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Hinoue</surname>
                            <given-names>T</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer.</article-title>
                    <source>
						
                        <italic toggle="yes">Cell.</italic>
					</source>
                    <year>2018</year>;<volume>173</volume>(<issue>2</issue>):<fpage>291</fpage>&#x2013;<lpage>304.e6</lpage>.
                    <pub-id pub-id-type="pmid">29625048</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2018.03.022</pub-id>
                    <pub-id pub-id-type="pmcid">5957518</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Pag&#x00e8;s</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Hickey</surname>
                            <given-names>P</given-names>
                        </name>
					</person-group>:
                    <article-title>DelayedArray: Delayed operations on array-like objects</article-title>. R package version 0.7.28.<year>2018</year>.</mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Bailey</surname>
                            <given-names>MH</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Tokheim</surname>
                            <given-names>C</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Porta-Pardo</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Comprehensive Characterization of Cancer Driver Genes and Mutations.</article-title>
                    <source>
						
                        <italic toggle="yes">Cell.</italic>
					</source>
                    <year>2018</year>;<volume>173</volume>(<issue>2</issue>):<fpage>371</fpage>&#x2013;<lpage>385.e18</lpage>.
                    <pub-id pub-id-type="pmid">29625053</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2018.02.060</pub-id>
                    <pub-id pub-id-type="pmcid">6029450</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Ding</surname>
                            <given-names>L</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Bailey</surname>
                            <given-names>MH</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Porta-Pardo</surname>
                            <given-names>E</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Perspective on Oncogenic Processes at the End of the Beginning of Cancer Genomics.</article-title>
                    <source>
						
                        <italic toggle="yes">Cell.</italic>
					</source>
                    <year>2018</year>;<volume>173</volume>(<issue>2</issue>):<fpage>305</fpage>&#x2013;<lpage>320.e10</lpage>.
                    <pub-id pub-id-type="pmid">29625049</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2018.03.033</pub-id>
                    <pub-id pub-id-type="pmcid">5916814</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Carey</surname>
                            <given-names>V</given-names>
                        </name>
					</person-group>:
                    <article-title>BiocOncoTK: Bioconductor components for general cancer genomics</article-title>. R package version 1.1.16.<year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://rdrr.io/github/vjcitn/BiocOncoTK/">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Darmanis</surname>
                            <given-names>S</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Sloan</surname>
                            <given-names>SA</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Croote</surname>
                            <given-names>D</given-names>
                        </name>
						
                        <etal/>
					</person-group>:
                    <article-title>Single-Cell RNA-Seq Analysis of Infiltrating Neoplastic Cells at the Migrating Front of Human Glioblastoma.</article-title>
                    <source>
						
                        <italic toggle="yes">Cell Rep.</italic>
					</source>
                    <year>2017</year>;<volume>21</volume>(<issue>5</issue>):<fpage>1399</fpage>&#x2013;<lpage>1410</lpage>.
                    <pub-id pub-id-type="pmid">29091775</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.celrep.2017.10.030</pub-id>
                    <pub-id pub-id-type="pmcid">5810554</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Wickham</surname>
                            <given-names>H</given-names>
                        </name>
					</person-group>:
                    <article-title>bigrquery: An Interface to Google&#x2019;s &#x2019;BigQuery&#x2019; &#x2019;API&#x2019;</article-title>. R package version 1.0.0.<year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=bigrquery">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Wickham</surname>
                            <given-names>H</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Ruiz</surname>
                            <given-names>E</given-names>
                        </name>
					</person-group>:
                    <article-title>dbplyr: A &#x2019;dplyr&#x2019; Back End for Databases</article-title>. R package version 1.2.1.<year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="https://CRAN.R-project.org/package=dbplyr">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">
						
                        <name name-style="western">
                            <surname>Carey</surname>
                            <given-names>V</given-names>
                        </name>
						
                        <name name-style="western">
                            <surname>Gopaulakrishnan</surname>
                            <given-names>S</given-names>
                        </name>
					</person-group>:
                    <article-title>restfulSE: Access matrix-like HDF5 server content or BigQuery content through a SummarizedExperiment interface</article-title>. R package version 1.4.0.<year>2018</year>.
                    <pub-id pub-id-type="doi">10.18129/B9.bioc.restfulSE</pub-id>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report42652">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.19158.r42652</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Reynolds</surname>
                        <given-names>Sheila</given-names>
                    </name>
                    <xref ref-type="aff" rid="r42652a1">1</xref>
                    <role>Referee</role>
                </contrib>
                <aff id="r42652a1">
                    <label>1</label>Institute for Systems Biology, Seattle, WA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>25</day>
                <month>2</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Reynolds S</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport42652" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.17518.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The restfulSE interface described in this article by Gopaulakrishnan et al. is a very useful extension to the SummarizedExperiment class which provides a convenient approach to storing and manipulating rectangular matrices of experimental results, along with associated meta-data. This new extension allows users to query remote data, eliminating the common &#x201c;download&#x201d; step that still precedes many large-scale analyses.</p>
            <p> As these datasets grow, and are more commonly made available in cloud-hosted technologies such as Google or AWS object stores or data warehouses such as Google BigQuery, tools that allow users to easily access and query these datasets become critical. The restfulSE interface permits targeted queries of such remote datasets.</p>
            <p>As background information, the article includes a nice summary of the SummarizedExperiment class and related methods, for researchers (such as this reviewer) who had not come across this package before. The authors go on to describe two separate remote back ends: one which accesses PanCancer Atlas TCGA, hosted in Google BigQuery by the ISB-CGC; and the other which accesses HDF5 data hosted in AWS S3. Both of these back ends further make use of the DelayedArray package, which implements delayed or block-processing operations to facilitate working with large datasets that cannot be stored in-memory. This enables &#x201c;lazy&#x201d; data retrieval, with numerical results transmitted from server to client only when needed.</p>
            <p> The authors provide two concrete examples, illustrating the usage of both remote back ends. This reviewer ran into some issues trying to run these examples and reached out to the authors who provided additional information in video and Jupyter notebook form. Making additional tutorial resources available with this article will render this information useful and usable by a wider audience and is strongly encouraged.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Computational biology, cloud-computing, integrative analyses of heterogeneous and large-scale cancer data sets</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report43093">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.19158.r43093</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Hazelett</surname>
                        <given-names>Dennis J.</given-names>
                    </name>
                    <xref ref-type="aff" rid="r43093a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0749-9935</uri>
                </contrib>
                <aff id="r43093a1">
                    <label>1</label>The Center for Bioinformatics and Functional Genomics, Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>4</day>
                <month>2</month>
                <year>2019</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2019 Hazelett DJ</copyright-statement>
                <copyright-year>2019</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport43093" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.17518.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The restfulSE software package for Bioconductor purports to extend a very useful data structure, the SummarizedExperiment, to handle very large datasets for which downloading the full dataset is neither necessary nor practical. To this end, Gopaulakrishnan et al. have created restfulSE so that this data structure interacts with remote databases on an as-needed basis.</p>
            <p>This is a very useful idea from the Bioconductor core team, and it is likely to be impactful as datasets grow larger and cheaper to produce, and as it becomes increasingly necessary for bioinformaticians to leverage available data against local experiments.</p>
            <p>The tool is technically sound, built on Google BigQuery and HDF5, and the paper is well written and clear. The manuscript includes code examples that make it simple to get started quickly and see how the software works.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Yes</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics, regulatory genomics, cancer genomics and epigenomics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
</article>
