<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="other" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.22969.2</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Software Tool Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>clustifyr: an R package for automated single-cell RNA sequencing cluster classification</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 2; peer review: 2 approved]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Fu</surname>
                        <given-names>Rui</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Gillen</surname>
                        <given-names>Austin E.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Sheridan</surname>
                        <given-names>Ryan M.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Tian</surname>
                        <given-names>Chengzhe</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Daya</surname>
                        <given-names>Michelle</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-9057-6593</uri>
                    <xref ref-type="aff" rid="a3">3</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Hao</surname>
                        <given-names>Yue</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a4">4</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Hesselberth</surname>
                        <given-names>Jay R.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Supervision</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a5">5</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Riemondy</surname>
                        <given-names>Kent A.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-0750-1273</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>RNA Bioscience Initiative, University of Colorado School of Medicine, Aurora, CO, 80045, USA</aff>
                <aff id="a2">
                    <label>2</label>Department of Biochemistry, University of Colorado Boulder, Boulder, CO, 80303, USA</aff>
                <aff id="a3">
                    <label>3</label>Biomedical Informatics &amp; Personalized Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA</aff>
                <aff id="a4">
                    <label>4</label>Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA</aff>
                <aff id="a5">
                    <label>5</label>Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, 80045, USA</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:kent.riemondy@cuanschutz.edu">kent.riemondy@cuanschutz.edu</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>16</day>
                <month>7</month>
                <year>2020</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2020</year>
            </pub-date>
            <volume>9</volume>
            <elocation-id>223</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>8</day>
                    <month>7</month>
                    <year>2020</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Fu R et al.</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/9-223/pdf"/>
            <abstract>
                <p>Assignment of cell types from single-cell RNA sequencing (scRNA-seq) data remains a time-consuming and error-prone process. Current packages for identity assignment use limited types of reference data and often have rigid data structure requirements. We developed the clustifyr R package to leverage several external data types, including gene expression profiles, to assign likely cell types using data from scRNA-seq, bulk RNA-seq, microarray expression data, or signature gene lists. We benchmark various parameters of a correlation-based approach and implement gene list enrichment methods. clustifyr is a lightweight and effective cell-type assignment tool developed for compatibility with various scRNA-seq analysis workflows. clustifyr is publicly available at 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyr">https://github.com/rnabioco/clustifyr</ext-link>.
                </p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Single-cell RNA sequencing</kwd>
                <kwd>cell type classification</kwd>
                <kwd>gene expression profile</kwd>
                <kwd>R package</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>University of Colorado School of Medicine</funding-source>
                    <award-id>RNABioscienceInitiative</award-id>
                </award-group>
                <award-group id="fund-2" xlink:href="http://dx.doi.org/10.13039/100000057">
                    <funding-source>National Institute of General Medical Sciences</funding-source>
                    <award-id>R35GM119550</award-id>
                </award-group>
                <funding-statement>RNA Bioscience Initiative at the University of Colorado School of Medicine and the National Institutes of Health [R35 GM119550 to J.R.H.]. This work was in part completed during the NIH sponsored Rocky Mountain Genomics HackCon (2018) hosted by the Biofrontiers Department at the University of Colorado at Boulder.</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
        <notes>
            <sec sec-type="version-changes">
                <label>Revised</label>
                <title>Amendments from Version 1</title>
                <p>The new version of the manuscript includes more detailed introductions and descriptions of the datasets and analyses performed to benchmark clustifyr. Additional functionality in clustifyr is also highlighted, including the ability to identify cell types using marker gene lists and methods to examine cell type classification in the presence of overclustering of the reference or query datasets. Additional software tools were benchmarked against clustifyr and the benchmarking was standardized across multiple datasets for clarity. Lastly, we have organized the datasets described in this manuscript into an ExperimentHub resource (clustifyrdatahub) that will be available through Bioconductor. We have included a new table (Table_1.xlsx), made edits to Figures 1-4, and now also provide links to a supplemental table hosted on Zenodo (https://doi.org/10.5281/zenodo.3934480).</p>
            </sec>
        </notes>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Single-cell mRNA sequencing (scRNA-seq) promises to deliver a deeper understanding of cellular mechanisms, cell heterogeneity within tissue, and developmental transitions
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-5">5</xref>
                </sup>. A key challenge in scRNA-seq data analysis is the identification of cell types from single-cell transcriptomes. Manual inspection of the expression patterns from a small number of marker genes is still standard practice, which is both cumbersome and potentially inaccurate. Methods that compare cell type expression patterns against robust reference data provide additional confidence in cell type assignments and have the potential to automate and standardize cell type assignment. Unfortunately, current implementations of scRNA-seq suffer from several limitations
                <sup>
                    <xref ref-type="bibr" rid="ref-3">3</xref>,
                    <xref ref-type="bibr" rid="ref-6">6</xref>,
                    <xref ref-type="bibr" rid="ref-7">7</xref>
                </sup> that further compound the problem of cell type identification. First, only RNA levels are measured, which may not correlate with cell surface markers or gene expression signatures identified through other experimental techniques. Second, due to the low capture rate of RNAs, lowly expressed genes may go undetected regardless of sequencing depth. Many previously established markers of disease or developmental processes, such as transcription factors, suffer from this issue. On the data analysis front, over- or under-clustering can generate cluster markers that are uninformative for cell type labeling. In addition, cluster markers that are unrecognizable to an investigator may indicate potentially interesting, unexpected cell types but can be daunting to interpret.</p>
            <p>For these reasons, investigators struggle to integrate scRNA-seq into their studies due to the challenges of confidently identifying previously characterized or novel cell populations. Formalized data-driven approaches for assigning cell type labels to clusters greatly aid researchers in interrogating scRNA-seq experiments. Currently, multiple cell type assignment packages exist, but each is tailored to specific input types or workflows
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup>. Seurat, a popular toolkit for single cell RNA-seq analysis, implements a mutual nearest neighbor-based method to annotate cell types using another single cell RNA-seq dataset in the Seurat object format
                <sup>
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup>. SingleR and scmap provide functionality within the Bioconductor framework to annotate cell types using correlation when provided with a reference from bulk RNA-seq or averaged single cell cluster data
                <sup>
                    <xref ref-type="bibr" rid="ref-8">8</xref>,
                    <xref ref-type="bibr" rid="ref-9">9</xref>
                </sup>. scPred also uses a Bioconductor framework and applies a Support Vector Machine (SVM) model to PCA reduced gene expression data to classify cell types
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>.  ACTINN, a neural network-based annotation tool, also relies on existing single cell reference data and operates on files within a command line framework
                <sup>
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup>. As more approaches to the classification problem are introduced, benchmarking performance and compatibility with sequencing platforms and analysis pipelines becomes increasingly important.</p>
            <p>We developed the R package clustifyr, a lightweight and flexible tool that leverages a wide range of prior knowledge of cell types to pinpoint target cells of interest or assign general cell identities to difficult-to-annotate clusters. Here, we demonstrate its basic usage and its application, using transcriptomic information from external datasets and/or signature gene profiles, to explore and quantify likely cell types. The clustifyr package is built with compatibility and ease of use in mind to support other popular scRNA-seq tools and formats.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>Implementation</title>
                <p>clustifyr requires query and reference data in the form of normalized expression matrices, corresponding metadata tables, and a list of variable genes (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>).</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Schematic for clustifyr input and output.</title>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/27827/48bada72-0bd4-4f03-9e5d-55abea0b1855_figure1.gif"/>
                </fig>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                    <styled-content style="font-size:15px;color:#24292E;">library(clustifyr)
pbmc_matrix_small[</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">1</styled-content>
                    <styled-content style="font-size:15px;color:#D73A49;">:</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">5</styled-content>
                    <styled-content style="font-size:15px;color:#24292E;">,</styled-content> 
                    <styled-content style="font-size:15px;color:#005CC5;">1</styled-content>
                    <styled-content style="font-size:15px;color:#D73A49;">:</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">5</styled-content>
                    <styled-content style="font-size:15px;color:#24292E;">]</styled-content>  
                    <styled-content style="font-size:15px;color:#6A737D;"># query matrix of normalized scRNA-seq counts</styled-content>

                    <styled-content style="font-size:15px;color:#24292E;">cbmc_ref[</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">1</styled-content>
                    <styled-content style="font-size:15px;color:#D73A49;">:</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">5</styled-content>
                    <styled-content style="font-size:15px;color:#24292E;">,</styled-content> 
                    <styled-content style="font-size:15px;color:#005CC5;">1</styled-content>
                    <styled-content style="font-size:15px;color:#D73A49;">:</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">5</styled-content>
                    <styled-content style="font-size:15px;color:#24292E;">]</styled-content>  
                    <styled-content style="font-size:15px;color:#6A737D;"># reference matrix of expression for each cell type</styled-content>

                    <styled-content style="font-size:15px;color:#24292E;">pbmc_meta[</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">1</styled-content>
                    <styled-content style="font-size:15px;color:#D73A49;">:</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">5</styled-content>
                    <styled-content style="font-size:15px;color:#24292E;">, ]</styled-content>  
                    <styled-content style="font-size:15px;color:#6A737D;"># query metadata data.frame containing cell clusters</styled-content>

                    <styled-content style="font-size:15px;color:#24292E;">length(pbmc_markers_M3Drop</styled-content>
                    <styled-content style="font-size:15px;color:#D73A49;">$</styled-content>
                    <styled-content style="font-size:15px;color:#24292E;">Gene)</styled-content>  
                    <styled-content style="font-size:15px;color:#6A737D;"># vector of variable genes</styled-content>
</preformat>
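                <p>As a rough illustration of how these four inputs relate to one another, the following base R sketch builds toy stand-ins and checks that they are mutually consistent. All object names and values here (query_mat, query_meta, ref_mat, var_genes) are hypothetical and are not part of clustifyr; the sketch also assumes that metadata rows are named by cell, which may not hold for every dataset.</p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
# Toy stand-ins for the four inputs (all names and values are illustrative only)
set.seed(1)
genes &lt;- paste0("gene", 1:6)
cells &lt;- paste0("cell", 1:8)

# query: genes x cells matrix of normalized expression
query_mat &lt;- matrix(runif(48), nrow = 6, dimnames = list(genes, cells))

# metadata: one row per cell (assumed to be named by cell), with a cluster column
query_meta &lt;- data.frame(
  cluster = rep(c("0", "1"), each = 4),
  row.names = cells
)

# reference: genes x cell types matrix of average expression
ref_mat &lt;- matrix(runif(18), nrow = 6,
                  dimnames = list(genes, c("B", "CD4 T", "NK")))

# variable genes: a character vector ("gene9" is deliberately absent from the matrices)
var_genes &lt;- c("gene1", "gene3", "gene5", "gene9")

# every cell in the query matrix should have a metadata row
stopifnot(all(colnames(query_mat) %in% rownames(query_meta)))

# only genes present in the query, the reference, and the variable gene list
# can contribute to a comparison
shared_genes &lt;- Reduce(intersect,
                       list(rownames(query_mat), rownames(ref_mat), var_genes))
shared_genes
</preformat>
                <p>In this toy case only three of the four variable genes survive the intersection, which shows why a reference sharing few genes with the query can limit the comparison.</p>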
                <p>clustifyr adopts correlation-based methods to find reference transcriptomes with the highest similarity to query cluster expression profiles, defaulting to Spearman rank correlation, with options to use Pearson correlation, Kendall correlation, or cosine similarity instead if desired. clustify() returns a matrix of correlation coefficients for each cell type and cluster, with row names giving the query clusters and column names giving the reference cell types.</p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">

                    <styled-content style="font-size:15px;color:#24292E;">res</styled-content> 
                    <styled-content style="font-size:15px;color:#E36209;">&lt;-</styled-content> 
                    <styled-content style="font-size:15px;color:#24292E;">clustify(</styled-content>
  
                    <styled-content style="font-size:15px;color:#E36209;">input =</styled-content> 
                    <styled-content style="font-size:15px;color:#24292E;">pbmc_matrix_small,</styled-content>
  
                    <styled-content style="font-size:15px;color:#E36209;">metadata =</styled-content> 
                    <styled-content style="font-size:15px;color:#24292E;">pbmc_meta,</styled-content>
  
                    <styled-content style="font-size:15px;color:#E36209;">cluster_col =</styled-content> 
                    <styled-content style="font-size:15px;color:#032F62;">"seurat_clusters"</styled-content>
                    <styled-content style="font-size:15px;color:#24292E;">,</styled-content> 
                    <styled-content style="font-size:15px;color:#6A737D;"># column in meta.data with clusters</styled-content>
  
                    <styled-content style="font-size:15px;color:#E36209;">ref_mat =</styled-content> 
                    <styled-content style="font-size:15px;color:#24292E;">cbmc_ref,</styled-content>
  
                    <styled-content style="font-size:15px;color:#E36209;">query_genes =</styled-content> 
                    <styled-content style="font-size:15px;color:#24292E;">pbmc_markers_M3Drop
                        <styled-content style="font-size:15px;color:#D73A49;">$</styled-content>Gene
)
res[</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">1</styled-content>
                    <styled-content style="font-size:15px;color:#D73A49;">:</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">5</styled-content>
                    <styled-content style="font-size:15px;color:#24292E;">,</styled-content> 
                    <styled-content style="font-size:15px;color:#005CC5;">1</styled-content>
                    <styled-content style="font-size:15px;color:#D73A49;">:</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">5</styled-content>
                    <styled-content style="font-size:15px;color:#24292E;">]</styled-content>

                    <styled-content style="font-size:15px;color:#6A737D;">#&gt;           B CD14+ Mono CD16+ Mono     CD34+     CD4 T
#&gt; 0 0.4700038  0.5033242  0.5188112 0.6012423 0.7909705
#&gt; 1 0.4850570  0.4900953  0.5232810 0.5884319 0.7366543
#&gt; 2 0.5814309  0.9289886  0.8927613 0.6394140 0.5258430
#&gt; 3 0.8609621  0.4663520  0.5686564 0.6429193 0.4698687
#&gt; 4 0.2814882  0.1888232  0.2506101 0.4140560 0.6125503</styled-content>
</preformat>
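                <p>Conceptually, a cluster-by-cell-type correlation matrix of this shape can be obtained by averaging query expression within each cluster and correlating the averaged profiles against the reference. The following base R sketch illustrates that idea on toy data; it is a simplified approximation under our own assumptions, and clustifyr's internal averaging and gene filtering may differ in detail. All object names are hypothetical.</p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
set.seed(42)
genes &lt;- paste0("gene", 1:20)

# toy reference: two cell types with opposite expression gradients
ref_mat &lt;- cbind(typeA = 20:1, typeB = 1:20)
rownames(ref_mat) &lt;- genes

# toy query: three cells resembling typeA and three resembling typeB, plus noise
query_mat &lt;- cbind(
  ref_mat[, rep("typeA", 3)] + matrix(rnorm(60, sd = 0.5), nrow = 20),
  ref_mat[, rep("typeB", 3)] + matrix(rnorm(60, sd = 0.5), nrow = 20)
)
colnames(query_mat) &lt;- paste0("cell", 1:6)
clusters &lt;- rep(c("0", "1"), each = 3)

# average expression per cluster (result: genes x clusters)
avg_mat &lt;- sapply(split(seq_along(clusters), clusters), function(idx) {
  rowMeans(query_mat[, idx, drop = FALSE])
})

# Spearman rank correlation between each cluster and each reference cell type
# (result: clusters in rows, reference cell types in columns)
cor_mat &lt;- cor(avg_mat, ref_mat, method = "spearman")
cor_mat
</preformat>
                <p>Swapping method = "spearman" for "pearson" or "kendall" in stats::cor() gives the corresponding alternative coefficients in this sketch.</p>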
                <p>Each query cluster is assigned the reference cell type with which it is most highly correlated, subject to an automatic or manual cutoff threshold. Query clusters dissimilar to all available reference cell types are labeled as &#x201c;unassigned&#x201d;.</p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                    <styled-content style="font-size:15px;color:#000000;">res2</styled-content> 
                    <styled-content style="font-size:15px;color:#E36209;">&lt;-</styled-content> 
                    <styled-content style="font-size:15px;color:#000000;">cor_to_call(</styled-content>

  
                    <styled-content style="font-size:15px;color:#E36209;">cor_mat =</styled-content> 
                    <styled-content style="font-size:15px;color:#000000;">res,</styled-content>          
                    <styled-content style="font-size:15px;color:#6A737D;"># matrix of correlation coefficients</styled-content>
  
  
                    <styled-content style="font-size:15px;color:#E36209;">cluster_col =</styled-content> 
                    <styled-content style="font-size:15px;color:#032F62;">"seurat_clusters", </styled-content>
                    <styled-content style="font-size:15px;color:#6A737D;"># column in meta.data with clusters</styled-content>

  
                    <styled-content style="font-size:15px;color:#E36209;">threshold =</styled-content> 
                    <styled-content style="font-size:15px;color:#005CC5;">0.5
)</styled-content>

</preformat>
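                <p>The thresholded assignment described above can be sketched in base R as follows. This is an illustrative approximation rather than the cor_to_call() implementation: the toy correlation values are invented, and the choice to treat a coefficient exactly equal to the threshold as passing is our assumption.</p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
# toy correlation matrix: query clusters in rows, reference cell types in columns
cor_mat &lt;- matrix(
  c(0.47, 0.50, 0.79,
    0.86, 0.47, 0.47,
    0.28, 0.19, 0.41),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("0", "1", "2"), c("B", "CD14+ Mono", "CD4 T"))
)

threshold &lt;- 0.5

# column index of the highest coefficient in each row
best_idx &lt;- max.col(cor_mat, ties.method = "first")
# the highest coefficient itself, one per cluster
best_r &lt;- cor_mat[cbind(seq_len(nrow(cor_mat)), best_idx)]

# keep the best-matching cell type only if it reaches the threshold
calls &lt;- data.frame(
  cluster = rownames(cor_mat),
  type = ifelse(best_r &gt;= threshold, colnames(cor_mat)[best_idx], "unassigned"),
  r = best_r
)
calls
</preformat>
                <p>Here clusters 0 and 1 are called as CD4 T and B, respectively, whereas cluster 2 has a best coefficient of 0.41, falls below the cutoff, and is labeled "unassigned".</p>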
                <p>To better integrate with standard workflows that involve S3/S4 R objects, methods for clustifyr are written to directly recognize Seurat
                    <sup>
                        <xref ref-type="bibr" rid="ref-14">14</xref>
                    </sup> (v2 and v3) and SingleCellExperiment
                    <sup>
                        <xref ref-type="bibr" rid="ref-15">15</xref>
                    </sup> objects, retrieve the required information, and reinsert classification results back into an output object. A more general wrapper is also included for compatibility with other common data structures and can be easily extended to new object types. This approach also has the added benefit of forgoing certain calculations such as variable gene selection or clustering, which may already be stored within input objects.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">


                        <styled-content style="font-size:15px;color:#000000;;">res</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">&lt;-</styled-content> 
                        <styled-content style="font-size:15px;color:#000000;;">clustify(</styled-content>
  
                        <styled-content style="font-size:15px;color:#E36209;">input</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                        <styled-content style="font-size:15px;color:#000000;;">sce_small,</styled-content>       
                        <styled-content style="font-size:15px;color:#6A737D;"># an SCE object</styled-content>
  
                        <styled-content style="font-size:15px;color:#E36209;">ref_mat</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                        <styled-content style="font-size:15px;color:#000000;;">cbmc_ref</styled-content>
                        <styled-content style="font-size:15px;color:#000000;;">,</styled-content>      
                        <styled-content style="font-size:15px;color:#6A737D;"># matrix of expression for each cell type</styled-content>
  
                        <styled-content style="font-size:15px;color:#E36209;">cluster_col</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                        <styled-content style="font-size:15px;color:#032F62;">"cell_type1"</styled-content>
                        <styled-content style="font-size:15px;color:#000000;;">,</styled-content> 
                        <styled-content style="font-size:15px;color:#6A737D;"># column in meta.data with clusters</styled-content>
  
                        <styled-content style="font-size:15px;color:#E36209;">obj_out</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                        <styled-content style="font-size:15px;color:#005CC5;">TRUE</styled-content>           
                        <styled-content style="font-size:15px;color:#6A737D;"># output SCE object with cell type</styled-content>

                        <styled-content style="font-size:15px;color:#000000;;">)</styled-content>

                        <styled-content style="font-size:15px;color:#6F42C1;">SingleCellExperiment</styled-content>
                        <styled-content style="font-size:15px;color:#D73A49;">::</styled-content>
                        <styled-content style="font-size:15px;color:#000000;;">colData(res)</styled-content>
                        <styled-content style="font-size:15px;color:#000000;;">[</styled-content>
                        <styled-content style="font-size:15px;color:#005CC5;">1</styled-content>
                        <styled-content style="font-size:15px;color:#D73A49;">:</styled-content>
                        <styled-content style="font-size:15px;color:#005CC5;">10</styled-content>
                        <styled-content style="font-size:15px;color:#000000;;">, c(</styled-content>
                        <styled-content style="font-size:15px;color:#032F62;">"type"</styled-content>
                        <styled-content style="font-size:15px;color:#000000;;">,</styled-content> 
                        <styled-content style="font-size:15px;color:#032F62;">"r"</styled-content>
                        <styled-content style="font-size:15px;color:#000000;;">)]</styled-content>

                        <styled-content style="font-size:15px;color:#6A737D;">#&gt; DataFrame with 10 rows and 2 columns
#&gt;               type                 r
#&gt;        &lt;character&gt;         &lt;numeric&gt;
#&gt; AZ_A1         pDCs 0.814336567702192
#&gt; AZ_A10       Eryth 0.665800619720566
#&gt; AZ_A11        pDCs 0.682088309107356
#&gt; AZ_A12       Eryth 0.665800619720566
#&gt; AZ_A2            B 0.634114583333333
#&gt; AZ_A3         pDCs 0.814336567702192
#&gt; AZ_A4         pDCs 0.814336567702192
#&gt; AZ_A5           NK 0.655407634437123
#&gt; AZ_A6         pDCs 0.682088309107356
#&gt; AZ_A7         pDCs  0.71424223704931</styled-content>
</preformat>
                </p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                    <styled-content style="font-size:15px;color:#000000;">res</styled-content> 
                    <styled-content style="font-size:15px;color:#E36209;">&lt;-</styled-content> 
                    <styled-content style="font-size:15px;color:#000000;">clustify(</styled-content>

  
                    <styled-content style="font-size:15px;color:#E36209;">input</styled-content> 
                    <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                    <styled-content style="font-size:15px;color:#000000;">s_small3,</styled-content>              
                    <styled-content style="font-size:15px;color:#6A737D;"># a Seurat object</styled-content>
  
  
                    <styled-content style="font-size:15px;color:#E36209;">ref_mat</styled-content> 
                    <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                    <styled-content style="font-size:15px;color:#000000;">cbmc_ref,</styled-content>            
                    <styled-content style="font-size:15px;color:#6A737D;"># matrix of expression for each cell type</styled-content>
  
  
                    <styled-content style="font-size:15px;color:#E36209;">cluster_col</styled-content> 
                    <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                    <styled-content style="font-size:15px;color:#032F62;">"RNA_snn_res.1",</styled-content>   
                    <styled-content style="font-size:15px;color:#6A737D;"># name of column in meta.data containing cell clusters</styled-content>
  
  
                    <styled-content style="font-size:15px;color:#E36209;">obj_out</styled-content> 
                    <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                    <styled-content style="font-size:15px;color:#005CC5;">TRUE</styled-content>                 
                    <styled-content style="font-size:15px;color:#6A737D;"># output Seurat object with cell type inserted as "type" column</styled-content>


                    <styled-content style="font-size:15px;color:#000000;">)</styled-content>
                </preformat>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                    <styled-content style="font-size:15px;color:#000000;">res</styled-content>
                    <styled-content style="font-size:15px;color:#D73A49;">@</styled-content>
                    <styled-content style="font-size:15px;color:#000000;">meta.data[</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">1</styled-content>
                    <styled-content style="font-size:15px;color:#D73A49;">:</styled-content>
                    <styled-content style="font-size:15px;color:#005CC5;">5</styled-content>
                    <styled-content style="font-size:15px;color:#000000;">, ]</styled-content>


                    <styled-content style="font-size:15px;color:#6A737D;">#&gt;                   orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8
#&gt; ATGCCAGAACGACT SeuratProject         70           47               0
#&gt; CATGGCCTGTGCAT SeuratProject         85           52               0
#&gt; GAACCTGATGAACC SeuratProject         87           50               1
#&gt; TGACTGGATTCTCA SeuratProject        127           56               0
#&gt; AGTCAGACTGCACA SeuratProject        173           53               0
#&gt;                letter.idents groups RNA_snn_res.1 type         r
#&gt; ATGCCAGAACGACT             A     g2             0   Mk 0.6204476
#&gt; CATGGCCTGTGCAT             A     g1             0   Mk 0.6204476
#&gt; GAACCTGATGAACC             B     g2             0   Mk 0.6204476
#&gt; TGACTGGATTCTCA             A     g2             0   Mk 0.6204476
#&gt; AGTCAGACTGCACA             A     g2             0   Mk 0.6204476</styled-content>
                </preformat>
                <p>In the absence of suitable reference data (i.e., bulk RNA-seq or microarray expression matrices), clustifyr can build a reference from scRNA-seq data by averaging per-cell expression within each cluster, generating a transcriptomic snapshot of each cluster. References can also be built directly from SingleCellExperiment or Seurat objects.</p>
                <p>
                    <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
                        <styled-content style="font-size:15px;color:#000000;">new_ref_matrix</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">&lt;-</styled-content> 
                        <styled-content style="font-size:15px;color:#000000;">average_clusters(</styled-content>
  
                        <styled-content style="font-size:15px;color:#E36209;">mat</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                        <styled-content style="font-size:15px;color:#000000;">pbmc_matrix_small,</styled-content>
  
                        <styled-content style="font-size:15px;color:#E36209;">metadata</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                        <styled-content style="font-size:15px;color:#000000;">pbmc_meta</styled-content>
                        <styled-content style="font-size:15px;color:#D73A49;">$</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">classified,</styled-content> 
                        <styled-content style="font-size:15px;color:#6A737D;"># or use metadata = pbmc_meta, cluster_col = "classified"</styled-content>
  
                        <styled-content style="font-size:15px;color:#E36209;">if_log</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                        <styled-content style="font-size:15px;color:#005CC5;">TRUE</styled-content>   
                        <styled-content style="font-size:15px;color:#6A737D;"># whether the expression matrix is already log transformed</styled-content>

                        <styled-content style="font-size:15px;color:#000000;">)</styled-content>

                        <styled-content style="font-size:15px;color:#000000;">new_ref_matrix_sce</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">&lt;-</styled-content> 
                        <styled-content style="font-size:15px;color:#000000;">object_ref(</styled-content>
  
                        <styled-content style="font-size:15px;color:#E36209;">input</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                        <styled-content style="font-size:15px;color:#000000;">sce_small,</styled-content>             
                        <styled-content style="font-size:15px;color:#6A737D;"># SCE object</styled-content>
  
                        <styled-content style="font-size:15px;color:#E36209;">cluster_col</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                        <styled-content style="font-size:15px;color:#032F62;">"cell_type1"</styled-content>    
                        <styled-content style="font-size:15px;color:#6A737D;"># column in colData with cell identities</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">)</styled-content>

                        <styled-content style="font-size:15px;color:#000000;">new_ref_matrix_v3</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">&lt;</styled-content>
                        <styled-content style="font-size:15px;color:#000000;">- seurat_ref(</styled-content>
  
                        <styled-content style="font-size:15px;color:#E36209;">seurat_object</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                        <styled-content style="font-size:15px;color:#000000;">s_small3,</styled-content>     
                        <styled-content style="font-size:15px;color:#6A737D;"># SeuratV3 object</styled-content>
  
                        <styled-content style="font-size:15px;color:#E36209;">cluster_col</styled-content> 
                        <styled-content style="font-size:15px;color:#D73A49;">=</styled-content> 
                        <styled-content style="font-size:15px;color:#032F62;">"RNA_snn_res.1"</styled-content> 
                        <styled-content style="font-size:15px;color:#6A737D;"># column in meta.data with cell identities</styled-content>

                        <styled-content style="font-size:15px;color:#000000;">)</styled-content>
                    </preformat>
                </p>
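                <p>As a rough illustration of the averaging step described above, the following base R sketch computes per-cluster mean expression from a toy gene-by-cell matrix. The names toy_mat, toy_clusters and average_by_cluster are hypothetical and are not part of clustifyr; average_clusters() additionally handles log-transformed input and other options that are not shown here.</p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```r
# Toy gene-by-cell expression matrix (hypothetical values for illustration)
toy_mat = matrix(
  c(1, 3, 10, 12,
    5, 7,  0,  2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3", "cell4"))
)

# Cluster assignment for each cell, in the same order as the matrix columns
toy_clusters = c("c1", "c1", "c2", "c2")

# Average the columns (cells) that belong to each cluster
average_by_cluster = function(mat, clusters) {
  groups = split(seq_len(ncol(mat)), clusters)
  sapply(groups, function(idx) rowMeans(mat[, idx, drop = FALSE]))
}

# Result: one column of mean expression per cluster, one row per gene
toy_ref = average_by_cluster(toy_mat, toy_clusters)
print(toy_ref)
```
</preformat>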
                <p>Data exploration plotting functions for dimensionality reduction scatter plots and heatmaps extend the ggplot2 and ComplexHeatmap packages and use colorblind-friendly default colors. Gene list-based methods (clustify_lists()) are also implemented, using hypergeometric tests, GSEA, the Jaccard index, or percentage of gene detection by cluster; these provide easy-to-interpret ways to verify the presence of known positive and negative marker genes.</p>
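                <p>To make the gene list-based comparison more concrete, the sketch below scores the overlap between the marker genes of one cluster and a known cell type signature, using a Jaccard index and a hypergeometric test in base R. The gene sets, the assumed background size (n_background) and the helper jaccard_index() are all hypothetical; this is an illustration of the statistics only and not the clustify_lists() implementation.</p>
                <preformat orientation="portrait" position="float" preformat-type="computer code" xml:space="preserve">
```r
# Hypothetical marker genes detected for one cluster
cluster_markers = c("CD3D", "CD3E", "IL7R", "CCR7", "LTB")
# Hypothetical known signature for a candidate cell type
type_signature = c("CD3D", "CD3E", "IL7R", "CD4")
# Assumed number of genes in the background (for example, all genes measured)
n_background = 2000

# Jaccard index: size of the intersection divided by size of the union
jaccard_index = function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

overlap = length(intersect(cluster_markers, type_signature))

# Hypergeometric upper tail: probability of an overlap at least this large
# when drawing length(cluster_markers) genes at random from the background
p_value = phyper(
  overlap - 1,
  length(type_signature),
  n_background - length(type_signature),
  length(cluster_markers),
  lower.tail = FALSE
)

print(jaccard_index(cluster_markers, type_signature))
print(p_value)
```
</preformat>
                <p>A higher Jaccard index or a smaller hypergeometric p-value indicates a closer match between the cluster markers and the signature; in practice such scores would be computed for every cluster against every candidate cell type.</p>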
            </sec>
            <sec>
                <title>Parameters</title>
                <p>
                    <bold>
                        <italic toggle="yes">Reference datasets.</italic>
                    </bold> Multiple scRNA-seq and other cell type reference datasets are provided in an ExperimentHub Bioconductor package (clustifyrdatahub). These datasets, and others used for benchmarking and optimizing clustifyr parameters, are described in 
                    <xref ref-type="table" rid="T1">Table 1</xref>.</p>
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>Collection of datasets used for introducing and benchmarking clustifyr.</title>
                        <p>A description of the single-cell RNA-seq, bulk RNA-seq, and microarray datasets used in this study. The datasets available through ExperimentHub are references that were built from raw or downloaded data and can be used with clustifyr. R objects can be accessed using the direct download URLs to the .rda files, or through the clustifyrdatahub ExperimentHub.</p>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Description</th>
                                <th align="left" colspan="1" rowspan="1" valign="top"># of
                                    <break/>cell
                                    <break/>types</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Organism</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Publication</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Source</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Data Provider</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">R object download URL
                                    <sup>
                                        <xref ref-type="other" rid="fn1">1</xref>
                                    </sup>
                                </th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Bioconductor
                                    <break/>ExperimentHubID
                                    <sup>
                                        <xref ref-type="other" rid="fn1">2</xref>
                                    </sup>
                                </th>
                                <th align="left" colspan="1" rowspan="1" valign="top">R object
                                    <break/>name
                                    <sup>
                                        <xref ref-type="other" rid="fn1">3</xref>
                                    </sup>
                                </th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Mouse Cell
                                    <break/>Atlas</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">713</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">mouse</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.cell.com/cell/fulltext/S0092-8674(18)30116-8">https://www.cell.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.cell.com/cell/fulltext/S0092-8674(18)30116-8">com/cell/fulltext/S0092-</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.cell.com/cell/fulltext/S0092-8674(18)30116-8">8674(18)30116-8</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://ndownloader.figshare.com/files/10756795">https://ndownloader.figshare.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://ndownloader.figshare.com/files/10756795">files/10756795</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">figshare</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_MCA.rda">https://github.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_MCA.rda">rnabioco/clustifyrdata/raw/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_MCA.rda">master/data/ref_MCA.rda</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">EH3444</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">ref_MCA</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Tabula Muris
                                    <break/>(10X)</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">112</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">mouse</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.nature.com/articles/s41586-018-0590-4">https://www.nature.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.nature.com/articles/s41586-018-0590-4">com/articles/s41586-</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.nature.com/articles/s41586-018-0590-4">018-0590-4</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://ndownloader.figshare.com/articles/5821263">https://ndownloader.figshare.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://ndownloader.figshare.com/articles/5821263">articles/5821263</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">figshare</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_tabula_muris_drop.rda">https://github.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_tabula_muris_drop.rda">rnabioco/clustifyrdata/raw/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_tabula_muris_drop.rda">master/data/ref_tabula_</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_tabula_muris_drop.rda">muris_drop.rda</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">EH3445</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">ref_tabula_
                                    <break/>muris_drop</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Tabula Muris
                                    <break/>(SmartSeq2)</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">175</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">mouse</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.nature.com/articles/s41586-018-0590-4">https://www.nature.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.nature.com/articles/s41586-018-0590-4">com/articles/s41586-</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.nature.com/articles/s41586-018-0590-4">018-0590-4</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://ndownloader.figshare.com/articles/5821263">https://ndownloader.figshare.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://ndownloader.figshare.com/articles/5821263">articles/5821263</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">figshare</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_tabula_muris_facs.rda">https://github.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_tabula_muris_facs.rda">rnabioco/clustifyrdata/raw/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_tabula_muris_facs.rda">master/data/ref_tabula_</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_tabula_muris_facs.rda">muris_facs.rda</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">EH3446</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">ref_tabula_
                                    <break/>muris_facs</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Mouse RNA-seq
                                    <break/>from 28 cell
                                    <break/>types</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">28</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">mouse</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://genome.cshlp.org/content/early/2019/03/11/gr.240093.118">https://genome.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://genome.cshlp.org/content/early/2019/03/11/gr.240093.118">cshlp.org/content/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://genome.cshlp.org/content/early/2019/03/11/gr.240093.118">early/2019/03/11/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://genome.cshlp.org/content/early/2019/03/11/gr.240093.118">gr.240093.118</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/dviraran/SingleR/tree/master/data">https://github.com/dviraran/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/dviraran/SingleR/tree/master/data">SingleR/tree/master/data</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">GitHub</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_mouse.rnaseq.rda">https://github.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_mouse.rnaseq.rda">rnabioco/clustifyrdata/raw/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_mouse.rnaseq.rda">master/data/ref_mouse.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_mouse.rnaseq.rda">rnaseq.rda</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">EH3447</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">ref_mouse.
                                    <break/>rnaseq</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Mouse
                                    <break/>Organogenesis
                                    <break/>Cell Atlas (main
                                    <break/>cell types)</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">37</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">mouse</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.nature.com/articles/s41586-019-0969-x">https://www.nature.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.nature.com/articles/s41586-019-0969-x">com/articles/s41586-</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.nature.com/articles/s41586-019-0969-x">019-0969-x</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://oncoscape.v3.sttrcancer.org/atlas.gs.washington.edu.mouse.rna/downloads">https://oncoscape.v3.sttrcancer.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://oncoscape.v3.sttrcancer.org/atlas.gs.washington.edu.mouse.rna/downloads">org/atlas.gs.washington.edu.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://oncoscape.v3.sttrcancer.org/atlas.gs.washington.edu.mouse.rna/downloads">mouse.rna/downloads</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">washington.edu</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_moca_main.rda">https://github.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_moca_main.rda">rnabioco/clustifyrdata/raw/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_moca_main.rda">master/data/ref_moca_</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_moca_main.rda">main.rda</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">EH3448</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">ref_moca_
                                    <break/>main</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Mouse sorted
                                    <break/>immune cells</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">253</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">mouse</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.nature.com/articles/ni1008-1091">https://www.nature.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.nature.com/articles/ni1008-1091">com/articles/ni1008-</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.nature.com/articles/ni1008-1091">1091</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/dviraran/SingleR/tree/master/data">https://github.com/dviraran/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/dviraran/SingleR/tree/master/data">SingleR/tree/master/data</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">GitHub</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_immgen.rda">https://github.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_immgen.rda">rnabioco/clustifyrdata/raw/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_immgen.rda">master/data/ref_immgen.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_immgen.rda">rda</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">EH3449</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">ref_immgen</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Human
                                    <break/>hematopoietic
                                    <break/>cell microarray</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">38</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">human</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.cell.com/fulltext/S0092-8674(11)00005-5">https://www.cell.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.cell.com/fulltext/S0092-8674(11)00005-5">com/fulltext/S0092-</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.cell.com/fulltext/S0092-8674(11)00005-5">8674(11)00005-5</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ncbi.nlm.nih.gov/geo/series/GSE24nnn/GSE24759/matrix/GSE24759_series_matrix.txt.gz">https://ftp.ncbi.nlm.nih.gov/geo/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ncbi.nlm.nih.gov/geo/series/GSE24nnn/GSE24759/matrix/GSE24759_series_matrix.txt.gz">series/GSE24nnn/GSE24759/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ncbi.nlm.nih.gov/geo/series/GSE24nnn/GSE24759/matrix/GSE24759_series_matrix.txt.gz">matrix/GSE24759_series_matrix.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://ftp.ncbi.nlm.nih.gov/geo/series/GSE24nnn/GSE24759/matrix/GSE24759_series_matrix.txt.gz">txt.gz</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">GEO</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_hema_microarray.rda">https://github.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_hema_microarray.rda">rnabioco/clustifyrdata/raw/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_hema_microarray.rda">master/data/ref_hema_</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_hema_microarray.rda">microarray.rda</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">EH3450</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">ref_hema_
                                    <break/>microarray</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Human cortex
                                    <break/>development
                                    <break/>scRNA-seq</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">47</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">human</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://science.sciencemag.org/content/358/6368/1318.long">https://science.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://science.sciencemag.org/content/358/6368/1318.long">sciencemag.org/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://science.sciencemag.org/content/358/6368/1318.long">content/358/6368/1318.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://science.sciencemag.org/content/358/6368/1318.long">long</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://cells.ucsc.edu/cortex-dev/exprMatrix.tsv.gz">https://cells.ucsc.edu/cortex-dev/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://cells.ucsc.edu/cortex-dev/exprMatrix.tsv.gz">exprMatrix.tsv.gz</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">UCSC</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_cortex_dev.rda">https://github.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_cortex_dev.rda">rnabioco/clustifyrdata/raw/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_cortex_dev.rda">master/data/ref_cortex_</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_cortex_dev.rda">dev.rda</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">EH3451</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">ref_cortex_
                                    <break/>dev</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Human
                                    <break/>pancreatic cell
                                    <break/>scRNA-seq
                                    <break/>(inDrop)</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">14</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">human</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.cell.com/fulltext/S2405-4712(16)30266-6">https://www.cell.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.cell.com/fulltext/S2405-4712(16)30266-6">com/fulltext/S2405-</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.cell.com/fulltext/S2405-4712(16)30266-6">4712(16)30266-6</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://scrnaseq-public-datasets.s3.amazonaws.com/scater-objects/baron-human.Rda">https://scrnaseq-public-datasets.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://scrnaseq-public-datasets.s3.amazonaws.com/scater-objects/baron-human.Rda">s3.amazonaws.com/scater-objects/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://scrnaseq-public-datasets.s3.amazonaws.com/scater-objects/baron-human.Rda">baron-human.Rda</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">S3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_pan_indrop.rda">https://github.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_pan_indrop.rda">rnabioco/clustifyrdata/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_pan_indrop.rda">raw/master/data/ref_pan_</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_pan_indrop.rda">indrop.rda</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">EH3452</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">ref_pan_
                                    <break/>indrop</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Human
                                    <break/>pancreatic cell
                                    <break/>scRNA-seq
                                    <break/>(SmartSeq2)</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">12</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">human</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.sciencedirect.com/science/article/pii/S1550413116304363">https://www.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.sciencedirect.com/science/article/pii/S1550413116304363">sciencedirect.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.sciencedirect.com/science/article/pii/S1550413116304363">science/article/pii/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.sciencedirect.com/science/article/pii/S1550413116304363">S1550413116304363</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://scrnaseq-public-datasets.s3.amazonaws.com/scater-objects/segerstolpe.Rda">https://scrnaseq-public-datasets.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://scrnaseq-public-datasets.s3.amazonaws.com/scater-objects/segerstolpe.Rda">s3.amazonaws.com/scater-objects/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://scrnaseq-public-datasets.s3.amazonaws.com/scater-objects/segerstolpe.Rda">segerstolpe.Rda</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">S3</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_pan_smartseq2.rda">https://github.com/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_pan_smartseq2.rda">rnabioco/clustifyrdata/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_pan_smartseq2.rda">raw/master/data/ref_pan_</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/raw/master/data/ref_pan_smartseq2.rda">smartseq2.rda</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">EH3453</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">ref_pan_
                                    <break/>smartseq2</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Human PBMCs,
                                    <break/>PBMC-Bench
                                    <break/>(multiple
                                    <break/>platforms)</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">9</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">human</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/s13059-019-1795-z">https://doi.org/10.1186/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/s13059-019-1795-z">s13059-019-1795-z</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">https://zenodo.org/record/3357167/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">files/scRNAseq_Benchmark_</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">datasets.zip?download=1</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">Zenodo</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">https://zenodo.org/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">record/3357167/files/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">scRNAseq_Benchmark_</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">datasets.zip?download=1</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Human PBMCs,
                                    <break/>Unseen
                                    <break/>rejection test</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">5, 7, 10</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">human</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/s13059-019-1795-z">https://doi.org/10.1186/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1186/s13059-019-1795-z">s13059-019-1795-z</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">https://zenodo.org/record/3357167/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">files/scRNAseq_Benchmark_</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">datasets.zip?download=1</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">Zenodo</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">https://zenodo.org/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">record/3357167/files/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">scRNAseq_Benchmark_</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://zenodo.org/record/3357167/files/scRNAseq_Benchmark_datasets.zip?download=1">datasets.zip?download=1</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Mouse anterior
                                    <break/>lateral motor
                                    <break/>cortex (ALM)</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">34</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">mouse</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/s41586-018-0654-5">https://doi.org/10.1038/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/s41586-018-0654-5">s41586-018-0654-5</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq">https://portal.brain-map.org/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq">atlases-and-data/rnaseq/mouse-</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq">v1-and-alm-smart-seq</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">Allen Brain
                                    <break/>Institute</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Mouse brain
                                    <break/>primary visual
                                    <break/>cortex (VISp)</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">34</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">mouse</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/s41586-018-0654-5">https://doi.org/10.1038/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/s41586-018-0654-5">s41586-018-0654-5</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq">https://portal.brain-map.org/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq">atlases-and-data/rnaseq/mouse-</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://portal.brain-map.org/atlases-and-data/rnaseq/mouse-v1-and-alm-smart-seq">v1-and-alm-smart-seq</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">Allen Brain
                                    <break/>Institute</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Human PBMC
                                    <break/>rejection test
                                    <break/>(SciBet)</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">5</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">human</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/s41467-020-15523-2">https://doi.org/10.1038/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/s41467-020-15523-2">s41467-020-15523-2</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="http://scibet.cancer-pku.cn/document.html">http://scibet.cancer-pku.cn/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="http://scibet.cancer-pku.cn/document.html">document.html</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">Investigator</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Human CBMC
                                    <break/>(CITE-Seq)</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">13</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">human</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/nmeth.4380">https://doi.org/10.1038/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/nmeth.4380">nmeth.4380</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz">ftp://ftp.ncbi.nlm.nih.gov/geo/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz">series/GSE100nnn/GSE100866/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz">suppl/GSE100866_CBMC_8K_</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz">13AB_10X-RNA_umi.csv.gz</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">GEO</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">Human PBMCs
                                    <break/>(3k)</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">9</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">human</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/ncomms14049">https://doi.org/10.1038/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/ncomms14049">ncomms14049</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://support.10xgenomics.com/single-cell-gene-expression/datasets">https://support.10xgenomics.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://support.10xgenomics.com/single-cell-gene-expression/datasets">com/single-cell-gene-expression/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://support.10xgenomics.com/single-cell-gene-expression/datasets">datasets</ext-link>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">10x Genomics</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <ext-link ext-link-type="uri" xlink:href="https://www.dropbox.com/s/63gnlw45jf7cje8/pbmc3k_final.rds?dl=0">https://www.dropbox.</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.dropbox.com/s/63gnlw45jf7cje8/pbmc3k_final.rds?dl=0">com/s/63gnlw45jf7cje8/</ext-link>
                                    <break/>
                                    <ext-link ext-link-type="uri" xlink:href="https://www.dropbox.com/s/63gnlw45jf7cje8/pbmc3k_final.rds?dl=0">pbmc3k_final.rds?dl=0</ext-link>
                                </td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                                <td align="center" colspan="1" rowspan="1" valign="top">NA</td>
                            </tr>
                        </tbody>
                    </table>
                    <table-wrap-foot>
                        <fn id="fn1">
                            <p>
                                <sup>1</sup>download URL to access R object (if available)</p>
                            <p>
                                <sup>2</sup>R object ID in the clustifyrdatahub Bioconductor ExperimentHub</p>
                            <p>
                                <sup>3</sup>R object name (if available via clustifyrdatahub)</p>
                        </fn>
                    </table-wrap-foot>
                </table-wrap>
                <p>
                    <bold>
                        <italic toggle="yes">Correlation method.</italic>
                    </bold> We benchmarked clustifyr on a suite of comparable datasets, PBMC-bench
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>,
                        <xref ref-type="bibr" rid="ref-16">16</xref>
                    </sup>, generated using multiple scRNA-seq methods on aliquots of peripheral blood mononuclear cells (PBMCs) from two individuals. Additional details about each query and reference dataset are provided in Supplemental Table 1. For each single-cell technology, average gene expression profiles were generated from annotated cell types and compared across platforms. Notably, when each reference dataset was cross-referenced against all other samples, clustifyr achieved a median F1-score (see Benchmarking Methods) above 0.94 using Spearman rank correlation (
                    <xref ref-type="fig" rid="f2">Figure 2A</xref>). Other correlation methods performed on par or slightly worse at cross-platform classification, which is expected given the difference between rank-based and non-rank-based methods: a rank-based correlation depends only on the ordering of expression values, not on their platform-specific scale. We therefore selected Spearman as the default method in clustifyr; other methods are also available, as is a wrapper function, call_consensus(), that finds consensus identities across the available correlation methods.</p>
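                <p>The correlation-based assignment described above can be sketched in a few lines. The example below is a minimal, self-contained Python illustration and is not clustifyr's R implementation: the helper names (ranks, spearman, classify), the five-gene expression profiles, and the cell-type labels are all hypothetical. It ranks the average expression profile of a query cluster, correlates those ranks with each reference profile, and reports the best match, falling back to &#x201c;unassigned&#x201d; when the best correlation is below 0.50, the default cutoff noted in Figure 2B. Treating a score exactly at the cutoff as assigned is an assumption of this sketch.</p>
                <code language="python">
```python
# Illustrative sketch only: plain Python, not clustifyr's R code.
# All names and numbers below are hypothetical.
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i != len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 != len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def classify(cluster_profile, reference, threshold=0.5):
    """Return (label, scores) for one query cluster.

    reference maps a cell-type name to its average expression profile,
    with genes in the same order as cluster_profile.
    """
    scores = {
        name: spearman(cluster_profile, profile)
        for name, profile in reference.items()
    }
    best = max(scores, key=scores.get)
    # Assumed convention: only scores strictly below the cutoff are rejected.
    if threshold > scores[best]:
        return "unassigned", scores
    return best, scores


# Hypothetical average profiles over the same five genes.
reference = {
    "B cell": [9.0, 8.0, 1.0, 0.5, 0.2],
    "T cell": [0.3, 1.0, 9.0, 7.0, 0.1],
}
# Query values on a different scale but with the same gene ordering as "B cell".
query_cluster = [120.0, 95.0, 4.0, 2.0, 1.0]

label, scores = classify(query_cluster, reference)
print(label)  # prints: B cell
```
                </code>
                <p>Because only the ordering of expression values enters the score, the query profile in this toy example matches the hypothetical B cell reference even though its values are on a much larger scale, which mirrors why a rank-based measure suits cross-platform comparison.</p>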
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <title>Parameter considerations for clustifyr.</title>
                        <p>
                            <bold>A</bold>) Comparison of median F1-scores of different correlation methods for classifying across platforms using the PBMC-bench dataset. 
                            <bold>B</bold>) Heatmap showing correlation coefficients between query cell types and the reference cell types from a rejection test, whereby megakaryocytes were excluded from the reference dataset. The Neg.Cell cluster is megakaryocytes, which is correctly not annotated as a different cell type when megakaryocytes are not present in the reference. By default, clusters with correlation &lt; 0.50 are labeled &#x201c;unassigned&#x201d; by clustifyr. 
                            <bold>C</bold>) Comparison of correlation coefficients with and without feature selection when comparing average gene expression per cell type between two pancreas scRNA-seq datasets. The &#x201c;unclassified&#x201d; cell type was not defined in the Segerstolpe 
                            <italic toggle="yes">et al</italic> dataset. 
                            <bold>D</bold>) Accuracy (defined as the ratio between the number of correctly classified clusters and the overall number of clusters) and performance were assessed with decreasing query cluster cell numbers using the Tabula Muris as the query dataset and the Mouse Cell Atlas as the reference dataset. 
                            <bold>E</bold>) Example of overclustering the query data and assigning cell types for data exploration. UMAP of a PBMC dataset generated by 10x Genomics, with cell types assigned by comparison to CBMC reference data from Stoeckius 
                            <italic toggle="yes">et al.</italic> 2017. 
                            <bold>F</bold>) An assessment of the median F1-score when using single or multiple averaged profiles as reference cell types was conducted using the PBMC-bench test set. The number of reference expression profiles to generate for each cell type is determined by the number of cells in the cluster (n), and the sub-clustering power argument (x), with the formula n
                            <sup>x</sup>.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/27827/48bada72-0bd4-4f03-9e5d-55abea0b1855_figure2.gif"/>
                </fig>
                <p>
                    <bold>
                        <italic toggle="yes">Correlation minimum cutoff.</italic>
                    </bold> Recognizing cell types that are missing from the reference, so as to avoid misclassification, is another point of great interest in the field. In general usage of clustifyr, we find that a minimum correlation cutoff of 0.4 or 0.5 is generally satisfactory. Alternatively, the cutoff can be set heuristically to 0.8 times the highest correlation coefficient among the clusters. One example is shown in 
                    <xref ref-type="fig" rid="f2">Figure 2B</xref>, using PBMC rejection benchmark data modified by the SciBet package
                    <sup>
                        <xref ref-type="bibr" rid="ref-17">17</xref>
                    </sup>. Megakaryocytes were removed from the reference melanoma immune cells data, but retained in the test data to mimic the situation in which the reference data do not contain a rare cell type. clustifyr successfully found the megakaryocytes to be dissimilar to all available reference cell types, and hence left them &#x201c;unassigned&#x201d; under the default minimum threshold cutoff.</p>
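                <p>The cutoff logic described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the clustifyr code: the function names (heuristic_cutoff, assign_types) and the nested-dictionary layout are hypothetical, and the heuristic is interpreted here as 0.8 times the single highest coefficient observed across all clusters.</p>
                <preformat>
```python
def heuristic_cutoff(corr, fraction=0.8):
    """Return fraction times the highest coefficient across all clusters.

    corr: dict of query cluster to a dict of reference cell type to coefficient.
    """
    return fraction * max(max(row.values()) for row in corr.values())


def assign_types(corr, cutoff=0.5, unassigned="unassigned"):
    """Call each cluster as its best-correlated reference type.

    Clusters whose best coefficient falls below the cutoff are left unassigned.
    """
    calls = {}
    for cluster, row in corr.items():
        best = max(row, key=row.get)
        calls[cluster] = best if row[best] >= cutoff else unassigned
    return calls
```
                </preformat>
                <p>With a fixed cutoff of 0.5, a cluster whose best coefficient is 0.4 stays unassigned, mirroring the behavior described for the megakaryocyte cluster.</p>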
                <p>
                    <bold>
                        <italic toggle="yes">Variable gene selection and normalization.</italic>
                    </bold> As the core function of clustifyr is rank correlation, feature selection to focus on highly variable genes is critical. To illustrate the importance of feature selection, we used clustifyr to classify pancreatic cell types profiled on the inDrops platform, using a reference built from a dataset generated on the Smart-Seq2 platform
                    <sup>
                        <xref ref-type="bibr" rid="ref-18">18</xref>,
                        <xref ref-type="bibr" rid="ref-19">19</xref>
                    </sup>. In 
                    <xref ref-type="fig" rid="f2">Figure 2C</xref>, we compare correlation coefficients using all detected genes (&gt;10,000) vs feature selection by the package M3Drop. A basic level of feature selection, e.g. using M3Drop, Seurat VST (default uses top 2,000 variable genes), or simply the 1,000 genes with the highest variance in the reference data, is sufficient to classify the pancreatic cells. For other cell type mixtures, especially those for which the expected cell types are not fully known, further optimization of clustering and feature selection may be of greater importance. clustifyr does not provide novel clustering, feature selection, or normalization methods of its own, but is instead built to remain flexible enough to incorporate methods from other, and future, packages. We recommend that users use normalized reference and query data and match normalization methods between datasets when possible. We view these as fast-moving fields
                    <sup>
                        <xref ref-type="bibr" rid="ref-20">20</xref>,
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup>, and hope to benefit from new advances, while keeping the general clustifyr framework intact.</p>
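                <p>The simplest option mentioned above, keeping the genes with the highest variance in the reference data, can be sketched as below. This is an illustrative Python sketch, not code from clustifyr, M3Drop, or Seurat; the function names (top_variable_genes, restrict_to_genes) and the dictionary layout are assumptions.</p>
                <preformat>
```python
from statistics import variance


def top_variable_genes(reference, n_genes=1000):
    """Return the n_genes gene names with the highest variance.

    reference: dict of gene name to its average expression in each
    reference cell type (at least two cell types are required).
    """
    ranked = sorted(reference, key=lambda gene: variance(reference[gene]),
                    reverse=True)
    return ranked[:n_genes]


def restrict_to_genes(profile, genes):
    """Keep only the selected genes that are present in a profile."""
    return {gene: profile[gene] for gene in genes if gene in profile}
```
                </preformat>
                <p>The same gene list would then be applied to both the query and the reference profiles before correlation, so that uninformative genes do not dominate the ranking.</p>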
                <p>
                    <bold>
                        <italic toggle="yes">Minimum cells per cluster.</italic>
                    </bold> We next applied clustifyr to a larger general reference set built from the Mouse Cell Atlas
                    <sup>
                        <xref ref-type="bibr" rid="ref-22">22</xref>
                    </sup> and examined cell type classification of another mouse cell atlas, the Tabula Muris dataset
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>. clustifyr assigned cell types with a median accuracy of 1. Using these test datasets we sought to determine the minimum number of query cells necessary in a cluster to obtain accurate cell type annotation. We subsampled the query data (
                    <xref ref-type="fig" rid="f2">Figure 2D</xref>) and, as expected, observed decreased accuracy with further downsampling of the number of cells in each query cluster. Yet, even at 15 cells per tested cluster, clustifyr still performed well, with a further increase in speed. Based on these results, we set the default parameters in clustifyr to exclude or warn users of classification on clusters containing fewer than 10 cells. These results also suggest that clustering the query dataset to obtain more refined clusters (e.g. fewer cells per cluster) could be employed to aid in the identification of rarer or less well-defined cell subsets. clustifyr can also be used to classify individual cells, although we do not recommend per-cell classification because of the reduced accuracy observed with decreasing numbers of cells per cluster.</p>
                <p>
                    <italic toggle="yes">
                        <bold>Subclustering.</bold>
                    </italic> clustifyr also provides functionality to assess the quality of the cell type annotations. An intentional overclustering and classification function based on k-means clustering (overcluster_test()) is implemented in clustifyr for exploration of cell type annotation at increasing numbers of clusters (
                    <xref ref-type="fig" rid="f2">Figure 2E</xref>). This approach provides a rapid visualization to determine if cell type annotations are stable with varying numbers of clusters. For example, scRNA-seq data from the Seurat PBMC 3k tutorial was reclassified at multiple clustering levels using Cord Blood Mononuclear Cells (CBMCs) as reference, which demonstrated largely stable cell type assignments in the presence of overclustered query data (
                    <xref ref-type="fig" rid="f2">Figure 2E</xref>)
                    <sup>
                        <xref ref-type="bibr" rid="ref-23">23</xref>
                    </sup>. When using scRNA-seq data as the reference data, matrices are built by averaging per-cell expression data for each cluster (average_clusters()), to generate a transcriptomic snapshot similar to bulk RNA-seq or microarray data. An additional argument to subcluster the reference single cell clusters is also available, to generate more than one expression profile per reference cell type, in a manner analogous to overcluster_test(), but applied to the reference scRNA-seq dataset. The number of subclusters for each reference cell type is dependent on the number of cells in the cluster (n), and the sub-clustering power argument (x), following the formula n
                    <sup>x</sup> 
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup>. This approach does not improve classification in the PBMC-bench data (
                    <xref ref-type="fig" rid="f2">Figure 2F</xref>), whose reference and query clustering are already consistent. However, we envision its utility would greatly depend on the granularity of the clustering in the reference dataset.</p>
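                <p>As a rough illustration of the two ideas in this paragraph, the Python sketch below averages per-cell expression within each cluster and computes how many reference profiles the n-to-the-power-x rule would request. It is not the implementation of average_clusters(); the function names and data layout are hypothetical, and rounding to the nearest whole number with a minimum of one profile is an assumption, since the text gives only the formula.</p>
                <preformat>
```python
def average_profiles(expression, cluster_of):
    """Average per-cell expression vectors within each cluster.

    expression: dict of cell id to a list of gene values (same gene order).
    cluster_of: dict of cell id to cluster label.
    """
    sums, counts = {}, {}
    for cell, values in expression.items():
        label = cluster_of[cell]
        if label not in sums:
            sums[label] = [0.0] * len(values)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], values)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def n_reference_profiles(n_cells, power):
    """Number of sub-cluster profiles for one cell type, following n ** x.

    Rounding and the floor of one profile are assumptions of this sketch.
    """
    return max(1, round(n_cells ** power))
```
                </preformat>
                <p>Under these assumptions, a reference cell type with 100 cells and a power of 0.5 would be split into 10 profiles, whereas a power of 0 always yields a single averaged profile.</p>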
            </sec>
            <sec>
                <title>Benchmarking</title>
                <p>Using clustifyr, PBMC clusters from the Seurat PBMC 3k tutorial are correctly labeled using bulk references generated from processed microarray data of purified cell types
                    <sup>
                        <xref ref-type="bibr" rid="ref-24">24</xref>
                    </sup>, the ImmGen database of bulk RNA-seq
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>,
                        <xref ref-type="bibr" rid="ref-25">25</xref>
                    </sup>, or previously annotated scRNA-seq results from the Seurat CBMC CITE-seq tutorial
                    <sup>
                        <xref ref-type="bibr" rid="ref-14">14</xref>,
                        <xref ref-type="bibr" rid="ref-23">23</xref>
                    </sup> (
                    <xref ref-type="fig" rid="f3">Figure 3</xref>).</p>
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <title>clustifyr can utilize multiple reference data types.</title>
                        <p>UMAP projections of PBMCs showing the ground truth cell types (
                            <bold>A</bold>), or cell types called by clustifyr using microarray data from sorted immune cell types (
                            <bold>B</bold>), bulk RNA-seq from immune cell populations (
                            <bold>C</bold>) or scRNA-seq data from CBMCs (
                            <bold>D</bold>).</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/27827/48bada72-0bd4-4f03-9e5d-55abea0b1855_figure3.gif"/>
                </fig>
                <p>To assess the performance of clustifyr, we used the Tabula Muris dataset
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>, which contains data generated from 12 matching tissues using both 10x Genomics 3&#x2019; end seq (&#x201c;drop&#x201d;) and Smart-Seq2 (&#x201c;facs&#x201d;) platforms. We attempted to assign cell type identities to clusters in &#x201c;drop&#x201d; Seurat objects using references built from &#x201c;facs&#x201d; Seurat objects, which contain pre-computed variable genes generated by the Seurat mean.var.plot (dispersion z-scores based on expression bins) approach. For each method we used the recommended variable gene selection approach. clustifyr uses variable genes supplied by the user, and for benchmarking we used the variable genes stored in the Seurat object. scmap calculates variable genes using a modified approach based on M3Drop. SingleR selects variable genes by identifying marker genes between clusters. scPred, in contrast, selects informative principal components as a feature selection procedure, whereas ACTINN does not perform feature selection for classification.</p>
                <p>In benchmarking results, clustifyr is comparable in accuracy to other automated classification packages (
                    <xref ref-type="fig" rid="f4">Figure 4A</xref>). Cross-platform comparisons are inherently more difficult, and the approach used by clustifyr is aimed at being platform- and normalization-agnostic. Mean runtime in Tabula Muris classifications, including both reference building and test data classification, was ~1 second if the required variable gene list is extracted from the query Seurat object. Alternatively, variable genes can be recalculated by other methods such as M3Drop
                    <sup>
                        <xref ref-type="bibr" rid="ref-26">26</xref>
                    </sup>, to reach similar results (clustifyr (m3drop)).</p>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>clustifyr accurately and rapidly annotates cell types.</title>
                        <p>
                            <bold>A</bold>) Accuracy and run-time of classifications generated by clustifyr or existing methods using the Tabula Muris dataset to benchmark cell type classifications between datasets generated with the Smart-Seq2 or 10x Genomics sequencing platforms. Each point represents a different tissue comparison. clustifyr (m3drop) indicates clustifyr run using variable genes defined by M3Drop, clustifyr_lists (hyper) uses hypergeometric tests to compare marker gene lists, and clustifyr_lists (jaccard) calculates the Jaccard index between marker gene lists to annotate cell types. 
                            <bold>B</bold>) Performance comparison of clustifyr to existing methods with random subsamples of cells from the Smart-Seq2 Tabula Muris dataset. Error bars represent standard error of the mean and are derived from 5 independent subsamples of the dataset. 
                            <bold>C</bold>) Performance comparison of clustifyr to existing methods testing classification of an Allen Institute Brain Atlas dataset from two murine brain regions that contain 34 cell types. scPred is not shown as it failed with an error on this dataset. 
                            <bold>D</bold>) Comparison of clustifyr to existing methods for rejecting unseen populations using PBMC data. Three reference PBMC datasets were generated that excluded T-cells, CD4+ T-cells, or memory T-cells, respectively. The % rejected indicates the percentage of cells of the indicated type that were not misclassified when that cell type was missing from the reference.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/27827/48bada72-0bd4-4f03-9e5d-55abea0b1855_figure4.gif"/>
                </fig>
                <p>Signature marker gene lists are an additional reference data type that is commonly used to guide cluster cell type classification. We therefore sought to determine if a gene list enrichment approach could provide comparable classification power to using correlation. clustifyr provides a function, clustify_lists(), which compares the marker genes of each query cluster to a list of marker genes per reference cell type. clustify_lists() can calculate enrichment with a hypergeometric test, marker overlap with the Jaccard index, or use the percent of cells expressing marker genes to annotate cell types. Alternatively, if ranked gene lists are available, Gene Set Enrichment Analysis (GSEA) using the fgsea package
                    <sup>
                        <xref ref-type="bibr" rid="ref-27">27</xref>
                    </sup> or Spearman rank correlation can be employed. We find that using gene expression for clustifyr classification gave higher accuracy than gene list enrichment using a hypergeometric test or the Jaccard index; however, the gene list approach could be very useful when no scRNA-seq or bulk RNA-seq data are available for use as a reference (
                    <xref ref-type="fig" rid="f4">Figure 4A</xref>).</p>
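                <p>Two of the gene list statistics named above can be written down compactly. The Python sketch below is illustrative only and is not the clustify_lists() implementation; the function names and arguments are assumptions, and the hypergeometric p-value is taken here as the upper tail, i.e. the probability of an overlap at least as large as the one observed when the query markers are drawn at random from a gene universe.</p>
                <preformat>
```python
from math import comb


def jaccard(query_markers, reference_markers):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(query_markers), set(reference_markers)
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def hypergeom_pvalue(overlap, n_query, n_reference, n_universe):
    """Upper-tail hypergeometric probability of seeing at least `overlap` shared genes.

    n_query: number of query cluster markers drawn from the universe.
    n_reference: number of marker genes for the reference cell type.
    n_universe: total number of genes considered.
    """
    upper = min(n_query, n_reference)
    total = comb(n_universe, n_query)
    tail = sum(comb(n_reference, i) * comb(n_universe - n_reference, n_query - i)
               for i in range(overlap, upper + 1))
    return tail / total
```
                </preformat>
                <p>A small p-value or a large Jaccard index for one reference cell type relative to the others would then support assigning that identity to the query cluster.</p>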
                <p>For scalability benchmarking, we adapted an existing benchmark dataset, scRNAseq_Benchmark subsampling, which contains query and reference data with downsampled numbers of cells from the Smart-Seq2 Tabula Muris dataset
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>,
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>. Once again, clustifyr is accurate and efficient compared to existing methods (
                    <xref ref-type="fig" rid="f4">Figure 4B</xref>). As a further comparison, we also examined classification of cell types in murine brain datasets generated by the Allen Institute Brain Atlas, and provided by the scRNAseq_Benchmark pipeline
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>. The two murine brain regions contained 34 shared cell types, and clustifyr reached cell annotation performance similar to that of other annotation methods (
                    <xref ref-type="fig" rid="f4">Figure 4C</xref>).</p>
                <p>Lastly, we applied clustifyr to a series of increasingly challenging datasets from the scRNAseq_Benchmark
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup> unseen population rejection test (
                    <xref ref-type="fig" rid="f4">Figure 4D</xref>). This test assesses how frequently cells are mis-assigned when the corresponding cell types are not present in the reference dataset. The PBMC dataset contains different T-cell subsets, which often do not cluster into discrete, well-defined cell types based solely on gene expression. Without the corresponding cell type references, 57.5% of T cells were rejected and unassigned. When only CD4+ references were removed, 28.2% of test CD4+ T cells were rejected and unassigned. clustifyr was unable to reject CD4+/CD45RO+ memory T cells, mislabeling them as CD4+/CD25 T Reg instead when the exact reference was unavailable. However, these misclassifications are also observed with other classification tools benchmarked in the scRNAseq_Benchmark study (
                    <xref ref-type="fig" rid="f4">Figure 4D</xref>)
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>.</p>
            </sec>
            <sec>
                <title>Benchmarking methods</title>
                <p>clustifyr was tested against scmap v1.8.0
                    <sup>
                        <xref ref-type="bibr" rid="ref-8">8</xref>
                    </sup>, SingleR v1.0.1
                    <sup>
                        <xref ref-type="bibr" rid="ref-9">9</xref>
                    </sup>, Seurat v3.1.1
                    <sup>
                        <xref ref-type="bibr" rid="ref-14">14</xref>
                    </sup>, latest GitHub versions of ACTINN
                    <sup>
                        <xref ref-type="bibr" rid="ref-11">11</xref>
                    </sup> and scPred
                    <sup>
                        <xref ref-type="bibr" rid="ref-12">12</xref>
                    </sup>, and SVM as implemented in Python 3 scikit-learn v0.19.1
                    <sup>
                        <xref ref-type="bibr" rid="ref-28">28</xref>
                    </sup>. scRNA-seq Tabula Muris data were downloaded as Seurat v2 objects. Human pancreas data were downloaded as SingleCellExperiment (SCE) objects. In all instances, to mimic the typical use case of clustifyr, clustering and dimension reduction projections were taken from available metadata rather than recomputed.</p>
                <p>An R script was modified to benchmark clustifyr following the approach and datasets of scRNAseq_Benchmark
                    <sup>
                        <xref ref-type="bibr" rid="ref-13">13</xref>
                    </sup>, using M3Drop
                    <sup>
                        <xref ref-type="bibr" rid="ref-26">26</xref>
                    </sup> to generate variable genes for clustifyr. The R code used for benchmarking, and for preprocessing other datasets into matrices and tables, is documented in R scripts available in the clustifyr and clustifyrdatahub GitHub repositories.</p>
                <p>Classification accuracy was measured using two approaches depending on the datasets compared. For datasets where the query and reference data contain identical cell types, an F1-score, the harmonic mean of the precision and recall, was calculated for each cell type (PBMC-bench, Allen Institute Brain Atlas, and Smart-Seq2 Tabula Muris subsampling). When summarizing classification accuracy across an entire dataset, the median F1-score is reported. Datasets with varying cell types in the query and reference data cannot be characterized with an F1-score; instead, accuracy, defined as the ratio between the number of correctly classified clusters and the overall number of clusters, is reported (Mouse Cell Atlas vs. Tabula Muris and Tabula Muris Smart-Seq2 vs. 10x Genomics).</p>
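                <p>For concreteness, the two metrics defined above can be computed as in the following Python sketch. It is illustrative only and is not the benchmarking script used in this work; the function names and the list-based inputs are assumptions. The F1-score is written in the equivalent form 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall.</p>
                <preformat>
```python
from statistics import median


def f1_per_type(truth, predicted):
    """Return an F1-score for every cell type present in the ground truth."""
    pairs = list(zip(truth, predicted))
    scores = {}
    for label in set(truth):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def median_f1(truth, predicted):
    """Summarize a dataset by the median of its per-type F1-scores."""
    return median(f1_per_type(truth, predicted).values())


def cluster_accuracy(truth, predicted):
    """Fraction of clusters whose called type matches the true type."""
    correct = sum(1 for t, p in zip(truth, predicted) if t == p)
    return correct / len(truth)
```
                </preformat>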
            </sec>
            <sec>
                <title>Operation</title>
                <p>clustifyr is distributed as part of the Bioconductor R package repository and is compatible with Mac OS X, Windows, and major Linux operating systems. Package dependencies and system requirements are documented in the clustifyr Bioconductor repository.</p>
            </sec>
        </sec>
        <sec sec-type="conclusions">
            <title>Conclusions</title>
            <p>We present a flexible and lightweight R package for cluster identity assignment. The tool bridges various forms of prior knowledge and scRNA-seq analysis. Reference sources can include scRNA-seq data with cell types assigned (or average expression per cell type, which can be stored at much smaller file sizes), sorted bulk RNA-seq, and microarray data. clustifyr, with minimal package dependencies, is compatible with a number of standard analysis workflows such as Seurat or Bioconductor, without requiring the user to perform the error-prone process of converting to a new scRNA-seq data structure and can be easily extended to incorporate other data storage object types. clustifyr is designed to perform classification after previous steps of analysis by other informatics tools. Therefore, it relies on, and is agnostic to, common external packages for cell clustering and variable feature selection. We envision it to be compatible with all current and future scRNA-seq processing, clustering, and marker gene discovery workflows. Benchmarking reveals the package performs well in mapping cluster identity across different scRNA-seq platforms and experimental types. As we and others observe
                <sup>
                    <xref ref-type="bibr" rid="ref-29">29</xref>
                </sup>, novel algorithms may not be necessary for cell type classification, at least within the current limitations of sequencing technology and our broad-stroke understanding of cell &#x201c;types&#x201d;. Rather, the generation of community-curated reference databases is likely to be critical for reproducible annotation of cell types in scRNA-seq datasets.</p>
            <p>On the user end, clustifyr is built with simple out-of-the-box wrapper functions and sensible defaults, yet also offers extensive options for more experienced users. Instead of building an additional single-cell-specific data structure, or requiring specific scRNA-seq pipeline packages, it simply handles basic data.frames (tables) and matrices (
                <xref ref-type="fig" rid="f1">Figure 1</xref>). Input query data and reference data are intentionally kept in expression matrix form for maximum flexibility, ease-of-use, and ease-of-interpretation. Also, by operating on predefined clusters, clustifyr has high scalability and minimal resource requirements on large datasets. Using per-cluster expression averages results in rapid classification. However, cell-type annotation accuracy is consequently heavily reliant on appropriate selection of the number of clusters. Users are therefore encouraged to explore cell type annotations derived from multiple clustering settings. Additionally, assigning cell types using discrete clusters may not be appropriate for datasets with continuous cellular transitions such as developmental processes, which are more suited to trajectory inference analysis methods. As an alternative, clustifyr also supports per-cell annotation; however, the runtime is greatly increased, the accuracy of the cell type classifications is decreased due to the sparsity of scRNA-seq datasets, and a consensus aggregation step across multiple cells is required to obtain reliable cell type annotations.</p>
            <p>To further improve the user experience, clustifyr provides easy-to-extend implementations to identify and extract data from established scRNA-seq object formats, such as Seurat
                <sup>
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup>, SingleCellExperiment
                <sup>
                    <xref ref-type="bibr" rid="ref-15">15</xref>
                </sup>, URD
                <sup>
                    <xref ref-type="bibr" rid="ref-4">4</xref>
                </sup>, and CellDataSet (Monocle)
                <sup>
                    <xref ref-type="bibr" rid="ref-30">30</xref>
                </sup>. Available in flexible wrapper functions, both reference building and new classification can be directly achieved through scRNA-seq objects at hand, without going through format conversions or manual extraction. The wrappers can also be expanded to other single cell RNA-seq object types, including the HDF5-backed loom objects, as well as other data types generated by CITE-seq and similar experiments
                <sup>
                    <xref ref-type="bibr" rid="ref-31">31</xref>
                </sup>. Tutorials are documented online to help users integrate clustifyr into their workflows with these and other bioinformatics software.</p>
        </sec>
        <sec>
            <title>Software availability</title>
            <p>clustifyr is available from Bioconductor: 
                <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/release/bioc/html/clustifyr.html">https://bioconductor.org/packages/release/bioc/html/clustifyr.html</ext-link>
            </p>
            <p>Up-to-date source code and tutorials are available from: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyr">https://github.com/rnabioco/clustifyr</ext-link>
            </p>
            <p>Package documentation is also provided at: 
                <ext-link ext-link-type="uri" xlink:href="https://rnabioco.github.io/clustifyr/">https://rnabioco.github.io/clustifyr/</ext-link>
            </p>
            <p>Archived source code as at time of publication and Supplemental Table 1 detailing datasets used in each analysis are available from:</p>
            <p>
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5281/zenodo.3934480">https://doi.org/10.5281/zenodo.3934480</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-32">32</xref>
                </sup>
            </p>
            <p>Data used in examples and additional prebuilt references are available from: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdatahub">https://github.com/rnabioco/clustifyrdatahub</ext-link>
            </p>
            <p>License: MIT</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <p>Original raw data used in benchmarking is available from the following sources and additionally described in 
                <xref ref-type="table" rid="T1">Table 1</xref>.</p>
            <table-wrap id="T1A" orientation="portrait" position="float">
                <table content-type="article-table" frame="hsides">
                    <thead>
                        <tr>
                            <th align="left" colspan="1" rowspan="1">Dataset</th>
                            <th align="left" colspan="1" rowspan="1">Source</th>
                        </tr>
                    </thead>
                    <tbody>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">PBMC 3k Seurat V3
                                <break/>object</td>
                            <td align="left" colspan="1" rowspan="1">
                                <ext-link ext-link-type="uri" xlink:href="https://www.dropbox.com/s/63gnlw45jf7cje8/pbmc3k_final.rds?dl=0">https://www.dropbox.com/s/63gnlw45jf7cje8/pbmc3k_final.rds?dl=0</ext-link>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">CBMC CITE-seq</td>
                            <td align="left" colspan="1" rowspan="1">Accession number, GSE100866: 
                                <ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz">ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/</ext-link>
                                <break/>
                                <ext-link ext-link-type="uri" xlink:href="ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE100nnn/GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz">GSE100866/suppl/GSE100866_CBMC_8K_13AB_10X-RNA_umi.csv.gz</ext-link>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Hematopoiesis
                                <break/>microarray data</td>
                            <td align="left" colspan="1" rowspan="1">Accession number, GSE24759: 
                                <ext-link ext-link-type="uri" xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE24759">https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE24759</ext-link>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Tabula Muris as
                                <break/>Seurat V2 objects</td>
                            <td align="left" colspan="1" rowspan="1">
                                <ext-link ext-link-type="uri" xlink:href="https://figshare.com/projects/Tabula_Muris_Transcriptomic_characterization_of_20_organs_and_tissues_from_Mus_musculus_at_single_cell_resolution/27733">https://figshare.com/projects/Tabula_Muris_Transcriptomic_characterization_of_20_organs_</ext-link>
                                <break/>
                                <ext-link ext-link-type="uri" xlink:href="https://figshare.com/projects/Tabula_Muris_Transcriptomic_characterization_of_20_organs_and_tissues_from_Mus_musculus_at_single_cell_resolution/27733">and_tissues_from_Mus_musculus_at_single_cell_resolution/27733</ext-link>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Mouse Cell Atlas</td>
                            <td align="left" colspan="1" rowspan="1">
                                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.5435866.v8">https://doi.org/10.6084/m9.figshare.5435866.v8</ext-link>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Pancreatic
                                <break/>scRNA-seq as
                                <break/>SingleCellExperiment
                                <break/>objects</td>
                            <td align="left" colspan="1" rowspan="1">
                                <ext-link ext-link-type="uri" xlink:href="https://hemberg-lab.github.io/scRNA.seq.datasets/">https://hemberg-lab.github.io/scRNA.seq.datasets/</ext-link>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">Allen Institute Brain
                                <break/>Atlas</td>
                            <td align="left" colspan="1" rowspan="1">
                                <ext-link ext-link-type="uri" xlink:href="http://celltypes.brain-map.org/rnaseq">http://celltypes.brain-map.org/rnaseq</ext-link>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">PBMC-bench</td>
                            <td align="left" colspan="1" rowspan="1">
                                <ext-link ext-link-type="uri" xlink:href="https://singlecell.broadinstitute.org/single_cell/study/SCP424/single-cell-comparison-pbmc-data">https://singlecell.broadinstitute.org/single_cell/study/SCP424/single-cell-comparison-pbmc-data</ext-link>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">PBMC rejection test</td>
                            <td align="left" colspan="1" rowspan="1">
                                <ext-link ext-link-type="uri" xlink:href="http://scibet.cancer-pku.cn/document.html">http://scibet.cancer-pku.cn/document.html</ext-link>
                            </td>
                        </tr>
                        <tr>
                            <td align="left" colspan="1" rowspan="1">ImmGen Database</td>
                            <td align="left" colspan="1" rowspan="1">
                                <ext-link ext-link-type="uri" xlink:href="http://www.immgen.org/">http://www.immgen.org/</ext-link>
                            </td>
                        </tr>
                    </tbody>
                </table>
            </table-wrap>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>A previous version of this article is available on bioRxiv: 
                <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1101/855064">https://doi.org/10.1101/855064</ext-link>.</p>
        </ack>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zheng</surname>
                            <given-names>GXY</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Terry</surname>
                            <given-names>JM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Belgrader</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Massively parallel digital transcriptional profiling of single cells.</article-title>
                    <source>

                        <italic toggle="yes">Nat Commun.</italic>
</source>
                    <year>2017</year>;<volume>8</volume>:<fpage>14049</fpage>.
                    <pub-id pub-id-type="pmid">28091601</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ncomms14049</pub-id>
                    <pub-id pub-id-type="pmcid">5241818</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Chen</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ning</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shi</surname>
                            <given-names>T</given-names>
                        </name>
</person-group>:
                    <article-title>Single-Cell RNA-Seq Technologies and Related Computational Data Analysis.</article-title>
                    <source>

                        <italic toggle="yes">Front Genet.</italic>
</source>
                    <year>2019</year>;<volume>10</volume>:<fpage>317</fpage>.
                    <pub-id pub-id-type="pmid">31024627</pub-id>
                    <pub-id pub-id-type="doi">10.3389/fgene.2019.00317</pub-id>
                    <pub-id pub-id-type="pmcid">6460256</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Luecken</surname>
                            <given-names>MD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Theis</surname>
                            <given-names>FJ</given-names>
                        </name>
</person-group>:
                    <article-title>Current best practices in single-cell RNA-seq analysis: a tutorial.</article-title>
                    <source>

                        <italic toggle="yes">Mol Syst Biol.</italic>
</source>
                    <year>2019</year>;<volume>15</volume>(<issue>6</issue>):<fpage>e8746</fpage>.
                    <pub-id pub-id-type="pmid">31217225</pub-id>
                    <pub-id pub-id-type="doi">10.15252/msb.20188746</pub-id>
                    <pub-id pub-id-type="pmcid">6582955</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Farrell</surname>
                            <given-names>JA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Riesenfeld</surname>
                            <given-names>SJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis.</article-title>
                    <source>

                        <italic toggle="yes">Science.</italic>
</source>
                    <year>2018</year>;<volume>360</volume>(<issue>6392</issue>):<fpage>eaar3131</fpage>.
                    <pub-id pub-id-type="pmid">29700225</pub-id>
                    <pub-id pub-id-type="doi">10.1126/science.aar3131</pub-id>
                    <pub-id pub-id-type="pmcid">6247916</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <collab>Tabula Muris Consortium</collab>

                        <etal/>
</person-group>:
                    <article-title>Single-cell transcriptomics of 20 mouse organs creates a 
                        <italic toggle="yes">Tabula Muris</italic>.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2018</year>;<volume>562</volume>(<issue>7727</issue>):<fpage>367</fpage>&#x2013;<lpage>72</lpage>.
                    <pub-id pub-id-type="pmid">30283141</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41586-018-0590-4</pub-id>
                    <pub-id pub-id-type="pmcid">6642641</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kiselev</surname>
                            <given-names>VY</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Andrews</surname>
                            <given-names>TS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hemberg</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Challenges in unsupervised clustering of single-cell RNA-seq data.</article-title>
                    <source>

                        <italic toggle="yes">Nat Rev Genet.</italic>
</source>
                    <year>2019</year>;<volume>20</volume>(<issue>5</issue>):<fpage>273</fpage>&#x2013;<lpage>82</lpage>.
                    <pub-id pub-id-type="pmid">30617341</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41576-018-0088-9</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Vallejos</surname>
                            <given-names>CA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Risso</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Scialdone</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Normalizing single-cell RNA sequencing data: challenges and opportunities.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2017</year>;<volume>14</volume>(<issue>6</issue>):<fpage>565</fpage>&#x2013;<lpage>71</lpage>.
                    <pub-id pub-id-type="pmid">28504683</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.4292</pub-id>
                    <pub-id pub-id-type="pmcid">5549838</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kiselev</surname>
                            <given-names>VY</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yiu</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hemberg</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>scmap: projection of single-cell RNA-seq data across data sets.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2018</year>;<volume>15</volume>(<issue>5</issue>):<fpage>359</fpage>&#x2013;<lpage>62</lpage>.
                    <pub-id pub-id-type="pmid">29608555</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.4644</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Aran</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Looney</surname>
                            <given-names>AP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage.</article-title>
                    <source>

                        <italic toggle="yes">Nat Immunol.</italic>
</source>
                    <year>2019</year>;<volume>20</volume>(<issue>2</issue>):<fpage>163</fpage>&#x2013;<lpage>72</lpage>.
                    <pub-id pub-id-type="pmid">30643263</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41590-018-0276-y</pub-id>
                    <pub-id pub-id-type="pmcid">6340744</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pliner</surname>
                            <given-names>HA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shendure</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Trapnell</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <article-title>Supervised classification enables rapid annotation of cell atlases.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2019</year>;<volume>16</volume>(<issue>10</issue>):<fpage>983</fpage>&#x2013;<lpage>6</lpage>.
                    <pub-id pub-id-type="pmid">31501545</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41592-019-0535-3</pub-id>
                    <pub-id pub-id-type="pmcid">6791524</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ma</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pellegrini</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>ACTINN: automated identification of cell types in single cell RNA sequencing.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2020</year>;<volume>36</volume>(<issue>2</issue>):<fpage>533</fpage>&#x2013;<lpage>8</lpage>.
                    <pub-id pub-id-type="pmid">31359028</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btz592</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Alquicira-Hernandez</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sathe</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ji</surname>
                            <given-names>HP</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>
                        <italic toggle="yes">scPred</italic>: accurate supervised method for cell-type classification from single-cell RNA-seq data.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2019</year>;<volume>20</volume>(<issue>1</issue>):<fpage>264</fpage>.
                    <pub-id pub-id-type="pmid">31829268</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-019-1862-5</pub-id>
                    <pub-id pub-id-type="pmcid">6907144</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Abdelaal</surname>
                            <given-names>T</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Michielsen</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cats</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A comparison of automatic cell identification methods for single-cell RNA sequencing data.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2019</year>;<volume>20</volume>(<issue>1</issue>):<fpage>194</fpage>.
                    <pub-id pub-id-type="pmid">31500660</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-019-1795-z</pub-id>
                    <pub-id pub-id-type="pmcid">6734286</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Butler</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hoffman</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Smibert</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Integrating single-cell transcriptomic data across different conditions, technologies, and species.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2018</year>;<volume>36</volume>(<issue>5</issue>):<fpage>411</fpage>&#x2013;<lpage>20</lpage>.
                    <pub-id pub-id-type="pmid">29608179</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.4096</pub-id>
                    <pub-id pub-id-type="pmcid">6700744</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lun</surname>
                            <given-names>ATL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>McCarthy</surname>
                            <given-names>DJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Marioni</surname>
                            <given-names>JC</given-names>
                        </name>
</person-group>:
                    <article-title>A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor [version 2; peer review: 3 approved, 2 approved with reservations].</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>2016</year>;<volume>5</volume>:<fpage>2122</fpage>.
                    <pub-id pub-id-type="pmid">27909575</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.9501.2</pub-id>
                    <pub-id pub-id-type="pmcid">5112579</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ding</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Adiconis</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Simmons</surname>
                            <given-names>SK</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Systematic comparative analysis of single cell RNA-sequencing methods.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2019</year>;<fpage>632216</fpage>.
                    <pub-id pub-id-type="doi">10.1101/632216</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Li</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Liu</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kang</surname>
                            <given-names>B</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>SciBet as a portable and fast single cell type identifier.</article-title>
                    <source>

                        <italic toggle="yes">Nat Commun.</italic>
</source>
                    <year>2020</year>;<volume>11</volume>(<issue>1</issue>):<fpage>1818</fpage>.
                    <pub-id pub-id-type="pmid">32286268</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-020-15523-2</pub-id>
                    <pub-id pub-id-type="pmcid">7156687</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Baron</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Veres</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wolock</surname>
                            <given-names>SL</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure.</article-title>
                    <source>

                        <italic toggle="yes">Cell Syst.</italic>
</source>
                    <year>2016</year>;<volume>3</volume>(<issue>4</issue>):<fpage>346</fpage>&#x2013;<lpage>360.e4</lpage>.
                    <pub-id pub-id-type="pmid">27667365</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cels.2016.08.011</pub-id>
                    <pub-id pub-id-type="pmcid">5228327</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Segerstolpe</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Palasantza</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Eliasson</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Single-Cell Transcriptome Profiling of Human Pancreatic Islets in Health and Type 2 Diabetes.</article-title>
                    <source>

                        <italic toggle="yes">Cell Metab.</italic>
</source>
                    <year>2016</year>;<volume>24</volume>(<issue>4</issue>):<fpage>593</fpage>&#x2013;<lpage>607</lpage>.
                    <pub-id pub-id-type="pmid">27667667</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cmet.2016.08.020</pub-id>
                    <pub-id pub-id-type="pmcid">5069352</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Du&#x00f2;</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Robinson</surname>
                            <given-names>MD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Soneson</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <article-title>A systematic performance evaluation of clustering methods for single-cell RNA-seq data [version 2; peer review: 2 approved].</article-title>
                    <source>

                        <italic toggle="yes">F1000Res.</italic>
</source>
                    <year>2018</year>;<volume>7</volume>:<fpage>1141</fpage>.
                    <pub-id pub-id-type="pmid">30271584</pub-id>
                    <pub-id pub-id-type="doi">10.12688/f1000research.15666.2</pub-id>
                    <pub-id pub-id-type="pmcid">6134335</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Soneson</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Robinson</surname>
                            <given-names>MD</given-names>
                        </name>
</person-group>:
                    <article-title>Bias, robustness and scalability in single-cell differential expression analysis.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2018</year>;<volume>15</volume>(<issue>4</issue>):<fpage>255</fpage>&#x2013;<lpage>61</lpage>.
                    <pub-id pub-id-type="pmid">29481549</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.4612</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Han</surname>
                            <given-names>X</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhou</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Mapping the Mouse Cell Atlas by Microwell-Seq.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2018</year>;<volume>172</volume>(<issue>5</issue>):<fpage>1091</fpage>&#x2013;<lpage>1107.e17</lpage>.
                    <pub-id pub-id-type="pmid">29474909</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2018.02.001</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Stoeckius</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hafemeister</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Stephenson</surname>
                            <given-names>W</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Simultaneous epitope and transcriptome measurement in single cells.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2017</year>;<volume>14</volume>(<issue>9</issue>):<fpage>865</fpage>&#x2013;<lpage>868</lpage>.
                    <pub-id pub-id-type="pmid">28759029</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.4380</pub-id>
                    <pub-id pub-id-type="pmcid">5669064</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Novershtern</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Subramanian</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lawton</surname>
                            <given-names>LN</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Densely interconnected transcriptional circuits control cell states in human hematopoiesis.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2011</year>;<volume>144</volume>(<issue>2</issue>):<fpage>296</fpage>&#x2013;<lpage>309</lpage>.
                    <pub-id pub-id-type="pmid">21241896</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2011.01.004</pub-id>
                    <pub-id pub-id-type="pmcid">3049864</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Heng</surname>
                            <given-names>TSP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Painter</surname>
                            <given-names>MW</given-names>
                        </name>
</person-group>
                    <collab>Immunological Genome Project Consortium</collab>:
                    <article-title>The Immunological Genome Project: networks of gene expression in immune cells.</article-title>
                    <source>

                        <italic toggle="yes">Nat Immunol.</italic>
</source>
                    <year>2008</year>;<volume>9</volume>(<issue>10</issue>):<fpage>1091</fpage>&#x2013;<lpage>4</lpage>.
                    <pub-id pub-id-type="pmid">18800157</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ni1008-1091</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Andrews</surname>
                            <given-names>TS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hemberg</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>M3Drop: dropout-based feature selection for scRNASeq.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2019</year>;<volume>35</volume>(<issue>16</issue>):<fpage>2865</fpage>&#x2013;<lpage>7</lpage>.
                    <pub-id pub-id-type="pmid">30590489</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/bty1044</pub-id>
                    <pub-id pub-id-type="pmcid">6691329</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Korotkevich</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sukhov</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sergushichev</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Fast gene set enrichment analysis.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2019</year>;<fpage>060012</fpage>.
                    <pub-id pub-id-type="doi">10.1101/060012</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pedregosa</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Varoquaux</surname>
                            <given-names>G</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gramfort</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Scikit-learn: Machine Learning in Python.</article-title>
                    <source>

                        <italic toggle="yes">J Mach Learn Res.</italic>
</source>
                    <year>2011</year>;<volume>12</volume>:<fpage>2825</fpage>&#x2013;<lpage>30</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.jmlr.org/papers/volume12/pedregosa11a/pedregosa11a.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>K&#x00f6;hler</surname>
                            <given-names>ND</given-names>
                        </name>

                        <name name-style="western">
                            <surname>B&#x00fc;ttner</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Theis</surname>
                            <given-names>FJ</given-names>
                        </name>
</person-group>:
                    <article-title>Deep learning does not outperform classical machine learning for cell-type annotation.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2019</year> [cited 2020 Jan 28];<fpage>653907</fpage>.
                    <pub-id pub-id-type="doi">10.1101/653907</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cao</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Spielmann</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Qiu</surname>
                            <given-names>X</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The single-cell transcriptional landscape of mammalian organogenesis.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2019</year>;<volume>566</volume>(<issue>7745</issue>):<fpage>496</fpage>&#x2013;<lpage>502</lpage>.
                    <pub-id pub-id-type="pmid">30787437</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41586-019-0969-x</pub-id>
                    <pub-id pub-id-type="pmcid">6434952</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-31">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Richer</surname>
                            <given-names>AL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Riemondy</surname>
                            <given-names>KA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hardie</surname>
                            <given-names>L</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Simultaneous measurement of biochemical phenotypes and gene expression in single cells.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2020</year>;<volume>48</volume>(<issue>10</issue>):<fpage>e59</fpage>.
                    <pub-id pub-id-type="pmid">32286626</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkaa240</pub-id>
                    <pub-id pub-id-type="pmcid">7261187</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Fu</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gillen</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sheridan</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>rnabioco/clustifyr 0.99.7 (Version 0.99.7).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2020</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.3934480">http://www.doi.org/10.5281/zenodo.3934480</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report67326">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.27827.r67326</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Slowikowski</surname>
                        <given-names>Kamil</given-names>
                    </name>
                    <xref ref-type="aff" rid="r67326a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-2843-6370</uri>
                </contrib>
                <aff id="r67326a1">
                    <label>1</label>Massachusetts General Hospital, Boston, MA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>24</day>
                <month>7</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Slowikowski K</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport67326" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.22969.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Thank you for improving the manuscript!</p>
            <p> </p>
            <p> I have no further comments.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics, computational biology, immunogenomics, scRNA-seq.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report67325">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.27827.r67325</article-id>
            <title-group>
                <article-title>Reviewer response for version 2</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Korthauer</surname>
                        <given-names>Keegan</given-names>
                    </name>
                    <xref ref-type="aff" rid="r67325a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4565-1654</uri>
                </contrib>
                <aff id="r67325a1">
                    <label>1</label>BC Children's Hospital Research Institute, Vancouver, BC, Canada</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>17</day>
                <month>7</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Korthauer K</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport67325" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.22969.2"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors have addressed all comments and concerns.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Statistical genomics, bioinformatics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.</p>
        </body>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report63065">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.25358.r63065</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Slowikowski</surname>
                        <given-names>Kamil</given-names>
                    </name>
                    <xref ref-type="aff" rid="r63065a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-2843-6370</uri>
                </contrib>
                <aff id="r63065a1">
                    <label>1</label>Massachusetts General Hospital, Boston, MA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>2</day>
                <month>6</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Slowikowski K</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport63065" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.22969.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The authors describe an R package for annotating cell clusters in scRNA-seq datasets. Specifically, the package implements code for computing correlations between the columns of two data matrices. They show that high correlations between unknown cell clusters in the first data matrix and annotated cell types in the second matrix can be used to label the unknown cell clusters. They try varying parameters and show the effects on the results, and they also benchmark the time and accuracy compared to other packages designed to annotate scRNA-seq data.</p>
            <p> </p>
            <p> Details of the code, methods, and analyses are partly provided. Some details seem to be missing (e.g. the functionality for gene lists).</p>
            <p> </p>
            <p> The conclusions about the tool and its performance are partly supported by the findings presented in the article. Some terms, such as "medF1-score" and "accuracy", are left undefined, and the set of methods compared is not consistent across panels (Figure 4A shows different methods than Figures 4B and 4C). Readers may have difficulty understanding the specific questions that were asked and what results are shown.</p>
            <p> </p>
            <p> Main comments: 
                <list list-type="order">
                    <list-item>
                        <p>The clarity of the manuscript can be increased by adding more verbose details about all analyses. Please consider expanding details about each question, the approach, the datasets used, and the results.</p>
                    </list-item>
                    <list-item>
                        <p>Please consider adding a table describing the reference datasets used in this article, just like the one shown on one of your GitHub repositories. This should help to summarize which datasets were used for the analyses in this article.</p>
                    </list-item>
                </list> </p>
            <p> Comments about specific parts of the manuscript are below. Excerpts from the article are shown in 
                <italic>"quoted italics"</italic> after a bullet point, and my comments are shown directly below the bullet point.</p>
            <p> &#x00a0; 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <italic>"A key challenge in scRNA-seq data analysis is the identification of cell types from single-cell transcriptomes. Manual inspection of the expression patterns from a small number of marker genes is still standard practice, which is cumbersome and frequently inaccurate."</italic>
                        </p>
                    </list-item>
                </list> Do we know the accuracy by manual inspection? Is there a reference for this? In the absence of evidence, you might consider weakening the statement to say "may be inaccurate" rather than "is cumbersome and frequently inaccurate". You might consider that many scRNA-seq experiments are done for the purpose of discovering new cell types that have not been well-described in previous published datasets. In this setting, manual inspection is necessary and automated analyses could be inaccurate or misleading.</p>
            <p> &#x00a0; 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <italic>"Currently, multiple cell type assignment packages exist but they are specifically tailored towards input types or workflows
                                <sup>8&#x2013;13</sup>."</italic>
                        </p>
                    </list-item>
                </list> Please consider naming and describing each method that will be compared to clustifyr in this manuscript, so the reader can assess how the methodology of clustifyr compares to other methods. Which methods are "specifically tailored towards input types or workflows"? Could you give an example to help the reader understand this claim?</p>
            <p> </p>
            <p> </p>
            <p> 
                <bold>Suggested improvements for Figure 1:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>In Figure 1, you might consider showing the dimensions of the inputs and outputs. This might help the reader to understand how they relate to each other.</p>
                    </list-item>
                    <list-item>
                        <p>Should the query and reference data be counts? CPM? Or Log2(CPM + 1)? You might consider elaborating on this.</p>
                    </list-item>
                </list> </p>
            <p> 
                <bold>Suggested improvements for Figure 2:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Please consider rotating Figure 2A, D, and E 90 degrees clockwise to improve legibility.</p>
                    </list-item>
                    <list-item>
                        <p>Please consider limiting the axes ranges to the data instead of using the range [0, 1].</p>
                    </list-item>
                    <list-item>
                        <p>Please consider increasing all font sizes in all panels in all figures, including titles, legends, axis text, etc. Some readers might need larger sizes to see clearly.</p>
                    </list-item>
                    <list-item>
                        <p>Please consider changing the titles to "All genes (n = 10,000)" and "M3Drop variable genes (n = 1,000)" in Figure 2C, so we have some sense of the number of genes used to generate each heatmap.</p>
                    </list-item>
                    <list-item>
                        <p>Please consider showing a graphical representation of the experiment setup for this figure. What is the reference? What is the query? What are their dimensions? What is the main question in this analysis?</p>
                    </list-item>
                    <list-item>
                        <p>One way to enhance clarity is to add descriptive titles to every panel in every figure (e.g. "Testing different correlation statistics").</p>
                    </list-item>
                    <list-item>
                        <p>Please consider adding more details to the legend text for Figure 2 to help readers understand exactly what experiment has been done, what data was used, and what result is shown.</p>
                    </list-item>
                    <list-item>
                        <p>In Figure 2C, it seems that the y-axis and x-axis have been swapped by mistake. I see that the y-axis is labeled "ground truth cell type" but it includes "unclassified". I would expect the category "unclassified" to appear in the "called cell type" axis, but not in the "ground truth cell type" axis. Are the axes swapped or are they correct? Could you please clarify?</p>
                    </list-item>
                    <list-item>
                        <p>In Figure 2E, what does the color indicate? Is it the power argument "n^x" or something else?</p>
                    </list-item>
                    <list-item>
                        <p>The reader may be wondering: 
                            <list list-type="bullet">
                                <list-item>
                                    <p>How many query cells did you use?</p>
                                </list-item>
                                <list-item>
                                    <p>How many clusters were in the query dataset? How many cells per cluster?</p>
                                </list-item>
                                <list-item>
                                    <p>How many reference datasets were used?</p>
                                </list-item>
                                <list-item>
                                    <p>How many clusters were in the reference dataset?</p>
                                </list-item>
                                <list-item>
                                    <p>Were the query and reference datasets acquired from the same tissue sample or were they completely independent and unrelated?</p>
                                </list-item>
                            </list> </p>
                    </list-item>
                </list> </p>
            <p> In the section "Subclustering", please consider adding more details to help the reader avoid misunderstandings. What exactly is the "sub-clustering power argument (x)"? Please consider giving a concrete example to help the reader understand this section. Please consider creating a new figure that helps the reader to understand the "subcluster()" functionality.</p>
            <p> What is the PBMCbench data? Is this the same data as mentioned in the section "Correlation minimum cutoff"?&#x00a0;</p>
            <p> In the section "Cells per cluster", you might consider introducing the dataset, then introducing the question that is being addressed, and finally reporting the results. What is the number (15, 8, 4)? Is the "Mouse Cell Atlas" the same as the "Tabula Muris"? Were these mouse datasets used in the previous sections? The reader might benefit from an introduction of these datasets.</p>
            <p> 
                <bold>Suggested improvements for Figure 3:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Please consider adding labels "A", "B", "C", "D" to mark each of the four panels, so they can be referenced clearly.</p>
                    </list-item>
                    <list-item>
                        <p>Please consider using the same name consistently in the text and the figure titles. For example, the figure says "Bulk RNA-seq reference data" but the text says "ImmGen database". The reader might better understand the results if the same label were used in both places instead of using two different labels for the same thing.</p>
                    </list-item>
                    <list-item>
                        <p>Please consider including the identifiers for readers who wish to find these datasets and download them. For example, if the datasets are available on NCBI GEO, please consider including the accession numbers directly in the legend text, or in a table. Check to see if any other database provides an accession number. If an accession number is not available, please consider providing the DOI for a publication or a URL for a website that provides the data. By the way, if any data you are using is not deposited to a permanent repository, please consider uploading this data to a permanent repository (e.g. Figshare).</p>
                    </list-item>
                </list> </p>
            <p> In the section describing Figure 4A, please consider these suggested changes: 
                <list list-type="bullet">
                    <list-item>
                        <p>Please explain what is "clustifyr", "clustifyr_lists", and "clustifyr_m3drop".</p>
                    </list-item>
                    <list-item>
                        <p>How was feature selection performed for each analysis in Figure 4A?</p>
                    </list-item>
                    <list-item>
                        <p>What is the strategy used by scmap?</p>
                    </list-item>
                    <list-item>
                        <p>What is the strategy used by "Seurat"?</p>
                    </list-item>
                    <list-item>
                        <p>What is the strategy used by "SingleR"?</p>
                    </list-item>
                    <list-item>
                        <p>How is clustifyr similar or different?</p>
                    </list-item>
                </list> </p>
            <p> This section says "Correlation-based clustifyr classification performed better than hypergeometirc-based gene list enrichment as implemented in clustify_lists." Please consider explaining the "clustify_lists" algorithm in detail and also consider sharing the quantification of the performance of each approach so the reader can interpret the claim "performed better". Also consider elaborating on "performed better".</p>
            <p> What is "scRNAseq_Benchmark subsampling"? Could you elaborate on what this is and why it was used?</p>
            <p> 
                <bold>Suggested improvements for Figure 4:</bold> 
                <list list-type="bullet">
                    <list-item>
                        <p>Please consider including an overview schematic to help the reader understand which datasets were used for each result.</p>
                    </list-item>
                    <list-item>
                        <p>Please define "accuracy". What is the algorithm for computing this number?</p>
                    </list-item>
                    <list-item>
                        <p>Please define "medF1-score". What is the algorithm for computing this number?</p>
                    </list-item>
                    <list-item>
                        <p>For the lower half of panel B, please consider using a format similar to the one in Figure 2B from Kiselev 
                            <italic>et al.</italic>&#x00a0;(2018
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-63065-1">1</xref>
                            </sup>). For example, please use a log10 axis for time, so readers can see the difference between methods.</p>
                    </list-item>
                    <list-item>
                        <p>Why is "medF1-score" used for Figure 4C and "accuracy" for Figure 4B?</p>
                    </list-item>
                </list> </p>
            <p> Why does Figure 4A have 6 methods, Figure 4B have 5 methods, and Figure 4C have 3 methods? Is it possible to include all 6 methods for all panels? Could you please comment on the reasons for excluding or including methods in each analysis? 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <italic>"As we and others observe
                                <sup>25</sup>, novel algorithms may not be necessary for cell type classification, at least within the current limitations of sequencing technology and our broadstroke understanding of cell &#x201c;types&#x201d;. Rather, the generation of community curated reference databases is likely to be critical for reproducible annotation of cell types in scRNA-seq datasets."</italic>
                        </p>
                    </list-item>
                </list> I agree that a community curated reference database would be a valuable contribution to the field. You might consider creating a table or other type of descriptive listing that helps the reader to understand all of the references that were used in this article. Consider including tissue source, healthy or disease status, number of cells and genes, technology used for the assay, DOI, data URL, NCBI GEO accession, or any other details that the reader might find helpful.</p>
            <p> Thank you for providing a GitHub repository with data files! Please also consider sharing the same data in compressed plain text format (e.g. "file.tsv.gz"). In addition to GitHub, please consider using a specialty service that is funded for the purpose of permanently archiving research data such as NIH Figshare (
                <ext-link ext-link-type="uri" xlink:href="https://nih.figshare.com">https://nih.figshare.com</ext-link>). There are other options (Zenodo, Open Science Framework OSF, etc.). 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <italic>"As an alternative, clustifyr also supports per-cell annotation, however the runtime is greatly increased and the accuracy of the cell type classifications are decreased due to the sparsity of scRNA-seq datasets, and requires a consensus aggregation step across multiple cells to obtain reliable cell type annotations."</italic>
                        </p>
                    </list-item>
                </list> You might consider offering another alternative option. One extreme is to use the cluster averages, while the other extreme is to use single cells. Perhaps there might be a middle ground where clustifyr could automatically use k-means or some other algorithm to form clusters within the user-defined clusters. This would give the user even more flexibility.</p>
            <p> After reviewing the code, I can see that there is an "overcluster()" function that seems to do exactly what I suggested. Please consider describing this in the article and showing an example of how it works. In retrospect, I can see that the section titled "Subclustering" was supposed to describe this topic &#x2014; I misunderstood this section on the first read.</p>
            <p> You may want to double-check all of the links in all of your HTML pages. I see three URLs: 
                <list list-type="bullet">
                    <list-item>
                        <p>
                            <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdatahub/">https://github.com/rnabioco/clustifyrdatahub/</ext-link>
                        </p>
                    </list-item>
                    <list-item>
                        <p>
                            <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyr">https://github.com/rnabioco/clustifyr</ext-link>
                        </p>
                    </list-item>
                    <list-item>
                        <p>
                            <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata">https://github.com/rnabioco/clustifyrdata</ext-link>
                        </p>
                    </list-item>
                </list> </p>
            <p> I can see that the "clustifyrdatahub" repo has code for creating ".rda" files from the reference datasets.</p>
            <p> I also see similar scripts at 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/tree/master/data-raw">https://github.com/rnabioco/clustifyrdata/tree/master/data-raw</ext-link>
            </p>
            <p> Readers might be confused when they see two different repos with similar scripts. You might consider deleting the "clustifyrdatahub" repo if it is not necessary.</p>
            <p> I'm happy to see that the data is organized and annotated in the GitHub repo. Specifically, in the GitHub "clustifyrdata" repo, in the "README.md" file, the table shows the name of the reference, the number of cell types, the number of genes, the organism, and a link to the publication. Please consider adding some version of this table to the article, so the reader can understand the scope of this article.</p>
            <p> After reviewing the code, I was able to resolve some of my misunderstandings caused by lack of clarity in the terse descriptions in this article. To reduce the chance of misunderstanding by other readers, you might consider clarifying or adding details to the descriptions of functions and results. For example, the article does not mention that GSEA is used to work with gene lists.</p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Partly</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>Bioinformatics, computational biology, immunogenomics, scRNA-seq.</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-63065-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"><name><surname>Kiselev</surname><given-names>VY</given-names></name>, <etal/></person-group>:
                        <article-title>scmap: projection of single-cell RNA-seq data across data sets.</article-title>
                        <source>
                            <italic>Nat Methods</italic>
                        </source>. <year>2018</year>; <volume>15</volume>(<issue>5</issue>):
                        <fpage>359</fpage>-<lpage>362</lpage>
                        <pub-id pub-id-type="pmid">29608555</pub-id>
                        <pub-id pub-id-type="doi">10.1038/nmeth.4644</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment5695-63065">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Riemondy</surname>
                            <given-names>Kent</given-names>
                        </name>
                        <aff>University of Colorado, Anschutz Medical Campus, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>None to declare.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>7</day>
                    <month>7</month>
                    <year>2020</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We thank the reviewer for their detailed suggestions, which we believe have substantially improved the clarity of the manuscript. Our responses are indicated in italics below.</p>
                <p>The authors describe an R package for annotating cell clusters in scRNA-seq datasets. Specifically, the package implements code for computing correlations between the columns of two data matrices. They show that high correlations between unknown cell clusters in the first data matrix and annotated cell types in the second matrix can be used to label the unknown cell clusters. They try varying parameters and show the effects on the results, and they also benchmark the time and accuracy compared to other packages designed to annotate scRNA-seq data.</p>
                <p>Details of the code, methods, and analyses are partly provided. Some details seem to be missing (e.g. the functionality for gene lists).</p>
                <p>The conclusions about the tool and its performance are partly supported by the findings presented in the article. Some terms such as "medF1-score" and "accuracy" are left undefined, and some results omit some methods (Figure 4A has different methods than B or C). Readers may have difficulty understanding the specific questions that were asked and what results are shown.</p>
                <p>Main comments: 
                    <list list-type="bullet">
                        <list-item>
                            <p>The clarity of the manuscript can be increased by adding more verbose details about all analyses. Please consider expanding details about each question, the approach, the datasets used, and the results.</p>
                        </list-item>
                    </list> 
                    <italic>To present clustifyr more clearly, we have added details about each dataset, the questions posed by each analysis, and the conclusions drawn from each analysis.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please consider adding a table describing the reference datasets used in this article, just like the one shown on one of your GitHub repositories. This should help to summarize which datasets were used for the analyses in this article.</p>
                        </list-item>
                    </list> 
                    <italic>We have added a table to the main text (Table 1) and a supplemental table that provide additional details about each dataset and indicate which datasets were used in each figure panel.</italic>
                </p>
                <p>Comments about specific parts of the manuscript are below. Excerpts from the article are shown in "quoted italics" after a bullet point, and my comments are shown directly below the bullet point.</p>
                <p>
                    <list list-type="bullet">
                        <list-item>
                            <p>"A key challenge in scRNA-seq data analysis is the identification of cell types from single-cell transcriptomes. Manual inspection of the expression patterns from a small number of marker genes is still standard practice, which is cumbersome and frequently inaccurate."</p>
                        </list-item>
                    </list> Do we know the accuracy by manual inspection? Is there a reference for this? In the absence of evidence, you might consider weakening the statement to say "may be inaccurate" rather than "is cumbersome and frequently inaccurate". You might consider that many scRNA-seq experiments are done for the purpose of discovering new cell types that have not been well-described in previous published datasets. In this setting, manual inspection is necessary and automated analyses could be inaccurate or misleading.</p>
                <p>
                    <italic>To our knowledge, there has been no direct study of the accuracy of manual inspection compared to automated methods. We thank the reviewer for noting this point and have weakened this statement accordingly. We have also noted that automated methods can supplement manual inspection of markers to provide additional justification for the discovery of novel cell types.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>"Currently, multiple cell type assignment packages exist but they are specifically tailored towards input types or workflows<sup>8&#x2013;13</sup>."</p>
                        </list-item>
                    </list> Please consider naming and describing each method that will be compared to clustifyr in this manuscript, so the reader can assess how the methodology of clustifyr compares to other methods. Which methods are "specifically tailored towards input types or workflows"? Could you give an example to help the reader understand this claim?</p>
                <p>
                    <italic>We have added descriptions of the methodologies used by the tools that we compared clustifyr against (see Introduction). We have also noted which tools are tailored towards particular input types, namely reference single-cell data (Seurat, ACTINN, scPred), and which are tailored towards particular workflows, namely Seurat objects (Seurat), SingleCellExperiment objects (SingleR, scPred), or the command line (ACTINN).</italic>
                </p>
                <p>Suggested improvements for Figure 1: 
                    <list list-type="bullet">
                        <list-item>
                            <p>In Figure 1, you might consider showing the dimensions of the inputs and outputs. This might help the reader to understand how they relate to each other.</p>
                        </list-item>
                    </list> 
                    <italic>We have added the dimensions to provide clarity.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>Should the query and reference data be counts? CPM? Or Log2(CPM + 1)? You might consider elaborating on this.</p>
                        </list-item>
                    </list> 
                    <italic>clustifyr supports either raw counts or log-normalized values. The choice is left to the user; we recommend using the same normalization as was applied to the reference matrix, if possible. We have added text (under Variable gene selection and normalization) to provide guidance to the user.</italic></p>
                <p>Suggested improvements for Figure 2: 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please consider rotating Figure 2A, D, and E 90 degrees clockwise to improve legibility.</p>
                        </list-item>
                    </list> 
                    <italic>We have amended the figures as suggested.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please consider limiting the axes ranges to the data instead of using the range [0, 1].</p>
                        </list-item>
                    </list> 
                    <italic>We respectfully decline to implement this suggestion, as we believe restricting the plot to only the range of the data can over-emphasize minor differences in distributions.</italic>&#x00a0; 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please consider increasing all font sizes in all panels in all figures, including titles, legends, axis text, etc. Some readers might need larger sizes to see clearly.</p>
                        </list-item>
                    </list> 
                    <italic>We have increased the font sizes accordingly.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please consider changing the title to "All genes (n = 10,000)" and "M3Drop variable genes (n = 1,000)"&#x00a0; in Figure 2C, so we have some sense of the number of genes used to generate each heatmap.</p>
                        </list-item>
                    </list> 
                    <italic>We have changed these titles as suggested.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please consider showing a graphical representation of the experiment setup for this figure. What is the reference? What is the query? What are their dimensions? What is the main question in this analysis?</p>
                        </list-item>
                    </list> 
                    <italic>We have added additional details about the query and reference datasets in the main text, legends, and in the titles of figure panels as an alternative to graphical representations. The questions addressed by each analysis are more clearly stated when introducing each figure panel. We believe that these edits now provide sufficient clarity for the reader to understand the content of each figure.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>One way to enhance clarity is to add descriptive titles to every figure in every panel (e.g. "Testing different correlation statistics", etc.).</p>
                        </list-item>
                    </list> 
                    <italic>We thank the reviewer for this suggestion and we have added titles to figure panels that we believe were unclearly presented.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please consider adding more details to the legend text for Figure 2 to help readers understand exactly what experiment has been done, what data was used, and what result is shown.</p>
                        </list-item>
                    </list> 
                    <italic>We have added details about each panel to the legend text, and further text to the main manuscript, as requested.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>In Figure 2C, it seems that the y-axis and x-axis have been swapped by mistake. I see that the y-axis is labeled "ground truth cell type" but it includes "unclassified". I would expect the category "unclassified" to appear in the "called cell type" axis, but not in the "ground truth cell type" axis. Are the axes swapped or are they correct? Could you please clarify?</p>
                        </list-item>
                    </list> 
                    <italic>The &#x201c;unclassified&#x201d; cell type was annotated as unclassified in the original study, whereas the query dataset contained a cell type (Schwann cells) that appears to be similar to the &#x201c;unclassified&#x201d; cells in the reference data. We have added text to the legend to clarify.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>In Figure 2E, what does the color indicate? Is it the power argument "n^x" or something else?</p>
                        </list-item>
                    </list> 
                    <italic>The color did not indicate any particular class and therefore was removed.&#x00a0;</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>The reader may be wondering: 
                                <list list-type="bullet">
                                    <list-item>
                                        <p>How many query cells did you use?</p>
                                    </list-item>
                                    <list-item>
                                        <p>How many clusters were in the query dataset? How many cells per cluster?</p>
                                    </list-item>
                                    <list-item>
                                        <p>How many reference datasets were used?</p>
                                    </list-item>
                                    <list-item>
                                        <p>How many clusters were in the reference dataset?</p>
                                    </list-item>
                                    <list-item>
                                        <p>Were the query and reference datasets acquired from the same tissue sample or were they completely independent and unrelated?</p>
                                    </list-item>
                                </list> </p>
                        </list-item>
                    </list> 
                    <italic>We have added a supplemental table (Supplemental Table 1) with additional details for each dataset in the manuscript. Based on our reading of the original publications for each dataset, the tissue samples used for the query and reference datasets were derived from unrelated individuals or mice. An exception was the PBMC-bench dataset, in which multiple single-cell technologies were tested using the same aliquot of PBMCs. We have added text to the results section to clarify (under Correlation method).</italic>
                </p>
                <p>In the section "Subclustering", please consider adding more details to help the reader avoid misunderstandings. What exactly is the "sub-clustering power argument (x)"? Please consider giving a concrete example to help the reader understand this section. Please consider creating a new figure that helps the reader to understand the "subcluster()" functionality.</p>
                <p>
                    <italic>We have added an additional figure panel (2E) to demonstrate the utility of the subcluster/overcluster_test functionality.</italic>
                </p>
                <p>What is the PBMCbench data? Is this the same data as mentioned in the section "Correlation minimum cutoff"?&#x00a0;</p>
                <p>
                    <italic>Yes, this is the same dataset. We have added text to introduce this dataset, and further details are provided in Table 1.</italic></p>
                <p>In the section "Cells per cluster", you might consider introducing the dataset, then introducing the question that is being addressed, and finally reporting the results. What is the number (15, 8, 4)? Is the "Mouse Cell Atlas" the same as the "Tabula Muris"? Were these mouse datasets used in the previous sections? The reader might benefit from an introduction of these datasets.</p>
                <p>
                    <italic>We have added text to the manuscript to introduce and describe these datasets and to clarify the analyses conducted. The x-axis refers to the number of cells per cluster.</italic>
                </p>
                <p>Suggested improvements for Figure 3: 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please consider adding labels "A", "B", "C", "D" to mark each of the four panels, so they can be referenced clearly.</p>
                        </list-item>
                    </list> 
                    <italic>We have added these labels and referenced them in the updated figure legend.</italic>&#x00a0; 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please consider using the same name consistently in the text and the figure titles. For example, the figure says "Bulk RNA-seq reference data" but the text says "ImmGen database". The reader might better understand the results if the same label were used in both places instead of using two different labels for the same thing.</p>
                        </list-item>
                    </list> 
                    <italic>We have added subtitles to each panel to more clearly reference the datasets in the text.&#x00a0;</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please consider including the identifiers for readers who wish to find these datasets and download them. For example, if the datasets are available on NCBI GEO, please consider including the accession numbers directly in the legend text, or in a table. Check to see if any other database provides an accession number. If an accession number is not available, please consider providing the DOI for a publication or a URL for a website that provides the data. By the way, if any data you are using is not deposited to a permanent repository, please consider uploading this data to a permanent repository (e.g. Figshare).</p>
                        </list-item>
                    </list> 
                    <italic>The publicly available datasets are referenced in the Data Availability section, with additional details now provided in Table 1. GEO accession numbers, DOIs, or URLs are provided, depending on the data source. Additionally, to further ease access to these resources, we have organized these datasets into an ExperimentHub (clustifyrdatahub) that is in the process of being submitted to Bioconductor.</italic>
                </p>
                <p>In the section describing Figure 4A, please consider these suggested changes: 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please explain what is "clustifyr", "clustifyr_lists", and "clustifyr_m3drop".</p>
                        </list-item>
                        <list-item>
                            <p>How was feature selection performed for each analysis in Figure 4A?</p>
                        </list-item>
                        <list-item>
                            <p>What is the strategy used by scmap?</p>
                        </list-item>
                        <list-item>
                            <p>What is the strategy used by "Seurat"?</p>
                        </list-item>
                        <list-item>
                            <p>What is the strategy used by "SingleR"?</p>
                        </list-item>
                        <list-item>
                            <p>How is clustifyr similar or different?</p>
                        </list-item>
                    </list> 
                    <italic>We have added an additional paragraph to explain the differing clustifyr methods shown in Figure 4A.&#x00a0; Feature selection was performed by the Tabula Muris investigators, who used the variable genes that Seurat selects from a plot of gene expression mean vs. variance (mean.var.plot). Seurat and clustifyr use these variable genes, whereas SingleR and scmap define variable genes using differential expression testing or M3Drop, respectively. We have added text to explain the feature selection methods used by each benchmarked method.&#x00a0;</italic>
                </p>
                <p>This section says "Correlation-based clustifyr classification performed better than hypergeometric-based gene list enrichment as implemented in clustify_lists." Please consider explaining the "clustify_lists" algorithm in detail and also consider sharing the quantification of the performance of each approach so the reader can interpret the claim "performed better". Also consider elaborating on "performed better".</p>
                <p>
                    <italic>We have elaborated in the text on the clustify_lists approach for classifying cell types based on gene set enrichment. Additionally, we have included a comparison of the two approaches that performed best in our benchmarking: using hypergeometric tests, or using the Jaccard index and selecting the cell type with the highest index value (Figure 4A).&#x00a0;</italic>
                </p>
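                <p>The two gene-list scoring approaches named in the response above can be illustrated with a minimal Python sketch. It is not the clustify_lists implementation: the function names, the toy marker lists, and the gene universe size of 20 are assumptions made only for this example.</p>
                <preformat>
```python
from math import comb


def jaccard_index(query_genes, reference_genes):
    """Size of the intersection divided by the size of the union."""
    query, reference = set(query_genes), set(reference_genes)
    union = query.union(reference)
    return len(query.intersection(reference)) / len(union) if union else 0.0


def hypergeometric_pvalue(overlap, n_query, n_reference, n_universe):
    """Upper-tail probability of seeing at least `overlap` reference markers
    when n_query genes are drawn without replacement from n_universe genes."""
    upper = min(n_query, n_reference)
    tail = sum(
        comb(n_reference, i) * comb(n_universe - n_reference, n_query - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(n_universe, n_query)


def best_by_jaccard(query_genes, marker_lists):
    """Return the cell type whose marker list has the highest Jaccard index."""
    return max(
        marker_lists,
        key=lambda cell_type: jaccard_index(query_genes, marker_lists[cell_type]),
    )


# Toy marker lists and a toy query cluster (illustrative values only).
marker_lists = {
    "B cell": ["CD19", "MS4A1", "CD79A"],
    "T cell": ["CD3D", "CD3E", "CD2"],
}
query_markers = ["CD19", "MS4A1", "CD79A", "CD3D"]

best_call = best_by_jaccard(query_markers, marker_lists)
b_cell_pvalue = hypergeometric_pvalue(3, len(query_markers), 3, 20)
print(best_call, b_cell_pvalue)
```
                </preformat>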
                <p>What is "scRNAseq_Benchmark subsampling"? Could you elaborate on what this is and why it was used?</p>
                <p>
                    <italic>We have added text to the Results section to introduce this dataset. It contains random subsets of the Tabula Muris dataset to enable investigation of performance and accuracy with varying cell numbers.&#x00a0;</italic>
                </p>
                <p>Suggested improvements for Figure 4: 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please consider including an overview schematic to help the reader understand which datasets were used for each result.</p>
                        </list-item>
                    </list> 
                    <italic>We have added descriptive titles and additional text to the results section to describe the datasets and goals of each benchmarking test.&#x00a0;</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please define "accuracy". What is the algorithm for computing this number?</p>
                        </list-item>
                    </list> 
                    <italic>Accuracy is defined as the ratio between the number of correctly classified clusters and the overall number of clusters for every dataset pair.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>Please define "medF1-score". What is the algorithm for computing this number?</p>
                        </list-item>
                    </list> 
                    <italic>medF1-score was a shortened term for median F1-score. We have removed all references to medF1-score and replaced them with median F1-score. The F1-score, the harmonic mean of precision and recall, is calculated for each cell type. A median F1-score is reported for every dataset pair.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>For the lower half of panel B, please consider using a format similar to the one in Figure 2B from Kiselev et al. (2018
                                <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/9-223#rep-ref-63065-1">1</ext-link>). For example, please use a log10 axis for time, so readers can see the difference between methods.</p>
                        </list-item>
                    </list> 
                    <italic>We have modified the figure accordingly.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>Why is "medF1-score" used for Figure 4C and "accuracy" for Figure 4B?</p>
                        </list-item>
                    </list> </p>
                <p>
                    <italic>An F1-score cannot be calculated when the query and reference datasets contain different cell types. Therefore, when comparing datasets with varying cell composition, we instead used an accuracy metric (as defined above). We have added text to the manuscript that defines accuracy and median F1-score (Benchmarking methods), identifies which datasets were compared with each metric, and explains why certain datasets were characterized with accuracy or F1-score.&#x00a0;</italic>
                </p>
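                <p>The two metrics defined in the responses above (accuracy as the fraction of correctly classified clusters, and the median of per-cell-type F1-scores) can be sketched as follows. This is a minimal illustration under those stated definitions, not the benchmarking code used for the article; the function names and the toy labels are assumptions.</p>
                <preformat>
```python
from statistics import median


def accuracy(true_labels, predicted_labels):
    """Correctly classified clusters divided by the total number of clusters."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for truth, pred in pairs if truth == pred)
    return correct / len(pairs)


def f1_per_cell_type(true_labels, predicted_labels):
    """F1-score (harmonic mean of precision and recall) for each true cell type."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for cell_type in sorted(set(true_labels)):
        tp = sum(1 for t, p in pairs if t == cell_type and p == cell_type)
        fp = sum(1 for t, p in pairs if t != cell_type and p == cell_type)
        fn = sum(1 for t, p in pairs if t == cell_type and p != cell_type)
        # 2*tp / (2*tp + fp + fn) equals the harmonic mean of precision and recall.
        denominator = 2 * tp + fp + fn
        scores[cell_type] = 2 * tp / denominator if denominator else 0.0
    return scores


def median_f1(true_labels, predicted_labels):
    """Median of the per-cell-type F1-scores for one dataset pair."""
    return median(f1_per_cell_type(true_labels, predicted_labels).values())


# Toy example: six clusters, one T-cell cluster mislabeled as NK.
truth = ["B", "B", "T", "T", "NK", "NK"]
predicted = ["B", "B", "T", "NK", "NK", "NK"]

print(accuracy(truth, predicted))
print(f1_per_cell_type(truth, predicted))
print(median_f1(truth, predicted))
```
                </preformat>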
                <p>Why does Figure 4A have 6 methods, Figure 4B have 5 methods, and Figure 4C have 3 methods? Is it possible to include all 6 methods for all panels? Could you please comment on the reasons for excluding or including methods in each analysis?</p>
                <p>
                    <italic>We agree that it is confusing for different tools to be shown in different panels. We have therefore benchmarked these methods in a more consistent fashion to enable comparison of each method across the benchmarking tests. The one exception is scPred, which we were unable to run successfully on the Allen Brain Institute atlas data (Figure 4C); we have noted this in the figure legend.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>"As we and others observe<sup>25</sup>, novel algorithms may not be necessary for cell type classification, at least within the current limitations of sequencing technology and our broadstroke understanding of cell &#x201c;types&#x201d;. Rather, the generation of community curated reference databases is likely to be critical for reproducible annotation of cell types in scRNA-seq datasets."</p>
                        </list-item>
                    </list> I agree that a community curated reference database would be a valuable contribution to the field. You might consider creating a table or other type of descriptive listing that helps the reader to understand all of the references that were used in this article. Consider including tissue source, healthy or disease status, number of cells and genes, technology used for the assay, DOI, data URL, NCBI GEO accession, or any other details that the reader might find helpful.</p>
                <p>
                    <italic>In addition to the dataset details provided in the Data Availability section and the details provided in Table 1, we have also included a supplemental table that references the datasets used in each figure, and an ExperimentHub package allowing direct access to these resources in R.&#x00a0;</italic>
                </p>
                <p>Thank you for providing a GitHub repository with data files! Please also consider sharing the same data in compressed plain text format (e.g. "file.tsv.gz"). In addition to GitHub, please consider using a specialty service that is funded for the purpose of permanently archiving research data such as NIH Figshare (
                    <ext-link ext-link-type="uri" xlink:href="https://nih.figshare.com/">https://nih.figshare.com</ext-link>). There are other options (Zenodo, Open Science Framework OSF, etc.).</p>
                <p>
                    <italic>The datasets used in this study were all published by other research groups and are hosted in various data repositories including GEO and figshare. As mentioned above we have provided additional methods to access these published and publicly available resources.&#x00a0;</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>"As an alternative, clustifyr also supports per-cell annotation, however the runtime is greatly increased and the accuracy of the cell type classifications are decreased due to the sparsity of scRNA-seq datasets, and requires a consensus aggregation step across multiple cells to obtain reliable cell type annotations."</p>
                        </list-item>
                    </list> You might consider offering another alternative option. One extreme is to use the cluster averages, while the other extreme is to use single cells. Perhaps there might be a middle ground where clustifyr could automatically use k-means or some other algorithm to form clusters within the user-defined clusters. This would give the user even more flexibility.</p>
                <p>After reviewing the code, I can see that there is an "overcluster()" function that seems to do exactly what I suggested. Please consider describing this in the article and showing an example of how it works. In retrospect, I can see that the section titled "Subclustering" was supposed to describe this topic &#x2014; I misunderstood this section on the first read.</p>
                <p>
                    <italic>We have added an additional figure panel (Figure 2E) to illustrate this functionality.&#x00a0;</italic>
                </p>
                <p>You may want to double-check all of the links in all of your HTML pages. I see three URLs: 
                    <list list-type="bullet">
                        <list-item>
                            <p>
                                <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdatahub/">https://github.com/rnabioco/clustifyrdatahub/</ext-link>
                            </p>
                        </list-item>
                        <list-item>
                            <p>
                                <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyr">https://github.com/rnabioco/clustifyr</ext-link>
                            </p>
                        </list-item>
                        <list-item>
                            <p>
                                <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata">https://github.com/rnabioco/clustifyrdata</ext-link>
                            </p>
                        </list-item>
                    </list> </p>
                <p>I can see that the "clustifyrdatahub" repo has code for creating ".rda" files from the reference datasets.</p>
                <p>I also see similar scripts at 
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/rnabioco/clustifyrdata/tree/master/data-raw">https://github.com/rnabioco/clustifyrdata/tree/master/data-raw</ext-link>
                </p>
                <p>Readers might be confused when they see two different repos with similar scripts. You might consider deleting the "clustifyrdatahub" repo if it is not necessary.</p>
                <p>
                    <italic>We apologize to the reviewer for the confusion caused by multiple data repositories. We have organized clustifyrdata into an ExperimentHub Bioconductor package at the request of reviewer #1, resulting in overlapping content in the clustifyrdatahub repository. We have described these differences in the documentation of these repositories and added text to the manuscript to point readers to the ExperimentHub package, which is currently being submitted to Bioconductor.&#x00a0;</italic>
                </p>
                <p>I'm happy to see that the data is organized and annotated in the GitHub repo. Specifically, in the GitHub "clustifyrdata" repo, in the "README.md" file, the table shows the name of the reference, the number of cell types, the number of genes, the organism, and a link to the publication. Please consider adding some version of this table to the article, so the reader can understand the scope of this article.</p>
                <p>
                    <italic>We have added a table (Table 1) to the main manuscript that contains additional details about each dataset.&#x00a0;</italic>
                </p>
                <p>After reviewing the code, I was able to resolve some of my misunderstandings caused by lack of clarity in the terse descriptions in this article. To reduce the chance of misunderstanding by other readers, you might consider clarifying or adding details to the descriptions of functions and results. For example, the article does not mention that GSEA is used to work with gene lists.</p>
                <p>
                    <italic>We have added additional details about the gene list methods (including GSEA) to the article. clustify() and clustify_lists() are the most important functions implemented in clustifyr, which we believe are now sufficiently described in the revised manuscript. Additional package and function level documentation is provided at 
                        <ext-link ext-link-type="uri" xlink:href="https://rnabioco.github.io/clustifyr/">https://rnabioco.github.io/clustifyr/</ext-link>, which we have now linked in the Software availability section.</italic>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report61913">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.25358.r61913</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Korthauer</surname>
                        <given-names>Keegan</given-names>
                    </name>
                    <xref ref-type="aff" rid="r61913a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4565-1654</uri>
                </contrib>
                <aff id="r61913a1">
                    <label>1</label>BC Children's Hospital Research Institute, Vancouver, BC, Canada</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>20</day>
                <month>4</month>
                <year>2020</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2020 Korthauer K</copyright-statement>
                <copyright-year>2020</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport61913" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.22969.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>The article introduces a user-friendly and inter-operable R package for cell-type assignment of single-cell RNA-sequencing data. As clearly stated by the authors, the method heavily relies on the (1) results of and (2) any assumptions made by the clustering algorithm applied to the query dataset. The method has potential to be widely useful given its flexibility to take input and give output from many different existing (and future) algorithms. Although the methods proposed are not novel (simple correlation metrics), the software serves to streamline one of the most common procedures in single-cell RNA-sequencing analysis. As detailed below I have some questions regarding the evaluation of the method compared to existing approaches, and a suggestion to more widely distribute the prebuilt references curated as part of the study.</p>
            <p> Major comments: 
                <list list-type="order">
                    <list-item>
                        <p>The 'unseen population rejection test' is an informative measure. However, it is not clear without going back to the scRNAseq_Benchmark (Abdelaal 
                            <italic>et al.</italic>, 2019
                            <sup>
                                <xref ref-type="bibr" rid="rep-ref-61913-1">1</xref>
                            </sup>) how clustifyr's performance compares to other tools. It would be useful to give a quantitative summary or visualization that conveys this comparison.</p>
                    </list-item>
                    <list-item>
                        <p>The approach is aimed at being "normalization-agnostic" as stated in 'Benchmarking' section. However, it's not clear whether this refers to clustifyr in general, or just using the rank correlation setting. If in general, this property should be demonstrated.</p>
                    </list-item>
                    <list-item>
                        <p>The benchmarking results provided are very helpful, but it's not clear why only a (differing) subset of the methods was applied to each evaluation (i.e. panels of Figure 4 in particular).</p>
                    </list-item>
                </list> </p>
            <p> Minor comments: 
                <list list-type="order">
                    <list-item>
                        <p>From the description of the method, it seems that if the query dataset is 'over-clustered', meaning a cell-type is incorrectly split into two clusters, clustifyr can return the same cell type assignment for both clusters (provided the correct reference had the highest correlation, and that correlation was above the threshold). Is this correct? If not, please clarify.</p>
                    </list-item>
                    <list-item>
                        <p>The prebuilt references in the clustifyrdata GitHub repository have potential utility to researchers who don't already have a reference dataset. It might be a good fit to build these reference datasets as a&#x00a0;Bioconductor ExperimentHub package.</p>
                    </list-item>
                </list>
            </p>
            <p>Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?</p>
            <p>Partly</p>
            <p>Is the rationale for developing the new software tool clearly explained?</p>
            <p>Yes</p>
            <p>Is the description of the software tool technically sound?</p>
            <p>Yes</p>
            <p>Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?</p>
            <p>Yes</p>
            <p>Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?</p>
            <p>Partly</p>
            <p>Reviewer Expertise:</p>
            <p>Statistical genomics, bioinformatics</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <back>
            <ref-list>
                <title>References</title>
                <ref id="rep-ref-61913-1">
                    <label>1</label>
                    <mixed-citation publication-type="journal">
                        <person-group person-group-type="author"><name><surname>Abdelaal</surname></name><etal/></person-group>:
                        <article-title>A comparison of automatic cell identification methods for single-cell RNA sequencing data</article-title>.
                        <source>
                            <italic>Genome Biology</italic>
                        </source>.<year>2019</year>;<volume>20</volume>(<issue>1</issue>) :
                        <elocation-id>10.1186/s13059-019-1795-z</elocation-id>
                        <pub-id pub-id-type="doi">10.1186/s13059-019-1795-z</pub-id>
                    </mixed-citation>
                </ref>
            </ref-list>
        </back>
        <sub-article article-type="response" id="comment5694-61913">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Riemondy</surname>
                            <given-names>Kent</given-names>
                        </name>
                        <aff>University of Colorado, Anschutz Medical Campus, USA</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>None to disclose.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>7</day>
                    <month>7</month>
                    <year>2020</year>
                </pub-date>
            </front-stub>
            <body>
                <p>We thank the reviewer for their helpful criticisms. Our responses are indicated in italics below.&#x00a0;</p>
                <p>Major comments: 
                    <list list-type="bullet">
                        <list-item>
                            <p>The 'unseen population rejection test' is an informative measure. However, it is not clear without going back to the scRNAseq_Benchmark (Abdelaal et al., 2019
                                <ext-link ext-link-type="uri" xlink:href="https://f1000research.com/articles/9-223#rep-ref-61913-1">1</ext-link>) how clustifyr's performance compares to other tools. It would be useful to give a quantitative summary or visualization that conveys this comparison.</p>
                        </list-item>
                    </list> 
                    <italic>We agree and have added a figure panel (4E) that visually compares clustifyr&#x2019;s performance with that of the tools assessed by the scRNAseq_Benchmark.&#x00a0;</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>The approach is aimed at being "normalization-agnostic" as stated in 'Benchmarking' section. However, it's not clear whether this refers to clustifyr in general, or just using the rank correlation setting. If in general, this property should be demonstrated.</p>
                        </list-item>
                    </list> </p>
                <p>
                    <italic>We are referring to a property of rank correlation rather than a specific feature of clustifyr. We have amended the text (subsection: Variable gene selection and normalization) to make this point clearer and to recommend that users apply the same normalization scheme to reference and query data where possible.</italic>
                </p>
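                <p>The rank-correlation property referred to in the response above can be shown with a small, self-contained sketch: a monotonic normalization such as log2(x + 1) changes Pearson correlation but leaves Spearman (rank) correlation unchanged. The helper functions and the toy expression values are assumptions for illustration; they are not taken from clustifyr.</p>
                <preformat>
```python
from math import log2, sqrt


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start != len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 != len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = mean_rank
        start = end + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Toy reference profile and a query cluster before and after log normalization.
reference = [0.0, 3.0, 10.0, 50.0, 200.0]
query_counts = [1.0, 2.0, 40.0, 30.0, 900.0]
query_log = [log2(value + 1) for value in query_counts]

print(spearman(reference, query_counts), spearman(reference, query_log))
print(pearson(reference, query_counts), pearson(reference, query_log))
```
                </preformat>
                <p>Because only the ordering of genes enters the Spearman calculation, the first line prints the same value twice, whereas the two Pearson values differ.</p>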
                <p>
                    <list list-type="bullet">
                        <list-item>
                            <p>The benchmarking results provided are very helpful, but it's not clear why only a (differing) subset of the methods was applied to each evaluation (i.e. panels of Figure 4 in particular).</p>
                        </list-item>
                    </list> </p>
                <p>
                    <italic>We agree that the benchmarking would be presented more clearly with a more complete assessment of methods across each evaluation. We have updated Figures 4A, B, C, and D to consistently present clustifyr&#x2019;s performance and accuracy compared to other methods. Of note, we were unable to benchmark scPred on the Allen Brain Atlas data (Figure 4C) due to an error that we were unable to troubleshoot.</italic>
                </p>
                <p>Minor comments: 
                    <list list-type="bullet">
                        <list-item>
                            <p>From the description of the method, it seems that if the query dataset is 'over-clustered', meaning a cell-type is incorrectly split into two clusters, clustifyr can return the same cell type assignment for both clusters (provided the correct reference had the highest correlation, and that correlation was above the threshold). Is this correct? If not, please clarify.</p>
                        </list-item>
                    </list> 
                    <italic>The reviewer is correct: clustifyr assigns the cell type with the highest correlation, provided that correlation meets a minimum cut-off value. For over-clustered query cell types, clustifyr will therefore return the same cell-type label for each cluster. clustifyr also provides a function (overcluster_test()) that intentionally overclusters the query dataset to potentially identify subpopulations that were grouped into another cell type due to inappropriate query dataset clustering. We have included an additional figure panel (2E) to illustrate this functionality.</italic> 
                    <list list-type="bullet">
                        <list-item>
                            <p>The prebuilt references in the clustifyrdata GitHub repository have potential utility to researchers who don't already have a reference dataset. It might be a good fit to build these reference datasets as a Bioconductor ExperimentHub package.</p>
                        </list-item>
                    </list> </p>
                <p>
                    <italic>We thank the reviewer for this suggestion and have built an ExperimentHub package that includes the prebuilt references in the clustifyrdata repository. The package (clustifyrdatahub) has been submitted to Bioconductor.</italic>
                </p>
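                <p>The assignment rule described in the response on over-clustered queries above (take the best-correlated reference cell type, and keep it only if the correlation meets a minimum cut-off) can be sketched as follows. This is an illustration, not clustifyr code: the function name, the "unassigned" label, the cut-off of 0.5, and the correlation values are all hypothetical.</p>
                <preformat>
```python
def assign_cell_types(correlations, cutoff):
    """Map each query cluster to its best-correlated reference cell type.

    correlations: {cluster: {cell_type: correlation}}
    Clusters whose best correlation is below `cutoff` are left unassigned
    (the "unassigned" label is a placeholder chosen for this sketch).
    """
    calls = {}
    for cluster, scores in correlations.items():
        best_type = max(scores, key=scores.get)
        calls[cluster] = best_type if scores[best_type] >= cutoff else "unassigned"
    return calls


# Hypothetical correlations: clusters 1a and 1b are one cell type split in two.
correlations = {
    "1a": {"T cell": 0.82, "B cell": 0.41},
    "1b": {"T cell": 0.79, "B cell": 0.44},
    "2": {"T cell": 0.30, "B cell": 0.35},
}

calls = assign_cell_types(correlations, cutoff=0.5)
print(calls)
```
                </preformat>
                <p>In this toy input, both halves of the split cluster receive the same label, and the weakly correlated cluster is left unassigned.</p>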
            </body>
        </sub-article>
    </sub-article>
</article>
