<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="1.2" xml:lang="en">
    <front>
        <journal-meta>
            <journal-id journal-id-type="pmc">F1000Research</journal-id>
            <journal-title-group>
                <journal-title>F1000Research</journal-title>
            </journal-title-group>
            <issn pub-type="epub">2046-1402</issn>
            <publisher>
                <publisher-name>F1000 Research Limited</publisher-name>
                <publisher-loc>London, UK</publisher-loc>
            </publisher>
        </journal-meta>
        <article-meta>
            <article-id pub-id-type="doi">10.12688/f1000research.15666.1</article-id>
            <article-categories>
                <subj-group subj-group-type="heading">
                    <subject>Research Article</subject>
                </subj-group>
                <subj-group>
                    <subject>Articles</subject>
                </subj-group>
            </article-categories>
            <title-group>
                <article-title>A systematic performance evaluation of clustering methods for single-cell RNA-seq data</article-title>
                <fn-group content-type="pub-status">
                    <fn>
                        <p>[version 1; peer review: 2 approved with reservations]</p>
                    </fn>
                </fn-group>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Du&#x00f2;</surname>
                        <given-names>Angelo</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-4338-2497</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="no">
                    <name>
                        <surname>Robinson</surname>
                        <given-names>Mark D.</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Funding Acquisition</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Resources</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-3048-5518</uri>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <contrib contrib-type="author" corresp="yes">
                    <name>
                        <surname>Soneson</surname>
                        <given-names>Charlotte</given-names>
                    </name>
                    <role content-type="http://credit.niso.org/">Conceptualization</role>
                    <role content-type="http://credit.niso.org/">Data Curation</role>
                    <role content-type="http://credit.niso.org/">Formal Analysis</role>
                    <role content-type="http://credit.niso.org/">Investigation</role>
                    <role content-type="http://credit.niso.org/">Methodology</role>
                    <role content-type="http://credit.niso.org/">Software</role>
                    <role content-type="http://credit.niso.org/">Visualization</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Original Draft Preparation</role>
                    <role content-type="http://credit.niso.org/">Writing &#x2013; Review &amp; Editing</role>
                    <uri content-type="orcid">https://orcid.org/0000-0003-3833-2169</uri>
                    <xref ref-type="corresp" rid="c1">a</xref>
                    <xref ref-type="aff" rid="a1">1</xref>
                    <xref ref-type="aff" rid="a2">2</xref>
                </contrib>
                <aff id="a1">
                    <label>1</label>Institute of Molecular Life Sciences, University of Zurich, Zurich, 8057, Switzerland</aff>
                <aff id="a2">
                    <label>2</label>SIB Swiss Institute of Bioinformatics, Zurich, 8057, Switzerland</aff>
            </contrib-group>
            <author-notes>
                <corresp id="c1">
                    <label>a</label>
                    <email xlink:href="mailto:charlottesoneson@gmail.com">charlottesoneson@gmail.com</email>
                </corresp>
                <fn fn-type="conflict">
                    <p>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>26</day>
                <month>7</month>
                <year>2018</year>
            </pub-date>
            <pub-date pub-type="collection">
                <year>2018</year>
            </pub-date>
            <volume>7</volume>
            <elocation-id>1141</elocation-id>
            <history>
                <date date-type="accepted">
                    <day>20</day>
                    <month>7</month>
                    <year>2018</year>
                </date>
            </history>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Du&#x00f2; A et al.</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <self-uri content-type="pdf" xlink:href="https://f1000research.com/articles/7-1141/pdf"/>
            <abstract>
                <p>Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 12 clustering algorithms, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degrees of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the performance of the clustering algorithms themselves.</p>
                <p>We evaluated the ability of the methods to recover known subpopulations, as well as their stability and run time. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. The R scripts providing an extensible framework for the evaluation of new methods and data sets are available on GitHub (
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison">https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison</ext-link>).</p>
            </abstract>
            <kwd-group kwd-group-type="author">
                <kwd>Clustering</kwd>
                <kwd>Single-Cell RNA-seq</kwd>
                <kwd>RNA-seq</kwd>
                <kwd>Benchmarking</kwd>
            </kwd-group>
            <funding-group>
                <award-group id="fund-1">
                    <funding-source>Chan Zuckerberg Initiative</funding-source>
                    <award-id>182828</award-id>
                </award-group>
                <award-group id="fund-2">
                    <funding-source>Swiss National Science Foundation</funding-source>
                    <award-id>310030_175841</award-id>
                </award-group>
                <funding-statement>We acknowledge funding support from the Swiss National Science Foundation (Grant Number 310030_175841 to MDR) and the Chan Zuckerberg Initiative (Grant Number 182828 to MDR).</funding-statement>
                <funding-statement>
                    <italic>The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.</italic>
                </funding-statement>
            </funding-group>
        </article-meta>
    </front>
    <body>
        <sec sec-type="intro">
            <title>Introduction</title>
            <p>Recent advances in single-cell RNA-seq (scRNA-seq) technologies have enabled the simultaneous measurement of expression levels of thousands of genes across hundreds to thousands of individual cells
                <sup>
                    <xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-8">8</xref>
                </sup>. This opens up new possibilities for deconvolution of expression patterns seen in bulk samples, detection of previously unknown cell populations and deeper characterization of known ones. However, computational analyses are complicated by the high variability, low capture efficiency and high dropout rates of scRNA-seq assays
                <sup>
                    <xref ref-type="bibr" rid="ref-9">9</xref>&#x2013;
                    <xref ref-type="bibr" rid="ref-11">11</xref>
                </sup>, as well as by strong batch effects that are often confounded with the experimental factor of interest
                <sup>
                    <xref ref-type="bibr" rid="ref-12">12</xref>
                </sup>.</p>
            <p>Given a collection of single cells, a common analysis task involves identification and characterization of subpopulations, e.g., cell types or cell states. With lower-dimensional single-cell assays such as flow cytometry, cell type detection is often done manually, by visual inspection of a series of two-dimensional scatter plots of marker pairs (&#x201c;gating&#x201d;) and subsequent identification of clusters of cells with specific abundance patterns. With large numbers of markers, such strategies quickly become unfeasible, and they are also likely to miss previously uncharacterized cell populations. Instead, subpopulation detection in higher-dimensional single-cell experiments such as mass cytometry (CyTOF) and scRNA-seq is often done automatically, via some form of clustering. As a consequence, a large number of clustering approaches specifically designed for or adapted to these types of assays are available in the literature.</p>
            <p>While extensive evaluations of clustering methods have been performed for flow and mass cytometry data
                <sup>
                    <xref ref-type="bibr" rid="ref-13">13</xref>,
                    <xref ref-type="bibr" rid="ref-14">14</xref>
                </sup>, there are to date fewer such studies available for scRNA-seq. Such evaluations are complicated by the large number of data generation protocols available for scRNA-seq, which strongly affect the data characteristics. Menon
                <sup>
                    <xref ref-type="bibr" rid="ref-15">15</xref>
                </sup> specifically evaluated three methods (
                <monospace>Seurat</monospace>
                <sup>
                    <xref ref-type="bibr" rid="ref-16">16</xref>
                </sup>, 
                <monospace>WGCNA</monospace>
                <sup>
                    <xref ref-type="bibr" rid="ref-17">17</xref>
                </sup> and 
                <monospace>BackSPIN</monospace>
                <sup>
                    <xref ref-type="bibr" rid="ref-18">18</xref>
                </sup>), illustrating their different behavior in low and high read depth data. A recent preprint
                <sup>
                    <xref ref-type="bibr" rid="ref-19">19</xref>
                </sup> compared 11 clustering tools on scRNA-seq data from the 10x Genomics platform, showing that different methods generally produced clusterings with little overlap. An overview of several different types of clustering algorithms for scRNA-seq data is given by Andrews and Hemberg
                <sup>
                    <xref ref-type="bibr" rid="ref-20">20</xref>
                </sup>.</p>
            <p>In this paper, we extend these initial studies to a broader range of data sets with different characteristics, and additionally consider simulated data with different degrees of cluster separability. We evaluate 12 clustering algorithms, including methods specifically developed for scRNA-seq data, methods developed for other types of single-cell data, and more general approaches, on a total of 12 different data sets. In order to focus on the performance of the clustering algorithms themselves, we use the same preprocessing approach (specifically cell and gene filtering) for all methods, and investigate the impact of the preprocessing separately. In addition to investigating how well the clustering methods are able to recover the true partition if the number of subpopulations is known, we evaluate whether they are able to correctly determine the number of clusters. Further, we study the stability and run time of the methods, investigate whether performance can be improved by generating a consensus partition based on results from multiple individual clustering methods, and assess the impact of the choice of methods to include in such an aggregation.</p>
            <p>We observed large differences in the clustering results as well as in the run times of the different methods. 
                <monospace>SC3</monospace> and 
                <monospace>Seurat</monospace> generally performed favorably, with 
                <monospace>Seurat</monospace> being several orders of magnitude faster. In addition, 
                <monospace>Seurat</monospace> typically achieved the best agreement with the true partition when the number of clusters was the same as the true number of subpopulations, while other methods, like 
                <monospace>FlowSOM</monospace>, achieved a better agreement with the truth if the number of clusters was higher than the true number. Finally, we show that generally, combining two methods into an ensemble did not improve the performance compared to the best of the individual methods.</p>
            <p>Given the high level of activity in methods research for preprocessing, clustering and visualization of scRNA-seq data, it is expected that many new algorithms (or new flavors of existing ones) will be proposed. To facilitate re-assessment as innovations emerge and to allow extension to new methods and data sets, we provide the complete code to run all analyses in this study (
                <ext-link ext-link-type="uri" xlink:href="https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison">https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison</ext-link>). The current system uses a Makefile to run a set of R scripts for clustering, summarization and visualization of the results. In addition, all filtered (and unfiltered) data sets used in this study are readily available from the links provided in the GitHub repository.</p>
        </sec>
        <sec sec-type="methods">
            <title>Methods</title>
            <sec>
                <title>Real data sets</title>
                <p>Three real scRNA-seq data sets were downloaded from 
                    <italic toggle="yes">conquer</italic>
                    <sup>
                        <xref ref-type="bibr" rid="ref-21">21</xref>
                    </sup> and used for our evaluations: GSE60749-GPL13112 (here denoted 
                    <bold>Kumar</bold>
                    <sup>
                        <xref ref-type="bibr" rid="ref-22">22</xref>
                    </sup>), SRP073808 (
                    <bold>Koh</bold>
                    <sup>
                        <xref ref-type="bibr" rid="ref-23">23</xref>
                    </sup>) and GSE52529-GPL16791 (
                    <bold>Trapnell</bold>
                    <sup>
                        <xref ref-type="bibr" rid="ref-24">24</xref>
                    </sup>). 
                    <xref ref-type="table" rid="T1">Table 1</xref> and 
                    <xref ref-type="other" rid="SF1">Supplementary Figure 1</xref> give an overview of all data sets used in this study. For each of the data sets from 
                    <italic toggle="yes">conquer</italic>, the gene-level length-scaled TPM values (below referred to as &#x201c;counts&#x201d; since they are on the same scale as the raw read counts) and the phenotype were extracted from the MultiAssayExperiment
                    <sup>
                        <xref ref-type="bibr" rid="ref-25">25</xref>
                    </sup> object provided by 
                    <italic toggle="yes">conquer</italic> and used to create a SingleCellExperiment object. We also estimated transcript compatibility counts (TCCs) for each of these data sets using 
                    <monospace>kallisto</monospace>
                    <sup>
                        <xref ref-type="bibr" rid="ref-26">26</xref>,
                        <xref ref-type="bibr" rid="ref-27">27</xref>
                    </sup> v0.44, and used these as an alternative to the gene-level count matrix as input to the clustering algorithms.</p>
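                <p>To make the summary columns of Table 1 concrete, the following minimal Python sketch computes the number of cells, the number of features, the median total count per cell and the median number of detected features per cell from a features-by-cells count matrix. It is an illustration only and not the code used in this study, which is written in R. The function name summarize_counts, the toy matrix, and the definition of a detected feature as one with a count above zero are assumptions made for this example.</p>
                <preformat>
```python
from statistics import median


def summarize_counts(counts):
    """Summarize a features-by-cells count matrix.

    counts: list of rows (one per feature), each a list of per-cell values.
    Assumption: a feature is "detected" in a cell if its count is above zero.
    """
    n_features = len(counts)
    n_cells = len(counts[0]) if counts else 0
    # Total counts per cell are the column sums of the matrix.
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    # Number of detected features per cell (count above zero).
    detected = [sum(1 for row in counts if row[j] > 0) for j in range(n_cells)]
    return {
        "n_cells": n_cells,
        "n_features": n_features,
        "median_total_counts": median(totals) if n_cells else None,
        "median_detected_features": median(detected) if n_cells else None,
    }


# Hypothetical toy matrix: 4 features (rows) measured in 3 cells (columns).
toy = [
    [5, 0, 2],
    [0, 0, 1],
    [3, 4, 0],
    [0, 1, 7],
]
summary = summarize_counts(toy)
print(summary)
```
                </preformat>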
                <table-wrap id="T1" orientation="portrait" position="anchor">
                    <label>Table 1. </label>
                    <caption>
                        <title>Overview of the data sets used in the study.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1" valign="top">Data set</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Sequencing
                                    <break/>protocol</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">#
                                    <break/>cells</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">#
                                    <break/>features</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Median total
                                    <break/>counts per
                                    <break/>cell</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Median #
                                    <break/>features
                                    <break/>per cell</th>
                                <th align="left" colspan="1" rowspan="1" valign="top"># subpopulations</th>
                                <th align="left" colspan="1" rowspan="1" valign="top">Description</th>
                                <th align="center" colspan="1" rowspan="1" valign="top">Ref.</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td colspan="1" rowspan="1">

                                    <bold>Koh</bold>
</td>
                                <td colspan="1" rowspan="1">SMARTer</td>
                                <td align="right" colspan="1" rowspan="1">531</td>
                                <td align="right" colspan="1" rowspan="1">48,981</td>
                                <td align="right" colspan="1" rowspan="1">1,390,268</td>
                                <td align="right" colspan="1" rowspan="1">14,277</td>
                                <td align="center" colspan="1" rowspan="1">9</td>
                                <td colspan="1" rowspan="1">FACS purified H7 human
                                    <break/>embryonic stem cells in
                                    <break/>different differentiation stages</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <xref ref-type="bibr" rid="ref-23">23</xref>
                                </td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">

                                    <bold>KohTCC</bold>
</td>
                                <td colspan="1" rowspan="1">SMARTer</td>
                                <td align="right" colspan="1" rowspan="1">531</td>
                                <td align="right" colspan="1" rowspan="1">811,938</td>
                                <td align="right" colspan="1" rowspan="1">1,391,012</td>
                                <td align="right" colspan="1" rowspan="1">66,086</td>
                                <td align="center" colspan="1" rowspan="1">9</td>
                                <td colspan="1" rowspan="1">FACS purified H7 human
                                    <break/>embryonic stem cells in
                                    <break/>different differentiation stages</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <xref ref-type="bibr" rid="ref-23">23</xref>
                                </td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">

                                    <bold>Kumar</bold>
</td>
                                <td colspan="1" rowspan="1">SMARTer</td>
                                <td align="right" colspan="1" rowspan="1">246</td>
                                <td align="right" colspan="1" rowspan="1">45,159</td>
                                <td align="right" colspan="1" rowspan="1">1,687,810</td>
                                <td align="right" colspan="1" rowspan="1">26,146</td>
                                <td align="center" colspan="1" rowspan="1">3</td>
                                <td colspan="1" rowspan="1">Mouse embryonic stem
                                    <break/>cells, cultured with different
                                    <break/>inhibition factors</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <xref ref-type="bibr" rid="ref-22">22</xref>
                                </td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">

                                    <bold>KumarTCC</bold>
</td>
                                <td colspan="1" rowspan="1">SMARTer</td>
                                <td align="right" colspan="1" rowspan="1">263</td>
                                <td align="right" colspan="1" rowspan="1">803,405</td>
                                <td align="right" colspan="1" rowspan="1">717,438</td>
                                <td align="right" colspan="1" rowspan="1">63,566</td>
                                <td align="center" colspan="1" rowspan="1">3</td>
                                <td colspan="1" rowspan="1">Mouse embryonic stem
                                    <break/>cells, cultured with different
                                    <break/>inhibition factors</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <xref ref-type="bibr" rid="ref-22">22</xref>
                                </td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">

                                    <bold>SimKumar4easy</bold>
</td>
                                <td colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">500</td>
                                <td align="right" colspan="1" rowspan="1">43,606</td>
                                <td align="right" colspan="1" rowspan="1">1,769,155</td>
                                <td align="right" colspan="1" rowspan="1">29,979</td>
                                <td align="center" colspan="1" rowspan="1">4</td>
                                <td colspan="1" rowspan="1">Simulation using different
                                    <break/>proportions of differentially
                                    <break/>expressed genes</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <xref ref-type="bibr" rid="ref-28">28</xref>
                                </td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">

                                    <bold>SimKumar4hard</bold>
</td>
                                <td colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">499</td>
                                <td align="right" colspan="1" rowspan="1">43,638</td>
                                <td align="right" colspan="1" rowspan="1">1,766,843</td>
                                <td align="right" colspan="1" rowspan="1">30,094</td>
                                <td align="center" colspan="1" rowspan="1">4</td>
                                <td colspan="1" rowspan="1">Simulation using different
                                    <break/>proportions of differentially
                                    <break/>expressed genes</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <xref ref-type="bibr" rid="ref-28">28</xref>
                                </td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">

                                    <bold>SimKumar8hard</bold>
</td>
                                <td colspan="1" rowspan="1">-</td>
                                <td align="right" colspan="1" rowspan="1">499</td>
                                <td align="right" colspan="1" rowspan="1">43,601</td>
                                <td align="right" colspan="1" rowspan="1">1,769,174</td>
                                <td align="right" colspan="1" rowspan="1">30,068</td>
                                <td align="center" colspan="1" rowspan="1">8</td>
                                <td colspan="1" rowspan="1">Simulation using different
                                    <break/>proportions of differentially
                                    <break/>expressed genes</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <xref ref-type="bibr" rid="ref-28">28</xref>
                                </td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">

                                    <bold>Trapnell</bold>
</td>
                                <td colspan="1" rowspan="1">SMARTer</td>
                                <td align="right" colspan="1" rowspan="1">222</td>
                                <td align="right" colspan="1" rowspan="1">41,111</td>
                                <td align="right" colspan="1" rowspan="1">1,925,259</td>
                                <td align="right" colspan="1" rowspan="1">13,809</td>
                                <td align="center" colspan="1" rowspan="1">3</td>
                                <td colspan="1" rowspan="1">Human skeletal muscle
                                    <break/>myoblast cells, differentiation
                                    <break/>induced by low-serum
                                    <break/>medium</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <xref ref-type="bibr" rid="ref-24">24</xref>
                                </td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">

                                    <bold>TrapnellTCC</bold>
</td>
                                <td colspan="1" rowspan="1">SMARTer</td>
                                <td align="right" colspan="1" rowspan="1">227</td>
                                <td align="right" colspan="1" rowspan="1">684,953</td>
                                <td align="right" colspan="1" rowspan="1">1,819,294</td>
                                <td align="right" colspan="1" rowspan="1">66,864</td>
                                <td align="center" colspan="1" rowspan="1">3</td>
                                <td colspan="1" rowspan="1">Human skeletal muscle
                                    <break/>myoblast cells, differentiation
                                    <break/>induced by low-serum
                                    <break/>medium</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <xref ref-type="bibr" rid="ref-24">24</xref>
                                </td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">

                                    <bold>Zhengmix4eq</bold>
</td>
                                <td colspan="1" rowspan="1">10xGenomics
                                    <break/>GemCode</td>
                                <td align="right" colspan="1" rowspan="1">3,994</td>
                                <td align="right" colspan="1" rowspan="1">15,568</td>
                                <td align="right" colspan="1" rowspan="1">1,215</td>
                                <td align="right" colspan="1" rowspan="1">487</td>
                                <td align="center" colspan="1" rowspan="1">4</td>
                                <td colspan="1" rowspan="1">Mixtures of FACS
                                    <break/>purified peripheral blood
                                    <break/>mononuclear cells</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <xref ref-type="bibr" rid="ref-5">5</xref>
                                </td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">

                                    <bold>Zhengmix4uneq</bold>
</td>
                                <td colspan="1" rowspan="1">10xGenomics
                                    <break/>GemCode</td>
                                <td align="right" colspan="1" rowspan="1">6,498</td>
                                <td align="right" colspan="1" rowspan="1">16,443</td>
                                <td align="right" colspan="1" rowspan="1">1,145</td>
                                <td align="right" colspan="1" rowspan="1">485</td>
                                <td align="center" colspan="1" rowspan="1">4</td>
                                <td colspan="1" rowspan="1">Mixtures of FACS
                                    <break/>purified peripheral blood
                                    <break/>mononuclear cells</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <xref ref-type="bibr" rid="ref-5">5</xref>
                                </td>
                            </tr>
                            <tr>
                                <td colspan="1" rowspan="1">

                                    <bold>Zhengmix8eq</bold>
</td>
                                <td colspan="1" rowspan="1">10xGenomics
                                    <break/>GemCode</td>
                                <td align="right" colspan="1" rowspan="1">3,994</td>
                                <td align="right" colspan="1" rowspan="1">15,716</td>
                                <td align="right" colspan="1" rowspan="1">1,298</td>
                                <td align="right" colspan="1" rowspan="1">523</td>
                                <td align="center" colspan="1" rowspan="1">8</td>
                                <td colspan="1" rowspan="1">Mixtures of FACS
                                    <break/>purified peripheral blood
                                    <break/>mononuclear cells</td>
                                <td align="center" colspan="1" rowspan="1">
                                    <xref ref-type="bibr" rid="ref-5">5</xref>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>The selected cell phenotype was used to define the "true" partition of cells when evaluating the clustering methods. For the 
                    <bold>Kumar</bold> data set, we grouped the cells by the genetic perturbation and the medium in which they were grown. For the 
                    <bold>Trapnell</bold> data set we used the time point (after the switch of growth medium) at which the cells were captured, and for the 
                    <bold>Koh</bold> data set we used the cell type annotated by the data collectors (obtained through FACS sorting). Defining the ground truth is an intrinsic difficulty in the evaluation of clustering methods, since a given data set may admit several different, but still biologically interpretable, partitions of the cells, several of which can represent equally strong signals. By using ground truths that are defined independently of the scRNA-seq assay, we avoid the artificial inflation of the signal that could result if the truth were derived from the scRNA-seq data itself.</p>
                <p>In addition to the data sets from 
                    <italic toggle="yes">conquer</italic>, we obtained UMI counts from the Zheng data set
                    <sup>
                        <xref ref-type="bibr" rid="ref-5">5</xref>
                    </sup>, generated by the 10x Genomics GemCode protocol, from 
                    <ext-link ext-link-type="uri" xlink:href="https://support.10xgenomics.com/single-cell-gene-expression/datasets">https://support.10xgenomics.com/single-cell-gene-expression/datasets</ext-link>. We downloaded counts for eight pre-sorted cell types (B-cells, naive cytotoxic T-cells, CD14 monocytes, regulatory T-cells, CD56 NK cells, memory T-cells, CD4 T-helper cells and naive T-cells) and combined them into three data sets. For the data set denoted 
                    <bold>Zhengmix4eq</bold>, we combined randomly selected B-cells, CD14 monocytes, naive cytotoxic T-cells and regulatory T-cells in equal proportions (1,000 cells per subpopulation). For the 
                    <bold>Zhengmix4uneq</bold> data set, we combined the same four cell types, but in unequal proportions (1,000 B-cells, 500 naive cytotoxic T-cells, 2,000 CD14 monocytes and 3,000 regulatory T-cells). For the 
                    <bold>Zhengmix8eq</bold> data set, we combined cells from all eight populations, in approximately equal proportions (400&#x2013;600 cells per population). For these data sets, we used the annotated cell type (obtained by pre-sorting of the cells) as the true cell label.</p>
            </sec>
            <sec>
                <title>Simulated data sets</title>
                <p>Using one subpopulation of the 
                    <bold>Kumar</bold> data set as input, we simulated scRNA-seq data with known group structure, using the 
                    <monospace>splatter</monospace> package
                    <sup>
                        <xref ref-type="bibr" rid="ref-28">28</xref>
                    </sup> v1.2.0. We generated three data sets, each consisting of 500 cells, with varying degrees of cluster separability. For the 
                    <bold>SimKumar4easy</bold> data set, we generated 4 subpopulations with relative abundances 0.1, 0.15, 0.5 and 0.25, and probabilities of differential expression set to 0.05, 0.1, 0.2 and 0.4 for the four subpopulations, respectively. The 
                    <bold>SimKumar4hard</bold> data set consists of 4 subpopulations with relative abundances 0.2, 0.15, 0.4 and 0.25, and probabilities of differential expression 0.01, 0.05, 0.05 and 0.08. Finally, the 
                    <bold>SimKumar8hard</bold> data set consists of 8 subpopulations with relative abundances 0.13, 0.07, 0.1, 0.05, 0.4, 0.1, 0.1 and 0.05, and probabilities of differential expression equal to 0.03, 0.03, 0.03, 0.05, 0.05, 0.07, 0.08 and 0.1, respectively. The GitHub repository (
                    <ext-link ext-link-type="uri" xlink:href="https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison">https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison</ext-link>) contains a link to a 
                    <monospace>countsimQC</monospace> report
                    <sup>
                        <xref ref-type="bibr" rid="ref-29">29</xref>
                    </sup>, comparing the main characteristics of the simulated data sets to those of the underlying 
                    <bold>Kumar</bold> data set.</p>
            </sec>
            <sec>
                <title>Data processing</title>
                <p>The 
                    <monospace>scater</monospace> package
                    <sup>
                        <xref ref-type="bibr" rid="ref-30">30</xref>
                    </sup> v1.6.3 was used to perform quality control of the data sets. Features with zero counts across all cells, as well as all cells with total count or total number of detected features more than 3 median absolute deviations (MADs) below the median across all cells (on the log scale), were excluded. Where manual annotation was available, we also filtered out cells that were classified as doublets or debris. The 
                    <monospace>scater</monospace> package was also used to normalize the count values, based on normalization factors calculated by the deconvolution method from the 
                    <monospace>scran</monospace> package
                    <sup>
                        <xref ref-type="bibr" rid="ref-31">31</xref>
                    </sup> v1.6.2, and to perform dimension reduction using PCA
                    <sup>
                        <xref ref-type="bibr" rid="ref-32">32</xref>
                    </sup> and t-SNE
                    <sup>
                        <xref ref-type="bibr" rid="ref-33">33</xref>
                    </sup>. Either the raw feature counts or the log-transformed normalized counts were used as input to the clustering algorithms.</p>
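                <p>The MAD-based cell filter described above can be illustrated with a minimal sketch. It is not the <monospace>scater</monospace> implementation: the function name <monospace>low_outliers</monospace>, the example counts and the scaling constant 1.4826 (the default of the <monospace>mad</monospace> function in R) are assumptions made for illustration, and all values are assumed to be strictly positive so that the logarithm is defined. The base of the logarithm does not affect which cells are flagged.</p>
                <code language="python"><![CDATA[
```python
import math
from statistics import median


def low_outliers(values, nmads=3.0, mad_constant=1.4826):
    """Flag values lying more than `nmads` MADs below the median on the log scale.

    Assumes every value is strictly positive so that the logarithm is defined.
    """
    logged = [math.log(v) for v in values]
    center = median(logged)
    # Median absolute deviation, scaled by `mad_constant` (an assumed setting).
    mad = mad_constant * median(abs(x - center) for x in logged)
    threshold = center - nmads * mad
    return [x < threshold for x in logged]


# Hypothetical total counts per cell; the last cell has an unusually small library.
total_counts = [1000, 1100, 900, 1050, 950, 10]
flags = low_outliers(total_counts)
kept = [c for c, is_low in zip(total_counts, flags) if not is_low]
```
]]></code>
                <p>In this hypothetical example only the last cell is flagged. The same function could be applied to the number of detected features per cell, with a cell excluded if it is flagged on either metric.</p>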
            </sec>
            <sec>
                <title>Gene filtering</title>
                <p>We evaluated three methods for reducing the number of genes provided as input to the clustering methods. For each filtering method, we retained 10% of the original number of genes (with a non-zero count in at least one cell) in the respective data sets. First, we retained only the genes with the highest average expression (log-normalized count) value across all cells (denoted Expr below). Second, we used 
                    <monospace>Seurat</monospace>
                    <sup>
                        <xref ref-type="bibr" rid="ref-16">16</xref>
                    </sup> to estimate the variability of the features and retained only the most highly variable ones (HVG). Finally, we used 
                    <monospace>M3Drop</monospace>
                    <sup>
                        <xref ref-type="bibr" rid="ref-34">34</xref>
                    </sup> to model the dropout rate of the genes as a function of the mean expression level using the Michaelis-Menten equation (M3Drop). The gene-wise Michaelis-Menten constants are computed and log-transformed, and the genes are then ranked by their p-value from a Z-test comparing the gene-wise constants to a global constant obtained by combining all the genes. After filtering, we used 
                    <monospace>scran</monospace> to renormalize each data set, excluding cells with negative size factors. 
                    <xref ref-type="other" rid="SF1">Supplementary Figure 2</xref> shows the overlap between the genes retained by the different filtering methods, for each of the 12 data sets, and 
                    <xref ref-type="other" rid="SF1">Supplementary Table 1</xref> provides the number of cells retained after each type of filtering.</p>
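                <p>The simplest of the three filters (Expr) can be sketched as follows. The function name <monospace>top_expressed_genes</monospace>, the dictionary-based input and the rule for rounding the number of retained genes are assumptions made for illustration and do not reproduce the code used in the study.</p>
                <code language="python"><![CDATA[
```python
import math


def top_expressed_genes(log_expr, fraction=0.1):
    """Return the names of the genes with the highest mean expression.

    `log_expr` maps a gene name to its log-normalized values across cells.
    """
    # Assumption: the number of retained genes is rounded up, with at least one kept.
    n_keep = max(1, math.ceil(fraction * len(log_expr)))
    means = {gene: sum(vals) / len(vals) for gene, vals in log_expr.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical data: ten genes measured in two cells, with increasing expression.
log_expr = {f"gene{i}": [float(i), float(i) + 1.0] for i in range(10)}
retained = top_expressed_genes(log_expr)
```
]]></code>
                <p>The HVG and M3Drop filters follow the same pattern of ranking genes and keeping the top 10%, but with the ranking statistic supplied by the respective package.</p>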
            </sec>
            <sec>
                <title>Clustering methods</title>
                <p>Twelve clustering methods were evaluated in this study (see 
                    <xref ref-type="table" rid="T2">Table 2</xref> for an overview). We included general-purpose clustering methods, such as hierarchical clustering and K-means, as well as methods developed specifically for scRNA-seq data, such as 
                    <monospace>Seurat</monospace> and 
                    <monospace>SC3</monospace>.</p>
                <table-wrap id="T2" orientation="portrait" position="anchor">
                    <label>Table 2. </label>
                    <caption>
                        <title>Clustering methods.</title>
                    </caption>
                    <table content-type="article-table" frame="hsides">
                        <thead>
                            <tr>
                                <th align="left" colspan="1" rowspan="1">Method</th>
                                <th align="left" colspan="1" rowspan="1">Description</th>
                                <th align="left" colspan="1" rowspan="1">Reference</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <monospace>ascend</monospace> (v0.5.0)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">PCA dimension reduction (dim=30) and iterative hierarchical clustering</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref-35">35</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <monospace>CIDR</monospace> (v0.1.5)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">PCA dimension reduction based on zero-imputed similarities, followed by hierarchical clustering</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref-36">36</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <monospace>FlowSOM</monospace> (v1.12.0)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">PCA dimension reduction (dim=50) followed by self-organizing maps (5x5, 8x8 or 15x15 grid,
                                    <break/>depending on the number of cells in the data set) and hierarchical consensus meta-clustering to
                                    <break/>merge clusters</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref-37">37</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <monospace>PCAHC</monospace>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">PCA dimension reduction (dim=30) and hierarchical clustering with Ward.D2 linkage</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref-32">32</xref>,
                                    <xref ref-type="bibr" rid="ref-38">38</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <monospace>PCAKmeans</monospace>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">PCA dimension reduction (dim=30) and K-means clustering with 25 random starts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref-32">32</xref>,
                                    <xref ref-type="bibr" rid="ref-39">39</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <monospace>pcaReduce</monospace> (v1.0)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">PCA dimension reduction (dim=30) and K-means clustering through an iterative process.
                                    <break/>Stepwise merging of clusters by joint probabilities and reducing the number of dimensions by removing the PC
                                    <break/>with lowest variance. Repeated 100 times, followed by consensus clustering using the clue package</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref-40">40</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <monospace>RtsneKmeans</monospace>
                                </td>
                                <td align="left" colspan="1" rowspan="1" valign="top">t-SNE dimension reduction (initial PCA dim=50, t-SNE dim=3, perplexity=30) and K-means
                                    <break/>clustering with 25 random starts</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref-33">33</xref>,
                                    <xref ref-type="bibr" rid="ref-39">39</xref>,
                                    <xref ref-type="bibr" rid="ref-41">41</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <monospace>SAFE</monospace> (v2.1.0)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Ensemble clustering using SC3, CIDR, Seurat and t-SNE + K-means</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref-42">42</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <monospace>SC3</monospace> (v1.8.0)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">PCA dimension reduction or Laplacian graph. K-means clustering on different dimensions.
                                    <break/>Hierarchical clustering on consensus matrix obtained by K-means</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref-43">43</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <monospace>SC3svm</monospace> (v1.8.0)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Using SC3 to derive the clusters for half of the cells, then using a support vector machine (SVM)
                                    <break/>to classify the rest</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref-43">43</xref>,
                                    <xref ref-type="bibr" rid="ref-44">44</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <monospace>Seurat</monospace> (v2.3.1)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">Dimension reduction by PCA (dim=30) followed by nearest neighbor graph clustering</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref-16">16</xref>
                                </td>
                            </tr>
                            <tr>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <monospace>TSCAN</monospace> (v1.18.0)</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">PCA dimension reduction followed by model-based clustering</td>
                                <td align="left" colspan="1" rowspan="1" valign="top">
                                    <xref ref-type="bibr" rid="ref-45">45</xref>
                                </td>
                            </tr>
                        </tbody>
                    </table>
                </table-wrap>
                <p>All methods except 
                    <monospace>Seurat</monospace> allow explicit specification of the desired number of clusters (k). 
                    <monospace>Seurat</monospace> instead requires a resolution parameter, which indirectly controls the number of clusters. For each data set, we ran each method with a range of k values (from 2 to either 10 or 15, depending on the true number of subpopulations in the data set). We ran 
                    <monospace>Seurat</monospace> with a range of resolution parameter values, approximately corresponding to the range of k values evaluated for the other methods. A subset of the methods provide an estimate of the true number of clusters; we recorded this estimate for comparison with the true number of subpopulations. For each choice of k (or resolution), we ran each method five times, allowing us to investigate the intrinsic stability of the obtained partitions. Note that the input data are identical across all five runs, so only the stochasticity of the clustering method affects our stability evaluation. All parameter values except for the number of clusters were set to reasonable values following the authors&#x2019; recommendations or the respective manuals (
                    <xref ref-type="table" rid="T2">Table 2</xref>). Gene and cell filtering within the clustering methods were disabled whenever possible, since these steps were performed in a uniform way during the preprocessing and gene selection steps.</p>
            </sec>
            <sec>
                <title>Evaluation criteria</title>
                <p>In order to evaluate how well the inferred clusters recovered the true subpopulations, we used the Hubert-Arabie Adjusted Rand Index (ARI) for comparing two partitions
                    <sup>
                        <xref ref-type="bibr" rid="ref-46">46</xref>
                    </sup>. The metric is adjusted for chance, such that independent clusterings have an expected index of zero and identical partitions have an ARI equal to 1, and was calculated using the implementation in the 
                    <monospace>mclust</monospace> R package v5.4. We also used the ARI to evaluate the stability of the clusters, by comparing the partitions from each pair of the five independent runs for each method with a given number of clusters.</p>
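                <p>For illustration, the ARI and its use as a stability measure can be written out in a few lines. This is a sketch rather than the <monospace>mclust</monospace> implementation used in the study: the function names, the example partitions and the convention of returning 1 when the index is undefined (for example, when both partitions place all items in a single cluster) are assumptions.</p>
                <code language="python"><![CDATA[
```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same number of items")
    # Pairs of items co-clustered in both partitions, in the first, and in the second.
    pairs_both = sum(comb(n, 2) for n in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(n, 2) for n in Counter(labels_a).values())
    pairs_b = sum(comb(n, 2) for n in Counter(labels_b).values())
    total_pairs = comb(len(labels_a), 2)
    expected = pairs_a * pairs_b / total_pairs if total_pairs else 0.0
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        # Index undefined (0/0); returning 1.0 is a convention of this sketch.
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


def pairwise_stability(runs):
    """ARI for every pair of repeated runs of one method on the same data."""
    return [adjusted_rand_index(a, b) for a, b in combinations(runs, 2)]


# Hypothetical cluster labels from five runs on six cells.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition, labels swapped
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
stability = pairwise_stability(runs)
```
]]></code>
                <p>Five runs give ten pairwise ARI values. Because the index depends only on which cells are grouped together, a run that returns the same partition with permuted cluster labels still yields an ARI of 1.</p>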
                <p>We used a normalized Shannon entropy
                    <sup>
                        <xref ref-type="bibr" rid="ref-47">47</xref>
                    </sup> to evaluate whether the methods preferentially partitioned the cells into clusters of equal size, or whether they preferred one large and many small clusters. Given proportions 
                    <italic toggle="yes">p</italic>
                    <sub>1</sub>, &#x2026;, 
                    <italic toggle="yes">p
                        <sub>N</sub>
                    </italic> of cells assigned to each of 
                    <italic toggle="yes">N</italic> clusters, the normalized Shannon entropy is defined by</p>
                <p>
                    <disp-formula id="e1">
                        <mml:math display="block" id="math1">
                            <mml:mrow>
                                <mml:mfrac>
                                    <mml:mi>H</mml:mi>
                                    <mml:mrow>
                                        <mml:msub>
                                            <mml:mi>H</mml:mi>
                                            <mml:mrow>
                                                <mml:mtext mathvariant="italic">max</mml:mtext>
                                                <mml:mo>&#x2061;</mml:mo>
                                            </mml:mrow>
                                        </mml:msub>
                                    </mml:mrow>
                                </mml:mfrac>
                                <mml:mo>=</mml:mo>
                                <mml:mo>&#x2212;</mml:mo>
                                <mml:mstyle displaystyle="true">
                                    <mml:munderover>
                                        <mml:mo>&#x2211;</mml:mo>
                                        <mml:mrow>
                                            <mml:mi>i</mml:mi>
                                            <mml:mo>=</mml:mo>
                                            <mml:mn>1</mml:mn>
                                        </mml:mrow>
                                        <mml:mi>N</mml:mi>
                                    </mml:munderover>
                                    <mml:mrow>
                                        <mml:msub>
                                            <mml:mi>p</mml:mi>
                                            <mml:mi>i</mml:mi>
                                        </mml:msub>
                                        <mml:mfrac>
                                            <mml:mrow>
                                                <mml:msub>
                                                    <mml:mrow>
                                                        <mml:mtext mathvariant="italic">log</mml:mtext>
                                                        <mml:mo>&#x2061;</mml:mo>
                                                    </mml:mrow>
                                                    <mml:mn>2</mml:mn>
                                                </mml:msub>
                                                <mml:msub>
                                                    <mml:mi>p</mml:mi>
                                                    <mml:mi>i</mml:mi>
                                                </mml:msub>
                                            </mml:mrow>
                                            <mml:mrow>
                                                <mml:msub>
                                                    <mml:mrow>
                                                        <mml:mtext mathvariant="italic">log</mml:mtext>
                                                        <mml:mo>&#x2061;</mml:mo>
                                                    </mml:mrow>
                                                    <mml:mn>2</mml:mn>
                                                </mml:msub>
                                                <mml:mi>N</mml:mi>
                                            </mml:mrow>
                                        </mml:mfrac>
                                        <mml:mo>.</mml:mo>
                                    </mml:mrow>
                                </mml:mstyle>
                            </mml:mrow>
                            <mml:mspace width="5em"/>
                            <mml:mrow>
                                <mml:mo stretchy="false">(</mml:mo>
                                <mml:mn>1</mml:mn>
                                <mml:mo stretchy="false">)</mml:mo>
                            </mml:mrow>
                        </mml:math>
                    </disp-formula>
                </p>
                <p>Since the true degree of equality of the cluster sizes varies between data sets, we subtracted the normalized entropy calculated from the true partition to obtain the final performance index.</p>
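                <p>A minimal sketch of this index is given below. The function names and the example partitions are hypothetical, <italic toggle="yes">N</italic> is taken to be the number of non-empty clusters, and returning 0 for a partition with a single cluster (where the maximum entropy is zero) is a convention assumed here.</p>
                <code language="python"><![CDATA[
```python
import math
from collections import Counter


def normalized_entropy(labels):
    """Shannon entropy of the cluster proportions divided by its maximum, log2(N)."""
    counts = Counter(labels)
    n_clusters = len(counts)
    if n_clusters < 2:
        # log2(1) = 0 makes the ratio undefined; returning 0.0 is a convention of this sketch.
        return 0.0
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(n_clusters)


def entropy_difference(predicted, truth):
    """Normalized entropy of the predicted partition minus that of the true partition."""
    return normalized_entropy(predicted) - normalized_entropy(truth)


# Hypothetical labels for eight cells: balanced truth, one dominant predicted cluster.
truth = [0, 0, 1, 1, 2, 2, 3, 3]
predicted = [0, 0, 0, 0, 0, 1, 2, 3]
index = entropy_difference(predicted, truth)
```
]]></code>
                <p>A negative value, as in this example, indicates that the clustering is less balanced than the true partition, whereas a positive value indicates cluster sizes that are more equal than in the truth.</p>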
                <p>To evaluate the similarities between the partitions obtained by different methods, we first calculated a consensus partition from the five independent runs for each method, using the 
                    <monospace>clue</monospace> R package
                    <sup>
                        <xref ref-type="bibr" rid="ref-48">48</xref>
                    </sup> v0.3-55. Next, for each data set and each imposed number of clusters, we calculated the ARI between the partitions for each pair of methods, and used hierarchical clustering based on the median of these ARI values across all data sets to generate a dendrogram representing the similarity among the clusters obtained by different methods. To investigate how representative this dendrogram is, we also clustered the methods based on each data set separately, and calculated the fraction of such dendrograms in which each subcluster in the overall dendrogram appeared.</p>
                <p>Finally, we investigated whether clustering performance was improved by combining two methods into an ensemble. For each data set, and with the true number of clusters imposed, we calculated a consensus partition for each pair of methods using the 
                    <monospace>clue</monospace> R package, and used the ARI to evaluate the agreement with the true cell labels. We then compared the ensemble performance to the performances of the two individual methods used to construct the ensemble.</p>
            </sec>
        </sec>
        <sec sec-type="results">
            <title>Results</title>
            <sec>
                <title>Large differences in performance across data sets and methods</title>
                <p>The 12 methods were tested on real data sets as well as simulations with a varying degree of complexity (
                    <xref ref-type="table" rid="T1">Table 1</xref>) and across a range of numbers of subpopulations. The agreement between the true partitions and the clusterings obtained by imposing the true number of clusters differed widely between data sets as well as between methods (
                    <xref ref-type="fig" rid="f1">Figure 1</xref>; a summary across different numbers of clusters can be found in 
                    <xref ref-type="other" rid="SF1">Supplementary Figure 3</xref>).</p>
                <fig fig-type="figure" id="f1" orientation="portrait" position="float">
                    <label>Figure 1. </label>
                    <caption>
                        <title>Median ARI scores, representing the agreement between the true partition and the one obtained by each method, when the number of clusters is fixed to the true number.</title>
                        <p>Each row corresponds to a different data set, each panel to a different gene filtering method, and each column to a different clustering method. The methods and the data sets are ordered by their mean ARI across the filterings and data sets. Some methods failed to return a clustering with the correct number of clusters for certain data sets (indicated by white squares).</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/17093/679678ad-586a-4224-81e2-3dc9505c3f5a_figure1.gif"/>
                </fig>
                <p>As expected, excellent performances were achieved for the well-separated data sets with a strong difference between the groups of cells (
                    <bold>Kumar</bold>, 
                    <bold>KumarTCC</bold> and 
                    <bold>SimKumar4easy</bold>). When filtering by expression or variability, nearly all methods achieved a correct partitioning of the cells in these data sets, while the 
                    <monospace>M3Drop</monospace> filtering led to poorer performance for the simulated data set. On the other hand, all methods failed to recover the partition of the cells by time point in the 
                    <bold>Trapnell</bold> data sets, where the ARIs were consistently below 0.5. This indicates that other, stronger signals in these data sets dominate the clustering.</p>
                <p>We note that the 
                    <monospace>M3Drop</monospace> filtering consistently led to worse performance for the simulated data sets, while the performance was more similar to the other filterings for the real data sets, which may indicate that the simulated dropout pattern is not consistent with the one being modeled by the 
                    <monospace>M3Drop</monospace> package. Due to negative size factor estimates, a larger number of cells had to be excluded in the 
                    <bold>Zhengmix</bold> data sets after the 
                    <monospace>M3Drop</monospace> filtering compared to the expression or HVG filtering (
                    <xref ref-type="other" rid="SF1">Supplementary Table 1</xref>). At most just over 20% of the cells were excluded for the expression and HVG filterings, compared to up to approximately 40% of the cells for the 
                    <monospace>M3Drop</monospace> filtering, making a direct comparison between the filterings difficult. Furthermore, the genes retained in the 
                    <monospace>M3Drop</monospace> and expression filterings showed a low degree of overlap in many of the data sets (
                    <xref ref-type="other" rid="SF1">Supplementary Figure 2</xref>). Overall, only small differences were seen between the results for the data sets containing gene abundances and those containing transcript compatibility counts (TCCs).</p>
                <p>While none of the methods consistently outperformed the others over the full range of the imposed numbers of clusters in all data sets, 
                    <monospace>SC3</monospace> and 
                    <monospace>Seurat</monospace> often showed the best performance. These methods were also the only ones that achieved a good separation of the cell types in the droplet-based 
                    <bold>Zhengmix</bold> data sets, which have a much higher degree of sparsity and a larger number of cells than the other data sets. This is consistent with a previous study
                    <sup>
                        <xref ref-type="bibr" rid="ref-15">15</xref>
                    </sup> showing that 
                    <monospace>Seurat</monospace> performed better than other types of algorithms on data with low read depth. Generally, the performance of 
                    <monospace>Seurat</monospace> was also not strongly affected by the gene filtering approach (except for the simulated data sets), while other methods, like 
                    <monospace>SAFE</monospace>, were more sensitive to the choice of input genes for some data sets. 
                    <monospace>FlowSOM</monospace> showed poor performance for the true number of clusters (see 
                    <xref ref-type="other" rid="SF1">Supplementary Figure 4</xref> for an illustration, together with a selection of other data set/method combinations with poor ARI values). However, if the number of clusters was increased, the performance of 
                    <monospace>FlowSOM</monospace> improved considerably, and if the methods instead were compared at the number of clusters that gave the optimal performance for each method, 
                    <monospace>FlowSOM</monospace> showed moderate performance (
                    <xref ref-type="other" rid="SF1">Supplementary Figure 5</xref>). 
                    <monospace>RtsneKmeans</monospace>, a general-purpose method, showed a higher average performance across the data sets and filterings than many of the clustering algorithms specifically developed for scRNA-seq data. Compared to 
                    <monospace>SC3</monospace> and 
                    <monospace>Seurat</monospace>, 
                    <monospace>RtsneKmeans</monospace> showed poorer performance for the 
                    <bold>SimKumar8hard</bold> and 
                    <bold>Zhengmix4uneq</bold> data sets. The subpopulations in these data sets are nested in the t-SNE space, which explains why the K-means algorithm had difficulty separating them (
                    <xref ref-type="other" rid="SF1">Supplementary Figure 1</xref>).</p>
                <p>We also investigated whether the number of detected features per cell differed between the clusters, using a Kruskal-Wallis test
                    <sup>
                        <xref ref-type="bibr" rid="ref-49">49</xref>
                    </sup>. No strong association was found for the simulated data sets (
                    <xref ref-type="other" rid="SF1">Supplementary Figure 6</xref>), indicating that there is low inherent bias in the clustering algorithms. For most of the real data sets we found highly significant differences in the number of detected features between cells in different clusters. However, it is unclear whether this represents a technical effect or a biological difference between the cell populations.</p>
            </sec>
            <sec>
                <title>Run times vary widely between methods</title>
                <p>We measured the elapsed time for each run, using a single core and excluding the time to estimate the number of clusters if this was done via a separate function. Since the run times are strongly dependent on the number of features and cells in a data set, we represent them as normalized run times, obtained by dividing by the time required for 
                    <monospace>RtsneKmeans</monospace> for the same data set (
                    <xref ref-type="fig" rid="f2">Figure 2A</xref>). 
                    <monospace>Seurat</monospace> was the fastest method, while 
                    <monospace>pcaReduce</monospace>, 
                    <monospace>SAFE</monospace> and 
                    <monospace>SC3</monospace> were the slowest, sometimes by a large margin. Clustering only half of the cells with 
                    <monospace>SC3</monospace> and predicting the class of the others with a Support Vector Machine (
                    <monospace>SC3svm</monospace>) gave slightly shorter run times than applying the 
                    <monospace>SC3</monospace> clustering to all cells. The method could potentially be accelerated by using a lower proportion of cells as a training subset. A detailed overview of the run time and the dependence on the number of clusters is given in 
                    <xref ref-type="other" rid="SF1">Supplementary Figures 7 and 8</xref>. Apart from 
                    <monospace>SC3</monospace> and 
                    <monospace>SC3svm</monospace>, the imposed number of clusters did not affect the run time.</p>
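                <p>The normalization described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the function name <monospace>normalize_run_times</monospace>, the tuple layout of the input and the example timings are all assumptions made for the sketch.</p>
                <code language="python">
```python
from collections import defaultdict


def normalize_run_times(records, reference="RtsneKmeans"):
    """Divide each elapsed time by the reference method's time on the same data set.

    `records` is an iterable of (dataset, method, elapsed_seconds) tuples
    (a hypothetical layout chosen for this sketch).
    Returns a dict mapping (dataset, method) to the normalized run time.
    """
    by_dataset = defaultdict(dict)
    for dataset, method, seconds in records:
        by_dataset[dataset][method] = seconds

    normalized = {}
    for dataset, times in by_dataset.items():
        if reference not in times:
            # Without a reference run there is nothing to normalize against.
            continue
        ref_time = times[reference]
        for method, seconds in times.items():
            normalized[(dataset, method)] = seconds / ref_time
    return normalized


# Made-up timings for illustration only (not values from the study).
example = [
    ("datasetA", "RtsneKmeans", 10.0),
    ("datasetA", "methodX", 2.5),
    ("datasetA", "methodY", 400.0),
]
result = normalize_run_times(example)
```
                </code>
                <p>With this convention the reference method always has a normalized run time of 1, values below 1 indicate a faster method, and values above 1 a slower one, which makes run times comparable across data sets of different sizes.</p>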
                <fig fig-type="figure" id="f2" orientation="portrait" position="float">
                    <label>Figure 2. </label>
                    <caption>
                        <p>(
                            <bold>A</bold>) Normalized run times, using 
                            <monospace>RtsneKmeans</monospace> as the reference method, across all data set instances and number of clusters. (
                            <bold>B</bold>) Run time versus performance (ARI) for a subset of data sets and filterings, for the true number of clusters.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/17093/679678ad-586a-4224-81e2-3dc9505c3f5a_figure2.gif"/>
                </fig>
                <p>Plotting the run time versus the Adjusted Rand Index for a subset of the data sets (excluding the ones with the strongest signal, where all methods found the correct clusters, and the TCC data sets) (
                    <xref ref-type="fig" rid="f2">Figure 2B</xref>) further illustrated the variability between the methods. Interestingly, 
                    <monospace>Seurat</monospace> was generally the fastest method, especially for the droplet-based data sets, yet it also provided some of the best partitionings of the data.</p>
            </sec>
            <sec>
                <title>High stability between clustering runs</title>
                <p>
                    <xref ref-type="fig" rid="f1">Figure 1</xref> shows the median performance of each method across the five runs on each data set, for the true number of clusters. By comparing the partitions obtained in the individual runs, we could also obtain a measure of the stability of each method (
                    <xref ref-type="fig" rid="f3">Figure 3A</xref>).</p>
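                <p>As an illustration of how such a stability measure can be computed, the sketch below implements the Adjusted Rand Index from its standard contingency-table definition and summarizes the ARIs between repeated runs by their median. Comparing all pairs of runs is an assumption of this sketch, and the function names <monospace>adjusted_rand_index</monospace> and <monospace>stability</monospace> are hypothetical.</p>
                <code language="python">
```python
from collections import Counter
from itertools import combinations
from math import comb
from statistics import median


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: the partitions trivially agree

    # Pairs of cells placed together in both, in the first, and in the second partition.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions have a single cluster
    return (together_both - expected) / (maximum - expected)


def stability(runs):
    """Median ARI over all pairs of runs (pairwise comparison is an assumption)."""
    scores = [adjusted_rand_index(a, b) for a, b in combinations(runs, 2)]
    return median(scores)


# Toy example: three runs giving the same partition up to a relabeling.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
stable_score = stability(runs)
crossed_score = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```
                </code>
                <p>Because the ARI is invariant to the labeling of the clusters, runs that return the same groups under different cluster names still obtain a stability of 1, whereas partitions that agree no better than expected by chance score around 0 or below.</p>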
                <fig fig-type="figure" id="f3" orientation="portrait" position="float">
                    <label>Figure 3. </label>
                    <caption>
                        <p>(
                            <bold>A</bold>) Median stability (ARI across different runs on the same data set) for the methods, with the annotated number of clusters imposed. Some methods failed to return a clustering with the correct number of clusters for certain data sets (indicated by white squares). (
                            <bold>B</bold>) The difference between the normalized entropy of the obtained clusterings and that of the true partitions, across all data sets and for the annotated number of clusters. (
                            <bold>C</bold>) The difference between the number of clusters giving the maximal ARI and the annotated number of clusters, across all data sets.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/17093/679678ad-586a-4224-81e2-3dc9505c3f5a_figure3.gif"/>
                </fig>
                <p>
                    <monospace>CIDR</monospace>, 
                    <monospace>PCAHC</monospace>, 
                    <monospace>TSCAN</monospace>, <monospace>ascend</monospace> and 
                    <monospace>Seurat</monospace> returned the same clusters in all five instances for all data sets, while the stability of the other methods depended on the data set. Again, the stability was lower for the simulated data sets after gene filtering by 
                    <monospace>M3Drop</monospace> (note that the same genes were used in all five runs), indicating that the selection of genes may be suboptimal.</p>
                <p>A summary of the variability both within and between the different filterings is shown in 
                    <xref ref-type="other" rid="SF1">Supplementary Figure 9</xref>. It is worth noting that comparing the performances between the different filtering approaches is difficult for two reasons: first, the variability of the clustering runs for a given filtering might exceed the variation between the filterings, and second, filtering with 
                    <monospace>M3Drop</monospace> led to the exclusion of a large number of cells in the 
                    <bold>Zhengmix</bold> data sets, and these cells cannot be used for the comparison. For the stable methods 
                    <monospace>CIDR</monospace>, 
                    <monospace>TSCAN</monospace>, 
                    <monospace>ascend</monospace> and 
                    <monospace>PCAHC</monospace>, the type of filtering had a relatively large impact on the clustering solutions, and often filtering on the mean gene expression and the gene variability gave more similar clusters than filtering with 
                    <monospace>M3Drop</monospace>. The stochastic methods showed both a high variability between the individual runs for a given filtering and between runs with different filterings.</p>
                <p>
                    <bold>Qualitative differences between cluster characteristics</bold>
                </p>
                <p>By computing the Shannon entropy for the various partitions, we obtained a measure of how evenly the cells are distributed across the clusters (
                    <xref ref-type="fig" rid="f3">Figure 3B</xref>). Since the true degree of cluster size uniformity as well as the number of clusters differ between data sets, we compared the normalized Shannon entropy of the clusterings to that of the true partitions. Thus, a positive value of this statistic indicates that a method tends to produce more equally sized clusters than the true ones, and a negative value instead indicates that the method tends to return more unequal cluster sizes, e.g., one large cluster and a few small ones. Most methods gave cluster sizes that were compatible with the true sizes for most data sets (a statistic close to 0), while 
                    <monospace>FlowSOM</monospace> in particular was more variable and often grouped the cells into one large cluster and a few very small ones (see 
                    <xref ref-type="other" rid="SF1">Supplementary Figure 4</xref> for an example). One consequence of this was that 
                    <monospace>FlowSOM</monospace> often showed higher ARI values for a larger number of clusters, while the performance of many of the other methods decreased with increasing k (
                    <xref ref-type="other" rid="SF1">Supplementary Figure 3</xref>). These methods tended to have more equally sized clusters for larger numbers of clusters than the true number, leading to a higher disagreement between the true classification and the clusterings (the entropy across the range of k is shown in 
                    <xref ref-type="other" rid="SF1">Supplementary Figure 10</xref>).</p>
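                <p>The entropy-based statistic discussed above can be illustrated with a short sketch. It assumes that the Shannon entropy is normalized by its maximal attainable value, the base-2 logarithm of the number of clusters; this normalization and the function names <monospace>normalized_entropy</monospace> and <monospace>entropy_difference</monospace> are assumptions of the sketch rather than details confirmed in this section.</p>
                <code language="python">
```python
from collections import Counter
from math import log2


def normalized_entropy(labels):
    """Shannon entropy of the cluster sizes, divided by log2(number of clusters).

    The division by log2(k) is an assumed normalization: 1 means perfectly
    equal cluster sizes, values near 0 mean one cluster dominates.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    k = len(counts)
    if k == 1:
        return 0.0  # a single cluster carries no entropy
    n = len(labels)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    return entropy / log2(k)


def entropy_difference(clustering, truth):
    """Positive: clusters more equally sized than the truth; negative: less equal."""
    return normalized_entropy(clustering) - normalized_entropy(truth)


# Toy example: the truth has two equal groups, the clustering one large and one tiny.
truth = [0] * 5 + [1] * 5
clustering = [0] * 9 + [1]
difference = entropy_difference(clustering, truth)
```
                </code>
                <p>In this toy example the difference is negative, which corresponds to the situation described for a method that groups most cells into one large cluster and splits off a few very small ones.</p>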
            </sec>
            <sec>
                <title>The optimal number of clusters can differ from the &#x201c;true&#x201d; one</title>
                <p>Above, we investigated the performance and stability of the methods when the true number of clusters (the number of different labels in the partitioning considered as the ground truth) was imposed. Whether this number of clusters actually provided the highest ARI value (i.e., the best agreement with the ground truth) depended mainly on the difficulty of the clustering task and on the choice of method (
                    <xref ref-type="fig" rid="f3">Figure 3C</xref>). No method achieved the best performance at the annotated number of clusters in all the data sets, although generally, the methods reached their maximum performance at or near the annotated number of clusters. The notable exception was 
                    <monospace>FlowSOM</monospace>, which required a relatively large number of clusters to reach its maximal performance.</p>
                <p>
                    <monospace>SC3</monospace>, 
                    <monospace>CIDR</monospace>, 
                    <monospace>ascend</monospace>, 
                    <monospace>SAFE</monospace> and 
                    <monospace>TSCAN</monospace> all have built-in functionality for estimating the optimal number of clusters. In most cases, the estimated number was close to the true one; however, 
                    <monospace>ascend</monospace> and 
                    <monospace>CIDR</monospace> had a tendency to underestimate the number of clusters, while 
                    <monospace>SC3</monospace> and 
                    <monospace>TSCAN</monospace> instead tended to overestimate the number (
                    <xref ref-type="other" rid="SF1">Supplementary Figure 11</xref>). The tendency of 
                    <monospace>SC3</monospace> to overestimate the cluster number is consistent with a previous publication
                    <sup>
                        <xref ref-type="bibr" rid="ref-15">15</xref>
                    </sup>. The agreement with the true partition at the estimated number of clusters is shown in 
                    <xref ref-type="other" rid="SF1">Supplementary Figure 12</xref>. 
                    <monospace>SC3</monospace> was still the best-performing method in this situation.</p>
            </sec>
            <sec>
                <title>Inconsistent degree of similarity between methods</title>
                <p>The similarity between each pair of methods was quantified by means of the ARIs for each pair of consensus clusterings (across the five runs of each method for each data set and number of clusters). 
                    <xref ref-type="fig" rid="f4">Figure 4</xref> shows a dendrogram of the methods obtained by hierarchical clustering based on the average ARI values across all data sets for the true number of clusters. The numbers shown at the internal nodes indicate the stability of the subclusters, that is, the fraction of the corresponding dendrograms from the individual data sets where a particular subcluster could be found. In general, the groupings of the methods shown in the dendrogram were unstable across data sets and numbers of clusters, as indicated by the low stability fractions of all subclusters. This is consistent with previous studies showing generally poor concordance that varied across data sets
                    <sup>
                        <xref ref-type="bibr" rid="ref-19">19</xref>,
                        <xref ref-type="bibr" rid="ref-42">42</xref>
                    </sup>. Even 
                    <monospace>SC3</monospace> and 
                    <monospace>SC3svm</monospace> produced surprisingly different clusterings; in fewer than a third of the data sets, these two methods showed the most similar clusterings. In addition, we found no apparent association between the similarity of the clusterings and the type of input, the dimension reduction, or the underlying type of clustering algorithm.</p>
                <fig fig-type="figure" id="f4" orientation="portrait" position="float">
                    <label>Figure 4. </label>
                    <caption>
                        <title>Clustering of the methods based on the average similarity of their partitions across data sets, for the true number of clusters.</title>
                        <p>Numbers on internal nodes indicate the fraction of dendrograms from individual data sets where a particular subcluster was found.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/17093/679678ad-586a-4224-81e2-3dc9505c3f5a_figure4.gif"/>
                </fig>
            </sec>
            <sec>
                <title>Ensembles often don&#x2019;t improve clustering performance</title>
                <p>Next, we investigated whether we could improve the clustering performance by combining methods into an ensemble. For each pair of methods, we generated a consensus clustering and evaluated its agreement with the true partition using the ARI. In general, the performance of the ensemble was worse than the better of the two combined methods, and better than the worse of the two methods (
                    <xref ref-type="fig" rid="f5">Figure 5A</xref>), suggesting that we would obtain a better performance by choosing a single good clustering method rather than combining multiple different ones. This is largely consistent with a recent study evaluating the combination of four methods (
                    <monospace>SC3</monospace>, 
                    <monospace>CIDR</monospace>, 
                    <monospace>Seurat</monospace>, 
                    <monospace>tSNE+Kmeans</monospace>), where the ensemble performance was generally on par with the best individual method
                    <sup>
                        <xref ref-type="bibr" rid="ref-42">42</xref>
                    </sup>. It is still possible that an ensemble method could provide a general improvement over a 
                    <italic toggle="yes">given</italic> single method, since it is unlikely that the same method will be the best performing in all conceivable data sets. In fact, among the methods we evaluated, both 
                    <monospace>SC3</monospace> and 
                    <monospace>SAFE</monospace> combine multiple individual methods to achieve the final clustering result. Studying individual combinations in more detail, we observed that combining 
                    <monospace>SC3</monospace> or 
                    <monospace>Seurat</monospace> with almost any other method led to a worse performance than obtained by these methods alone (consistent with the observation that they were among the methods giving the best performance). On the other hand, methods like 
                    <monospace>CIDR</monospace>, 
                    <monospace>FlowSOM</monospace> and 
                    <monospace>TSCAN</monospace> could often be improved by combining them with another method (
                    <xref ref-type="fig" rid="f5">Figure 5B</xref>).</p>
                <fig fig-type="figure" id="f5" orientation="portrait" position="float">
                    <label>Figure 5. </label>
                    <caption>
                        <title>Comparison between individual methods and ensembles.</title>
                        <p>(
                            <bold>A</bold>) Difference between the ARI of each ensemble and the ARI of the best (left) and worst (right) of the two methods in the ensemble, across all data sets and for the true number of clusters. (
                            <bold>B</bold>) Difference between the ARI of each ensemble and each of the components, across all data sets and for the true number of clusters. The histogram in row i, column j represents the differences between the ARIs of the ensemble of the methods in row i and column j and the ARI of the method in row i on its own.</p>
                    </caption>
                    <graphic orientation="portrait" position="float" xlink:href="https://f1000research-files.f1000.com/manuscripts/17093/679678ad-586a-4224-81e2-3dc9505c3f5a_figure5.gif"/>
                </fig>
            </sec>
        </sec>
        <sec sec-type="discussion | conclusions">
            <title>Discussion and conclusions</title>
            <p>In this study, we have evaluated 12 clustering methods on both real and simulated scRNA-seq data. There were large differences in the ability of the methods to recover the annotated clusters, and performance was also strongly dependent on the degree of separation between the true classes. 
                <monospace>SC3</monospace> and 
                <monospace>Seurat</monospace>, two clustering methods developed specifically for single-cell RNA-seq data, delivered the overall best performance, and were the only ones to properly recover the cell types in the droplet-based data sets. There was, however, a large difference in the run time, with 
                <monospace>SC3</monospace> being several orders of magnitude slower than 
                <monospace>Seurat</monospace>. Another difference between these two methods is that 
                <monospace>SC3</monospace> includes a method for estimating the number of clusters (which has a tendency towards overestimation), while 
                <monospace>Seurat</monospace> will determine the number of clusters based on a resolution parameter set by the user.</p>
            <p>The same preprocessing steps and fixed gene sets were used for all clustering methods. This enabled us to investigate the impact of the clustering algorithm itself, rather than that of entire pipelines or workflows. The choice of filtering approach affected the majority of the methods and resulted in different clustering solutions, with a higher dissimilarity for the more difficult data sets in particular. However, this did not necessarily affect the performance of the methods.</p>
            <p>The stability of clustering algorithms can be evaluated by generating perturbed subsamples of the data set and redoing the clusterings. These subsamples can be created in several ways, e.g., by random subsampling with or without replacement, by adding noise to the original data
                <sup>
                    <xref ref-type="bibr" rid="ref-50">50</xref>
                </sup> or by simulating technical replicates
                <sup>
                    <xref ref-type="bibr" rid="ref-51">51</xref>
                </sup>. Freytag <italic toggle="yes">et al.</italic>
                <sup>
                    <xref ref-type="bibr" rid="ref-19">19</xref>
                </sup> showed that 
                <monospace>SC3</monospace>, 
                <monospace>Seurat</monospace>, 
                <monospace>CIDR</monospace> and 
                <monospace>TSCAN</monospace> were stable under cell-wise perturbations. In our study, we evaluated the methods with respect to their sensitivity to random starts. Overall, the methods showed a high degree of stability across all data sets, except for the simulated data sets in combination with the <monospace>M3Drop</monospace> filtering, where the stochastic methods showed a decrease in stability. This may be due to a disagreement between the mean-dropout relationship in the simulated data and the one assumed by 
                <monospace>M3Drop</monospace>, leading to a suboptimal gene selection.</p>
            <p>The evaluated methods are based on a broad spectrum of approaches for dimensionality reduction and clustering. We note that the majority of the methods use PCA or PCoA for dimension reduction or Euclidean distances as the distance metric (
                <monospace>ascend</monospace> allows for other alternatives). Thus, no clear advice on the type of algorithm that is best suited for clustering single-cell RNA-seq data can be made based on our results. In fact, the two best-performing methods, 
                <monospace>SC3</monospace> and 
                <monospace>Seurat</monospace>, rely on very different underlying clustering algorithms.</p>
            <p>We investigated the impact of changing the imposed number of clusters for the different methods, which revealed that a subset of the methods, in particular 
                <monospace>FlowSOM</monospace>, consistently showed a better agreement with the true subpopulations if the number of clusters was increased beyond the true number. The reason for this appears to be that 
                <monospace>FlowSOM</monospace> tends to split off a few very small clusters. In addition to the number of clusters, most methods rely on other hyperparameters. In this study, we have fixed these to reasonable values. However, additional investigations into the effect of these hyperparameters on the results would be an interesting direction for future research.</p>
        </sec>
        <sec>
            <title>Data availability</title>
            <p>The R scripts providing an extensible framework for the evaluation of new methods and data sets are available on GitHub. All filtered (and unfiltered) data sets used in this study are readily available from the links provided: 
                <ext-link ext-link-type="uri" xlink:href="https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison">https://github.com/markrobinsonuzh/scRNAseq_clustering_comparison</ext-link>.</p>
            <p>Archived R scripts as at the time of publication are available from 
                <ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.5281/zenodo.1314743">https://doi.org/10.5281/zenodo.1314743</ext-link>
                <sup>
                    <xref ref-type="bibr" rid="ref-52">52</xref>
                </sup> under an MIT license.</p>
        </sec>
    </body>
    <back>
        <ack>
            <title>Acknowledgements</title>
            <p>We would like to thank the members of the Robinson group at the UZH for valuable input.</p>
        </ack>
        <sec id="SM1" sec-type="supplementary-material">
            <title>Supplementary material</title>
            <p id="SF1">Supplementary File 1: PDF file containing 
                <xref ref-type="other" rid="SF1">Supplementary Figures 1&#x2013;12</xref> and 
                <xref ref-type="other" rid="SF1">Supplementary Table 1</xref>.</p>
            <p>

                <ext-link ext-link-type="uri" xlink:href="https://f1000researchdata.s3.amazonaws.com/supplementary/15666/ff61eb1b-317a-4106-aa7b-53a8867ad93d.pdf">Click here to access the data</ext-link>.</p>
        </sec>
        <ref-list>
            <ref id="ref-1">
                <label>1</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tang</surname>
                            <given-names>F</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Barbacioru</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>mRNA-Seq whole-transcriptome analysis of a single cell.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2009</year>;<volume>6</volume>(<issue>5</issue>):<fpage>377</fpage>&#x2013;<lpage>382</lpage>.
                    <pub-id pub-id-type="pmid">19349980</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.1315</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-2">
                <label>2</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Picelli</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bj&#x00f6;rklund</surname>
                            <given-names>&#x00c5;K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Faridani</surname>
                            <given-names>OR</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Smart-seq2 for sensitive full-length transcriptome profiling in single cells.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2013</year>;<volume>10</volume>(<issue>11</issue>):<fpage>1096</fpage>&#x2013;<lpage>1098</lpage>.
                    <pub-id pub-id-type="pmid">24056875</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.2639</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-3">
                <label>3</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Klein</surname>
                            <given-names>AM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mazutis</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Akartuna</surname>
                            <given-names>I</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2015</year>;<volume>161</volume>(<issue>5</issue>):<fpage>1187</fpage>&#x2013;<lpage>1201</lpage>.
                    <pub-id pub-id-type="pmid">26000487</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2015.04.044</pub-id>
                    <pub-id pub-id-type="pmcid">4441768</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-4">
                <label>4</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Macosko</surname>
                            <given-names>EZ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Basu</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Satija</surname>
                            <given-names>R</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets.</article-title>
                    <source>

                        <italic toggle="yes">Cell.</italic>
</source>
                    <year>2015</year>;<volume>161</volume>(<issue>5</issue>):<fpage>1202</fpage>&#x2013;<lpage>1214</lpage>.
                    <pub-id pub-id-type="pmid">26000488</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.cell.2015.05.002</pub-id>
                    <pub-id pub-id-type="pmcid">4481139</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-5">
                <label>5</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zheng</surname>
                            <given-names>GX</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Terry</surname>
                            <given-names>JM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Belgrader</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Massively parallel digital transcriptional profiling of single cells.</article-title>
                    <source>

                        <italic toggle="yes">Nat Commun.</italic>
</source>
                    <year>2017</year>;<volume>8</volume>: 14049.
                    <pub-id pub-id-type="pmid">28091601</pub-id>
                    <pub-id pub-id-type="doi">10.1038/ncomms14049</pub-id>
                    <pub-id pub-id-type="pmcid">5241818</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-6">
                <label>6</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Svensson</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Natarajan</surname>
                            <given-names>KN</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ly</surname>
                            <given-names>LH</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Power analysis of single-cell RNA-sequencing experiments.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2017</year>;<volume>14</volume>(<issue>4</issue>):<fpage>381</fpage>&#x2013;<lpage>387</lpage>.
                    <pub-id pub-id-type="pmid">28263961</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.4220</pub-id>
                    <pub-id pub-id-type="pmcid">5376499</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-7">
                <label>7</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Svensson</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vento-Tormo</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Teichmann</surname>
                            <given-names>SA</given-names>
                        </name>
</person-group>:
                    <article-title>Exponential scaling of single-cell RNA-seq in the past decade.</article-title>
                    <source>

                        <italic toggle="yes">Nat Protoc.</italic>
</source>
                    <year>2018</year>;<volume>13</volume>(<issue>4</issue>):<fpage>599</fpage>&#x2013;<lpage>604</lpage>.
                    <pub-id pub-id-type="pmid">29494575</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nprot.2017.149</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-8">
                <label>8</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ziegenhain</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vieth</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Parekh</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Quantitative single-cell transcriptomics.</article-title>
                    <source>

                        <italic toggle="yes">Brief Funct Genomics.</italic>
</source>
                    <year>2018</year>;<fpage>ely009</fpage>.
                    <pub-id pub-id-type="pmid">29579145</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bfgp/ely009</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-9">
                <label>9</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Gr&#x00fc;n</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kester</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>van Oudenaarden</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Validation of noise models for single-cell transcriptomics.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2014</year>;<volume>11</volume>(<issue>6</issue>):<fpage>637</fpage>&#x2013;<lpage>640</lpage>.
                    <pub-id pub-id-type="pmid">24747814</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.2930</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-10">
                <label>10</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bacher</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kendziorski</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <article-title>Design and computational analysis of single-cell RNA-sequencing experiments.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2016</year>;<volume>17</volume>(<issue>1</issue>):<fpage>63</fpage>.
                    <pub-id pub-id-type="pmid">27052890</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-016-0927-y</pub-id>
                    <pub-id pub-id-type="pmcid">4823857</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-11">
                <label>11</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Tung</surname>
                            <given-names>PY</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Blischak</surname>
                            <given-names>JD</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hsiao</surname>
                            <given-names>CJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Batch effects and the effective design of single-cell gene expression studies.</article-title>
                    <source>

                        <italic toggle="yes">Sci Rep.</italic>
</source>
                    <year>2017</year>;<volume>7</volume>: 39921.
                    <pub-id pub-id-type="pmid">28045081</pub-id>
                    <pub-id pub-id-type="doi">10.1038/srep39921</pub-id>
                    <pub-id pub-id-type="pmcid">5206706</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-12">
                <label>12</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hicks</surname>
                            <given-names>SC</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Townes</surname>
                            <given-names>FW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Teng</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Missing data and technical variability in single-cell RNA-sequencing experiments.</article-title>
                    <source>

                        <italic toggle="yes">Biostatistics.</italic>
</source>
                    <year>2017</year>;<fpage>kxx053</fpage>.
                    <pub-id pub-id-type="pmid">29121214</pub-id>
                    <pub-id pub-id-type="doi">10.1093/biostatistics/kxx053</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-13">
                <label>13</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Aghaeepour</surname>
                            <given-names>N</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Finak</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>,
                    <collab>FlowCAP Consortium</collab>,
                    <etal/>:
                    <article-title>Critical assessment of automated flow cytometry data analysis techniques.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2013</year>;<volume>10</volume>(<issue>3</issue>):<fpage>228</fpage>&#x2013;<lpage>238</lpage>.
                    <pub-id pub-id-type="pmid">23396282</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.2365</pub-id>
                    <pub-id pub-id-type="pmcid">3906045</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-14">
                <label>14</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Weber</surname>
                            <given-names>LM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Robinson</surname>
                            <given-names>MD</given-names>
                        </name>
</person-group>:
                    <article-title>Comparison of clustering methods for high-dimensional single-cell flow and mass cytometry data.</article-title>
                    <source>

                        <italic toggle="yes">Cytometry A.</italic>
</source>
                    <year>2016</year>;<volume>89</volume>(<issue>12</issue>):<fpage>1084</fpage>&#x2013;<lpage>1096</lpage>.
                    <pub-id pub-id-type="pmid">27992111</pub-id>
                    <pub-id pub-id-type="doi">10.1002/cyto.a.23030</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-15">
                <label>15</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Menon</surname>
                            <given-names>V</given-names>
                        </name>
</person-group>:
                    <article-title>Clustering single cells: a review of approaches on high- and low-depth single-cell RNA-seq data.</article-title>
                    <source>

                        <italic toggle="yes">Brief Funct Genomics.</italic>
</source>
                    <year>2017</year>;<fpage>elx044</fpage>.
                    <pub-id pub-id-type="pmid">29236955</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bfgp/elx044</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-16">
                <label>16</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Satija</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Farrell</surname>
                            <given-names>JA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Gennert</surname>
                            <given-names>D</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Spatial reconstruction of single-cell gene expression data.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2015</year>;<volume>33</volume>(<issue>5</issue>):<fpage>495</fpage>&#x2013;<lpage>502</lpage>.
                    <pub-id pub-id-type="pmid">25867923</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3192</pub-id>
                    <pub-id pub-id-type="pmcid">4430369</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-17">
                <label>17</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Langfelder</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Horvath</surname>
                            <given-names>S</given-names>
                        </name>
</person-group>:
                    <article-title>WGCNA: an R package for weighted correlation network analysis.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>
                    <year>2008</year>;<volume>9</volume>:<fpage>559</fpage>.
                    <pub-id pub-id-type="pmid">19114008</pub-id>
                    <pub-id pub-id-type="doi">10.1186/1471-2105-9-559</pub-id>
                    <pub-id pub-id-type="pmcid">2631488</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-18">
                <label>18</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zeisel</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Mu&#x00f1;oz-Manchado</surname>
                            <given-names>AB</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Codeluppi</surname>
                            <given-names>S</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq.</article-title>
                    <source>

                        <italic toggle="yes">Science.</italic>
</source>
                    <year>2015</year>;<volume>347</volume>(<issue>6226</issue>):<fpage>1138</fpage>&#x2013;<lpage>1142</lpage>.
                    <pub-id pub-id-type="pmid">25700174</pub-id>
                    <pub-id pub-id-type="doi">10.1126/science.aaa1934</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-19">
                <label>19</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Freytag</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lonnstedt</surname>
                            <given-names>I</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ng</surname>
                            <given-names>M</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Cluster headache: Comparing clustering tools for 10X single cell sequencing data.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2017</year>.
                    <pub-id pub-id-type="doi">10.1101/203752</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-20">
                <label>20</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Andrews</surname>
                            <given-names>TS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hemberg</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Identifying cell populations with scRNASeq.</article-title>
                    <source>

                        <italic toggle="yes">Mol Aspects Med.</italic>
</source>
                    <year>2018</year>;<volume>59</volume>:<fpage>114</fpage>&#x2013;<lpage>122</lpage>.
                    <pub-id pub-id-type="pmid">28712804</pub-id>
                    <pub-id pub-id-type="doi">10.1016/j.mam.2017.07.002</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-21">
                <label>21</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Soneson</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Robinson</surname>
                            <given-names>MD</given-names>
                        </name>
</person-group>:
                    <article-title>Bias, robustness and scalability in single-cell differential expression analysis.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2018</year>;<volume>15</volume>(<issue>4</issue>):<fpage>255</fpage>&#x2013;<lpage>261</lpage>.
                    <pub-id pub-id-type="pmid">29481549</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.4612</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-22">
                <label>22</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kumar</surname>
                            <given-names>RM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cahan</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Shalek</surname>
                            <given-names>AK</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Deconstructing transcriptional heterogeneity in pluripotent stem cells.</article-title>
                    <source>

                        <italic toggle="yes">Nature.</italic>
</source>
                    <year>2014</year>;<volume>516</volume>(<issue>7529</issue>):<fpage>56</fpage>&#x2013;<lpage>61</lpage>.
                    <pub-id pub-id-type="pmid">25471879</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nature13920</pub-id>
                    <pub-id pub-id-type="pmcid">4256722</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-23">
                <label>23</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Koh</surname>
                            <given-names>PW</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Sinha</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Barkal</surname>
                            <given-names>AA</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>An atlas of transcriptional, chromatin accessibility, and surface marker changes in human mesoderm development.</article-title>
                    <source>

                        <italic toggle="yes">Sci Data.</italic>
</source>
                    <year>2016</year>;<volume>3</volume>: 160109.
                    <pub-id pub-id-type="pmid">27996962</pub-id>
                    <pub-id pub-id-type="doi">10.1038/sdata.2016.109</pub-id>
                    <pub-id pub-id-type="pmcid">5170597</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-24">
                <label>24</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Trapnell</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Cacchiarelli</surname>
                            <given-names>D</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Grimsby</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2014</year>;<volume>32</volume>(<issue>4</issue>):<fpage>381</fpage>&#x2013;<lpage>386</lpage>.
                    <pub-id pub-id-type="pmid">24658644</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.2859</pub-id>
                    <pub-id pub-id-type="pmcid">4122333</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-25">
                <label>25</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ramos</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schiffer</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Re</surname>
                            <given-names>A</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Software for the integration of multi-omics experiments in Bioconductor.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2017</year>.
                    <pub-id pub-id-type="doi">10.1101/144774</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-26">
                <label>26</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Bray</surname>
                            <given-names>NL</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Pimentel</surname>
                            <given-names>H</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Melsted</surname>
                            <given-names>P</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Near-optimal probabilistic RNA-seq quantification.</article-title>
                    <source>

                        <italic toggle="yes">Nat Biotechnol.</italic>
</source>
                    <year>2016</year>;<volume>34</volume>(<issue>5</issue>):<fpage>525</fpage>&#x2013;<lpage>527</lpage>.
                    <pub-id pub-id-type="pmid">27043002</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nbt.3519</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-27">
                <label>27</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ntranos</surname>
                            <given-names>V</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kamath</surname>
                            <given-names>GM</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Zhang</surname>
                            <given-names>JM</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Fast and accurate single-cell RNA-Seq analysis by clustering of transcript-compatibility counts.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2016</year>;<volume>17</volume>(<issue>1</issue>):<fpage>112</fpage>.
                    <pub-id pub-id-type="pmid">27230763</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-016-0970-8</pub-id>
                    <pub-id pub-id-type="pmcid">4881296</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-28">
                <label>28</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Zappia</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Phipson</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Oshlack</surname>
                            <given-names>A</given-names>
                        </name>
</person-group>:
                    <article-title>Splatter: simulation of single-cell RNA sequencing data.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2017</year>;<volume>18</volume>(<issue>1</issue>):<fpage>174</fpage>.
                    <pub-id pub-id-type="pmid">28899397</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-017-1305-0</pub-id>
                    <pub-id pub-id-type="pmcid">5596896</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-29">
                <label>29</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Soneson</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Robinson</surname>
                            <given-names>MD</given-names>
                        </name>
</person-group>:
                    <article-title>Towards unified quality verification of synthetic count data with 
                        <italic toggle="yes">countsim</italic>QC.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2018</year>;<volume>34</volume>(<issue>4</issue>):<fpage>691</fpage>&#x2013;<lpage>692</lpage>.
                    <pub-id pub-id-type="pmid">29028961</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btx631</pub-id>
                    <pub-id pub-id-type="pmcid">5860609</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-30">
                <label>30</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>McCarthy</surname>
                            <given-names>DJ</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Campbell</surname>
                            <given-names>KR</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lun</surname>
                            <given-names>AT</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R.</article-title>
                    <source>

                        <italic toggle="yes">Bioinformatics.</italic>
</source>
                    <year>2017</year>;<volume>33</volume>(<issue>8</issue>):<fpage>1179</fpage>&#x2013;<lpage>1186</lpage>.
                    <pub-id pub-id-type="pmid">28088763</pub-id>
                    <pub-id pub-id-type="doi">10.1093/bioinformatics/btw777</pub-id>
                    <pub-id pub-id-type="pmcid">5408845</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-31">
                <label>31</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lun</surname>
                            <given-names>AT</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Bach</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Marioni</surname>
                            <given-names>JC</given-names>
                        </name>
</person-group>:
                    <article-title>Pooling across cells to normalize single-cell RNA sequencing data with many zero counts.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2016</year>;<volume>17</volume>(<issue>1</issue>):<fpage>75</fpage>.
                    <pub-id pub-id-type="pmid">27122128</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-016-0947-7</pub-id>
                    <pub-id pub-id-type="pmcid">4848819</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-32">
                <label>32</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Pearson</surname>
                            <given-names>K</given-names>
                        </name>
</person-group>:
                    <article-title>On lines and planes of closest fit to systems of points in space.</article-title>
                    <source>

                        <italic toggle="yes">Philos Mag.</italic>
</source>
                    <year>1901</year>;<volume>2</volume>:<fpage>559</fpage>&#x2013;<lpage>572</lpage>.
                    <pub-id pub-id-type="doi">10.1080/14786440109462720</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-33">
                <label>33</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>van der Maaten</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hinton</surname>
                            <given-names>G</given-names>
                        </name>
</person-group>:
                    <article-title>Visualizing data using t-SNE.</article-title>
                    <source>

                        <italic toggle="yes">J Mach Learn Res.</italic>
</source>
                    <year>2008</year>;<volume>9</volume>:<fpage>2579</fpage>&#x2013;<lpage>2605</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-34">
                <label>34</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Andrews</surname>
                            <given-names>TS</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Hemberg</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>Dropout-based feature selection for scRNASeq.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2018</year>.
                    <pub-id pub-id-type="doi">10.1101/065094</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-35">
                <label>35</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Senabouth</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Lukowski</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Alquicira</surname>
                            <given-names>J</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>ascend: R package for analysis of single cell RNA-seq data.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2017</year>.
                    <pub-id pub-id-type="doi">10.1101/207704</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-36">
                <label>36</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Lin</surname>
                            <given-names>P</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Troup</surname>
                            <given-names>M</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ho</surname>
                            <given-names>JW</given-names>
                        </name>
</person-group>:
                    <article-title>CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data.</article-title>
                    <source>

                        <italic toggle="yes">Genome Biol.</italic>
</source>
                    <year>2017</year>;<volume>18</volume>(<issue>1</issue>):<fpage>59</fpage>.
                    <pub-id pub-id-type="pmid">28351406</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s13059-017-1188-0</pub-id>
                    <pub-id pub-id-type="pmcid">5371246</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-37">
                <label>37</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Van Gassen</surname>
                            <given-names>S</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Callebaut</surname>
                            <given-names>B</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Van Helden</surname>
                            <given-names>MJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>FlowSOM: Using self-organizing maps for visualization and interpretation of cytometry data.</article-title>
                    <source>

                        <italic toggle="yes">Cytometry A.</italic>
</source>
                    <year>2015</year>;<volume>87</volume>(<issue>7</issue>):<fpage>636</fpage>&#x2013;<lpage>645</lpage>.
                    <pub-id pub-id-type="pmid">25573116</pub-id>
                    <pub-id pub-id-type="doi">10.1002/cyto.a.22625</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-38">
                <label>38</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ward</surname>
                            <given-names>JH</given-names>
                            <suffix>Jr</suffix>
                        </name>
</person-group>:
                    <article-title>Hierarchical grouping to optimize an objective function.</article-title>
                    <source>

                        <italic toggle="yes">J Am Stat Assoc.</italic>
</source>
                    <year>1963</year>;<volume>58</volume>(<issue>301</issue>):<fpage>236</fpage>&#x2013;<lpage>244</lpage>.
                    <pub-id pub-id-type="doi">10.1080/01621459.1963.10500845</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-39">
                <label>39</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hartigan</surname>
                            <given-names>JA</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wong</surname>
                            <given-names>MA</given-names>
                        </name>
</person-group>:
                    <article-title>Algorithm AS 136: A k-means clustering algorithm.</article-title>
                    <source>

                        <italic toggle="yes">J R Stat Soc Ser C Appl Stat.</italic>
</source>
                    <year>1979</year>;<volume>28</volume>(<issue>1</issue>):<fpage>100</fpage>&#x2013;<lpage>108</lpage>.
                    <pub-id pub-id-type="doi">10.2307/2346830</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-40">
                <label>40</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>&#x017d;urauskien&#x0117;</surname>
                            <given-names>J</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Yau</surname>
                            <given-names>C</given-names>
                        </name>
</person-group>:
                    <article-title>pcaReduce: hierarchical clustering of single cell transcriptional profiles.</article-title>
                    <source>

                        <italic toggle="yes">BMC Bioinformatics.</italic>
</source>
                    <year>2016</year>;<volume>17</volume>(<issue>1</issue>):<fpage>140</fpage>.
                    <pub-id pub-id-type="pmid">27005807</pub-id>
                    <pub-id pub-id-type="doi">10.1186/s12859-016-0984-y</pub-id>
                    <pub-id pub-id-type="pmcid">4802652</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-41">
                <label>41</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>van der Maaten</surname>
                            <given-names>L</given-names>
                        </name>
</person-group>:
                    <article-title>Accelerating t-SNE using tree-based algorithms.</article-title>
                    <source>

                        <italic toggle="yes">J Mach Learn Res.</italic>
</source>
                    <year>2014</year>;<volume>15</volume>:<fpage>1</fpage>&#x2013;<lpage>21</lpage>.
                    <ext-link ext-link-type="uri" xlink:href="http://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf">Reference Source</ext-link>
                </mixed-citation>
            </ref>
            <ref id="ref-42">
                <label>42</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Yang</surname>
                            <given-names>Y</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Huh</surname>
                            <given-names>R</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Culpepper</surname>
                            <given-names>HW</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>SAFE-clustering: Single-cell aggregated (from ensemble) clustering for single-cell RNA-seq data.</article-title>
                    <source>

                        <italic toggle="yes">bioRxiv.</italic>
</source>
                    <year>2017</year>.
                    <pub-id pub-id-type="doi">10.1101/215723</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-43">
                <label>43</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kiselev</surname>
                            <given-names>VY</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Kirschner</surname>
                            <given-names>K</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Schaub</surname>
                            <given-names>MT</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>SC3: consensus clustering of single-cell RNA-seq data.</article-title>
                    <source>

                        <italic toggle="yes">Nat Methods.</italic>
</source>
                    <year>2017</year>;<volume>14</volume>(<issue>5</issue>):<fpage>483</fpage>&#x2013;<lpage>486</lpage>.
                    <pub-id pub-id-type="pmid">28346451</pub-id>
                    <pub-id pub-id-type="doi">10.1038/nmeth.4236</pub-id>
                    <pub-id pub-id-type="pmcid">5410170</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-44">
                <label>44</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Cortes</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Vapnik</surname>
                            <given-names>V</given-names>
                        </name>
</person-group>:
                    <article-title>Support-vector networks.</article-title>
                    <source>

                        <italic toggle="yes">Mach Learn.</italic>
</source>
                    <year>1995</year>;<volume>20</volume>(<issue>3</issue>):<fpage>273</fpage>&#x2013;<lpage>297</lpage>.
                    <pub-id pub-id-type="doi">10.1023/A:1022627411411</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-45">
                <label>45</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Ji</surname>
                            <given-names>Z</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Ji</surname>
                            <given-names>H</given-names>
                        </name>
</person-group>:
                    <article-title>TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis.</article-title>
                    <source>

                        <italic toggle="yes">Nucleic Acids Res.</italic>
</source>
                    <year>2016</year>;<volume>44</volume>(<issue>13</issue>):<fpage>e117</fpage>.
                    <pub-id pub-id-type="pmid">27179027</pub-id>
                    <pub-id pub-id-type="doi">10.1093/nar/gkw430</pub-id>
                    <pub-id pub-id-type="pmcid">4994863</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-46">
                <label>46</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hubert</surname>
                            <given-names>L</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Arabie</surname>
                            <given-names>P</given-names>
                        </name>
</person-group>:
                    <article-title>Comparing partitions.</article-title>
                    <source>

                        <italic toggle="yes">J Classif.</italic>
</source>
                    <year>1985</year>;<volume>2</volume>(<issue>1</issue>):<fpage>193</fpage>&#x2013;<lpage>218</lpage>.
                    <pub-id pub-id-type="doi">10.1007/BF01908075</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-47">
                <label>47</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Shannon</surname>
                            <given-names>CE</given-names>
                        </name>
</person-group>:
                    <article-title>A mathematical theory of communication.</article-title>
                    <source>

                        <italic toggle="yes">Bell Syst Tech J.</italic>
</source>
                    <year>1948</year>;<volume>27</volume>(<issue>3</issue>):<fpage>379</fpage>&#x2013;<lpage>423</lpage>.
                    <pub-id pub-id-type="doi">10.1002/j.1538-7305.1948.tb01338.x</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-48">
                <label>48</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Hornik</surname>
                            <given-names>K</given-names>
                        </name>
</person-group>:
                    <article-title>A CLUE for CLUster Ensembles.</article-title>
                    <source>

                        <italic toggle="yes">J Stat Softw.</italic>
</source>
                    <year>2005</year>;<volume>14</volume>(<issue>12</issue>):<fpage>1</fpage>&#x2013;<lpage>25</lpage>.
                    <pub-id pub-id-type="doi">10.18637/jss.v014.i12</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-49">
                <label>49</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Kruskal</surname>
                            <given-names>WH</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Wallis</surname>
                            <given-names>WA</given-names>
                        </name>
</person-group>:
                    <article-title>Use of ranks in one-criterion variance analysis.</article-title>
                    <source>

                        <italic toggle="yes">J Am Stat Assoc.</italic>
</source>
                    <year>1952</year>;<volume>47</volume>(<issue>260</issue>):<fpage>583</fpage>&#x2013;<lpage>621</lpage>.
                    <pub-id pub-id-type="doi">10.2307/2280779</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-50">
                <label>50</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Von Luxburg</surname>
                            <given-names>U</given-names>
                        </name>
</person-group>:
                    <article-title>Clustering stability: an overview.</article-title>
                    <source>

                        <italic toggle="yes">Foundations and Trends in Machine Learning.</italic>
</source>
                    <year>2010</year>;<volume>2</volume>(<issue>3</issue>):<fpage>235</fpage>&#x2013;<lpage>274</lpage>.
                    <pub-id pub-id-type="doi">10.1561/2200000008</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-51">
                <label>51</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Severson</surname>
                            <given-names>DT</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Owen</surname>
                            <given-names>RP</given-names>
                        </name>

                        <name name-style="western">
                            <surname>White</surname>
                            <given-names>MJ</given-names>
                        </name>

                        <etal/>
</person-group>:
                    <article-title>BEARscc determines robustness of single-cell clusters using simulated technical replicates.</article-title>
                    <source>

                        <italic toggle="yes">Nat Commun.</italic>
</source>
                    <year>2018</year>;<volume>9</volume>(<issue>1</issue>):<fpage>1187</fpage>.
                    <pub-id pub-id-type="pmid">29567991</pub-id>
                    <pub-id pub-id-type="doi">10.1038/s41467-018-03608-y</pub-id>
                    <pub-id pub-id-type="pmcid">5864873</pub-id>
                </mixed-citation>
            </ref>
            <ref id="ref-52">
                <label>52</label>
                <mixed-citation publication-type="journal">
                    <person-group person-group-type="author">

                        <name name-style="western">
                            <surname>Du&#x00f2;</surname>
                            <given-names>A</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Soneson</surname>
                            <given-names>C</given-names>
                        </name>

                        <name name-style="western">
                            <surname>Robinson</surname>
                            <given-names>M</given-names>
                        </name>
</person-group>:
                    <article-title>markrobinsonuzh/scRNAseq_clustering_comparison: F1000 v1 (Version 0.9).</article-title>
                    <source>

                        <italic toggle="yes">Zenodo.</italic>
</source>
                    <year>2018</year>.
                    <ext-link ext-link-type="uri" xlink:href="http://www.doi.org/10.5281/zenodo.1314743">http://www.doi.org/10.5281/zenodo.1314743</ext-link>
                </mixed-citation>
            </ref>
        </ref-list>
    </back>
    <sub-article article-type="reviewer-report" id="report36545">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.17093.r36545</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Freytag</surname>
                        <given-names>Saskia</given-names>
                    </name>
                    <xref ref-type="aff" rid="r36545a1">1</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-2185-7068</uri>
                </contrib>
                <aff id="r36545a1">
                    <label>1</label>Department of Medical Biology, University of Melbourne, Parkville, Vic, Australia</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>3</day>
                <month>8</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Freytag S</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport36545" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.15666.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Overview</p>
            <p>The authors present a comprehensive benchmark of clustering tools in R on real and simulated single-cell RNA-seq datasets. Their work covers performance, stability and run time analysis. They also investigate whether combining results from different methods increases performance.</p>
            <p>Major comments</p>
            <p>
                <list list-type="bullet">
                    <list-item>
                        <p>Throughout the manuscript, the authors should make clear that only clustering tools available in R were investigated. This is important, as quite a number of popular Python applications for clustering single-cell RNA-seq data are available.</p>
                    </list-item>
                    <list-item>
                        <p>Like Jean Fan, I am concerned about the appropriateness of the Trapnell et al. dataset and the Zheng et al. 10x datasets. For the Zheng et al. dataset, why did the authors not use all 10 pre-sorted cell populations available? Furthermore, how did the authors choose which cell populations to combine for their Zhengmix4 and Zhengmix8 datasets?</p>
                    </list-item>
                </list> </p>
            <p> </p>
            <p> Minor comments 
                <list list-type="bullet">
                    <list-item>
                        <p>The authors show nicely that Seurat is not very strongly affected by gene filtering. Could this be a result of its clustering approach being based on the 500 most variable genes?</p>
                    </list-item>
                    <list-item>
                        <p>On page 7, in the paragraph &#x201c;Run Times vary widely between methods&#x201d;, the authors use Adjusted Rand Index instead of its already introduced abbreviation.</p>
                    </list-item>
                    <list-item>
                        <p>Could the size of Figure 5 be increased?</p>
                    </list-item>
                    <list-item>
                        <p>Why did some methods get raw and some methods log-transformed normalized counts?</p>
                    </list-item>
                    <list-item>
                        <p>Consider changing Supplementary Figure 2 to a visual representation that represents size differences between sets, like UpSetR plots.</p>
                    </list-item>
                    <list-item>
                        <p>On page 10 the authors say: &#x201c;In addition, no apparent association between the similarity of the clusterings and the type of input or dimension reduction or underlying type of clustering algorithm was found.&#x201d; Could the authors explain in more detail how this analysis was performed?</p>
                    </list-item>
                    <list-item>
                        <p>On page 6, the authors speculate that signals other than the time points dominate the clustering in the Trapnell et al. dataset. What could these be? Have the authors investigated cell cycle?</p>
                    </list-item>
                </list>
            </p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Yes</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Yes</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Yes</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment3937-36545">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Soneson</surname>
                            <given-names>Charlotte</given-names>
                        </name>
                        <aff>Friedrich Miescher Institute for Biomedical Research, Switzerland</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>31</day>
                    <month>8</month>
                    <year>2018</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <italic>Thank you for reviewing our manuscript and for your constructive comments. Below are point-by-point responses to the individual comments.</italic>
                </p>
                <p> </p>
                <p> Throughout the entire manuscript the authors should make it clear that only clustering tools available in R were investigated. This is important, as quite a number of popular Python applications for clustering of single-cell RNA-seq data are available.</p>
                <p> </p>
                <p> 
                    <italic>This has been clarified in the Abstract as well as in the Methods part of the text. Some of the most widely used clustering methods implemented in Python (e.g., scanpy) implement the same or similar clustering methods as those evaluated in this study, and could thus be considered to be implicitly investigated. Also, the evaluation system we provide (via the code in the GitHub repository and the associated data package) is not strictly limited to methods implemented in R; other methods can be included e.g. using system() calls. </italic>
                </p>
                <p> </p>
                <p> Like Jean Fan, I am concerned about the appropriateness of the Trapnell et al. dataset and the Zheng et al. 10x datasets. For the Zheng et al. dataset, I would also like to know why the authors did not use all 10 available pre-sorted cell populations. In addition, how did the authors choose which cell populations to combine for their Zhengmix4 and Zhengmix8 datasets?</p>
                <p> </p>
                <p> 
                    <italic>We agree that the Trapnell data set was not generated with the purpose of finding cell types. However, we still find it useful for illustrating the performance of the methods in a data set where the &#x201c;true clusters&#x201d; (defined as the time point at which the cells were collected) do not represent the main/strongest signal in the data (see e.g. the t-SNE plots in Supplementary Figure 1). We have clarified this in the &#x201c;Methods-Real data sets&#x201d; section of the revised paper.</italic>
                </p>
                <p>
                    <italic> </italic>
                </p>
                <p>
                    <italic> For the Zhengmix data sets, our aim was to generate data sets with a mix of well-separated cell types (e.g., B-cells vs T-cells) and similar cell types (e.g., different types of T-cells). In addition, we wanted to investigate whether the number of cell populations and/or the equality of the population sizes had an impact on the performance. The included cell type combinations were selected to allow us to address these questions; however, given the richness of this data set, there are certainly many more possible combinations to explore. We have expanded the description in the &#x201c;Methods-Real data sets&#x201d; section to highlight these goals. </italic>
                </p>
                <p> </p>
                <p> The authors show nicely that Seurat is not very strongly affected by gene filtering. Could this be a result of its clustering approach being based on the 500 most variable genes?</p>
                <p> </p>
                <p> 
                    <italic>In all our investigations, we preselect the genes that are used as input for each clustering algorithm using three different variable selection methods, and internal variable selection or filtering steps are disabled. Specifically, for Seurat we perform the PCA using all the genes remaining after our filtering, and the clustering is then performed in the principal component space. Thus, the stability of Seurat should be affected in the same way as that of the other methods by the selection of variables. </italic>
                </p>
                <p> </p>
                <p> On page 7, in the paragraph &#x201c;Run Times vary widely between methods&#x201d;, the authors use Adjusted Rand Index instead of its already introduced abbreviation.</p>
                <p> </p>
                <p> 
                    <italic>Thanks for noticing this; we now use the abbreviation here as well.</italic>
                </p>
                <p> </p>
                <p> Could the size of Figure 5 be increased?</p>
                <p> </p>
                <p> 
                    <italic>We have increased the size of Figure 5B.</italic>
                </p>
                <p> </p>
                <p> Why did some methods get raw and some methods log-transformed normalized counts?</p>
                <p> </p>
                <p> 
                    <italic>The methods are based on different distributional assumptions and underlying models, affecting the type of values that are most suitably used as input. We followed the recommendations of the authors of the respective methods, and the type of input used for each method is summarized in Figure 4.</italic>
                </p>
                <p> </p>
                <p> Consider changing Supplementary Figure 2 to a visual representation that represents size differences between sets, like UpSetR plots.</p>
                <p> </p>
                <p> 
                    <italic>We have replaced the Venn diagrams in Supplementary Figure 2 with UpSet plots. </italic>
                </p>
                <p> </p>
                <p> On page 10 the authors say: &#x201c;In addition, no apparent association between the similarity of the clusterings and the type of input or dimension reduction or underlying type of clustering algorithm was found.&#x201d; Could the authors explain in more detail how this analysis was performed?</p>
                <p> </p>
                <p> 
                    <italic>This conclusion is drawn based on Figure 4, where no association between the clustering of methods by cluster similarity and any of the method characteristics can be seen. This has been clarified in the &#x201c;Results-Inconsistent degree of similarity between methods&#x201d; section of the revised paper. </italic>
                </p>
                <p> </p>
                <p> On page 6, the authors speculate that signals other than the time points dominate the clustering in the Trapnell et al. dataset. What could these be? Have the authors investigated cell cycle?</p>
                <p> </p>
                <p> 
                    <italic>We have not explicitly investigated the interpretation of the strongest signal in the Trapnell data set. However, Supplementary Figure 1 suggests that the annotation that we used to define the &#x201c;true&#x201d; clusters (the time at which the cells were collected) does not fully explain the grouping of the cells in the t-SNE visualization (in particular, the T12 and T24 groups are intermingled). As noted above, the main purpose of including this data set was to investigate the behaviour of the various methods in a data set where the clusters were less apparent.</italic>
                </p>
            </body>
        </sub-article>
    </sub-article>
    <sub-article article-type="reviewer-report" id="report36544">
        <front-stub>
            <article-id pub-id-type="doi">10.5256/f1000research.17093.r36544</article-id>
            <title-group>
                <article-title>Reviewer response for version 1</article-title>
            </title-group>
            <contrib-group>
                <contrib contrib-type="author">
                    <name>
                        <surname>Fan</surname>
                        <given-names>Jean</given-names>
                    </name>
                    <xref ref-type="aff" rid="r36544a1">1</xref>
                    <xref ref-type="aff" rid="r36544a2">2</xref>
                    <role>Referee</role>
                    <uri content-type="orcid">https://orcid.org/0000-0002-0212-5451</uri>
                </contrib>
                <aff id="r36544a1">
                    <label>1</label>Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA, USA</aff>
                <aff id="r36544a2">
                    <label>2</label>Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA</aff>
            </contrib-group>
            <author-notes>
                <fn fn-type="conflict">
                    <p>
                        <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                </fn>
            </author-notes>
            <pub-date pub-type="epub">
                <day>27</day>
                <month>7</month>
                <year>2018</year>
            </pub-date>
            <permissions>
                <copyright-statement>Copyright: &#x00a9; 2018 Fan J</copyright-statement>
                <copyright-year>2018</copyright-year>
                <license xlink:href="https://creativecommons.org/licenses/by/4.0/">
                    <license-p>This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
                </license>
            </permissions>
            <related-article ext-link-type="doi" id="relatedArticleReport36544" related-article-type="peer-reviewed-article" xlink:href="10.12688/f1000research.15666.1"/>
            <custom-meta-group>
                <custom-meta>
                    <meta-name>recommendation</meta-name>
                    <meta-value>approve-with-reservations</meta-value>
                </custom-meta>
            </custom-meta-group>
        </front-stub>
        <body>
            <p>Overview</p>
            <p> </p>
            <p> Duo et al. compare multiple single-cell RNA-seq clustering approaches on real and simulated single-cell RNA-seq datasets.</p>
            <p> </p>
            <p> </p>
            <p> Major comments</p>
            <p> </p>
            <p> - Quite a number of single-cell RNA-seq datasets are available for benchmarking but only a few were explored here. While an exhaustive interrogation of all single-cell RNA-seq datasets available is beyond the scope of this paper, it would be worthwhile for the readers if the authors could comment briefly on the appropriateness of the datasets used here in terms of their cell-type diversity or other factors that may impact benchmarking. As the authors note, a method's performance is inherently tied to the degree to which the tested subpopulations are truly (or artificially) transcriptionally distinct. In particular, I am concerned about the appropriateness of the Trapnell dataset, as it was originally intended for pseudotime/trajectory inference and may not even contain discrete transcriptional subpopulations. The poor performance noted in Figure 1 for this dataset may simply arise from different methods cutting along this continuous trajectory in different ways. Similarly, for the Zheng 10x datasets, since each cell-type was sorted and sequenced separately, there is inevitably some degree of confounding of cell-type specific effects with batch effects that could make clustering much easier.</p>
            <p> </p>
            <p> - As datasets get bigger, the scalability of each method will be an important consideration. The authors provide a preliminary look into this via the different run time of each method in Figure 2, but how this run time depends on the number of cells is unclear. Readers will be interested in whether some methods scale better than others. It is worth having an additional figure of run time as a function of number of cells (via downsampling cells and then extrapolating to larger datasets) to fully capture the scalability of each method.&#x00a0;</p>
            <p> </p>
            <p> - With regard to the stability between cluster runs, some methods may internally set various random seeds to ensure reproducibility. Please double check that the stability observed in Figure 3 is not simply the result of which methods use random seeds. If a method does use an internal random seed (or, more likely, several), the seed must be changed to accurately assess stability.</p>
            <p> </p>
            <p> </p>
            <p> Minor comments</p>
            <p> </p>
            <p> - There are quite a number of single-cell RNA-seq clustering approaches and the list keeps growing (https://github.com/seandavi/awesome-single-cell). Only a fraction is represented in this comparison. While an exhaustive comparison of all methods is beyond the scope of this paper, the authors should comment briefly on how these particular 12 clustering algorithms were chosen.</p>
            <p> </p>
            <p> - While nearly all methods assessed use dimensionality reduction as a first step, it is unclear why some were allowed to reduce to 30 dimensions while others were allowed 50. It seems that, particularly as datasets get larger with presumably more cell-types captured in each dataset, we will likely want to increase the number of PCs to fully capture the variation present in the data. While the authors have left the investigation into the effects of the number of PCs to future research, they should briefly note the reason for the choice of PCs used for each method.</p>
            <p>Is the work clearly and accurately presented and does it cite the current literature?</p>
            <p>Partly</p>
            <p>If applicable, is the statistical analysis and its interpretation appropriate?</p>
            <p>Partly</p>
            <p>Are all the source data underlying the results available to ensure full reproducibility?</p>
            <p>Yes</p>
            <p>Is the study design appropriate and is the work technically sound?</p>
            <p>Partly</p>
            <p>Are the conclusions drawn adequately supported by the results?</p>
            <p>Partly</p>
            <p>Are sufficient details of methods and analysis provided to allow replication by others?</p>
            <p>Yes</p>
            <p>Reviewer Expertise:</p>
            <p>NA</p>
            <p>I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.</p>
        </body>
        <sub-article article-type="response" id="comment3938-36544">
            <front-stub>
                <contrib-group>
                    <contrib contrib-type="author">
                        <name>
                            <surname>Soneson</surname>
                            <given-names>Charlotte</given-names>
                        </name>
                        <aff>Friedrich Miescher Institute for Biomedical Research, Switzerland</aff>
                    </contrib>
                </contrib-group>
                <author-notes>
                    <fn fn-type="conflict">
                        <p>
                            <bold>Competing interests: </bold>No competing interests were disclosed.</p>
                    </fn>
                </author-notes>
                <pub-date pub-type="epub">
                    <day>31</day>
                    <month>8</month>
                    <year>2018</year>
                </pub-date>
            </front-stub>
            <body>
                <p>
                    <italic>Thank you for reviewing our manuscript and for your constructive comments. Below are point-by-point responses to the individual comments.</italic>
                </p>
                <p> </p>
                <p> Quite a number of single-cell RNA-seq datasets are available for benchmarking but only a few were explored here. While an exhaustive interrogation of all single-cell RNA-seq datasets available is beyond the scope of this paper, it would be worthwhile for the readers if the authors could comment briefly on the appropriateness of the datasets used here in terms of their cell-type diversity or other factors that may impact benchmarking. As the authors note, a method's performance is inherently tied to the degree to which the tested subpopulations are truly (or artificially) transcriptionally distinct. In particular, I am concerned about the appropriateness of the Trapnell dataset, as it was originally intended for pseudotime/trajectory inference and may not even contain discrete transcriptional subpopulations. The poor performance noted in Figure 1 for this dataset may simply arise from different methods cutting along this continuous trajectory in different ways. Similarly, for the Zheng 10x datasets, since each cell-type was sorted and sequenced separately, there is inevitably some degree of confounding of cell-type specific effects with batch effects that could make clustering much easier.</p>
                <p> </p>
                <p> 
                    <italic>There is indeed a large (and increasing) number of public scRNA-seq data sets available, generated with many different types of protocols. However, the main issue (especially with droplet-based data sets) is that no independent annotation of the cells is available, which implies that they are not suitable for unbiased benchmarking of the kind we are doing here. Many public droplet-based data sets do contain &#x201c;cell type labels&#x201d;, but these are typically inferred by clustering the cells based on the scRNA-seq data itself, and thus any evaluation risks being biased in favor of methods similar to the one used to derive the labels in the first place. This is the main reason behind the selection of these data sets. We agree that the Trapnell data set was not generated with the purpose of finding cell types. However, we still find it useful for illustrating the performance of the methods in a data set where the &#x201c;true clusters&#x201d; (defined as the time point at which the cells were collected) do not represent the main/strongest signal in the data (see e.g. the t-SNE plots in Supplementary Figure 1). For the Zheng data set, it&#x2019;s true that there could be confounding with batch effects, and ambiguous cells may be excluded, which would also make clusters more distinct. For our Zhengmix data sets, we therefore included both very different (e.g., B-cells and T-cells) and more similar (e.g., different types of T-cells) cell types (Supplementary Figure 1). We have expanded the discussion in the &#x201c;Methods-Real data sets&#x201d; section of the revised paper to clarify these issues. </italic>
                </p>
                <p> </p>
                <p> As datasets get bigger, the scalability of each method will be an important consideration. The authors provide a preliminary look into this via the different run time of each method in Figure 2, but how this run time depends on the number of cells is unclear. Readers will be interested in whether some methods scale better than others. It is worth having an additional figure of run time as a function of number of cells (via downsampling cells and then extrapolating to larger datasets) to fully capture the scalability of each method.</p>
                <p> </p>
                <p> 
                    <italic>Thanks for pointing this out. We have included a plot illustrating the scalability, investigated by downsampling of the largest data set, in Supplementary Figure 9. </italic>
                </p>
                <p> </p>
                <p> With regard to the stability between cluster runs, some methods may internally set various random seeds to ensure reproducibility. Please double check that the stability observed in Figure 3 is not simply the result of which methods use random seeds. If a method does use an internal random seed (or, more likely, several), the seed must be changed to accurately assess stability.</p>
                <p> </p>
                <p> 
                    <italic>Two of the methods (TSCAN and monocle) set random seeds internally and do not allow these to be changed by the user. Other methods (SC3, Seurat and RaceID2) set a random seed but let the user specify it. For these methods, we explicitly set the random seed to different values in the five runs. We have clarified this in the &#x201c;Results-High stability between clustering runs&#x201d; section of the revised text. </italic>
                </p>
                <p> </p>
                <p> There are quite a number of single-cell RNA-seq clustering approaches and the list keeps growing (https://github.com/seandavi/awesome-single-cell). Only a fraction is represented in this comparison. While an exhaustive comparison of all methods is beyond the scope of this paper, the authors should comment briefly on how these particular 12 clustering algorithms were chosen.</p>
                <p> </p>
                <p> 
                    <italic>The methods were chosen to represent the most common types of algorithms used for clustering of scRNA-seq data. We have tried to include the most widely used methods, but also to include methods from tangential fields as well as more traditional clustering methods to serve as a baseline. We have clarified this in the text.</italic>
                </p>
                <p> </p>
                <p> While nearly all methods assessed use dimensionality reduction as a first step, it is unclear why some were allowed to reduce to 30 dimensions while others were allowed 50. It seems that, particularly as datasets get larger with presumably more cell-types captured in each dataset, we will likely want to increase the number of PCs to fully capture the variation present in the data. While the authors have left the investigation into the effects of the number of PCs to future research, they should briefly note the reason for the choice of PCs used for each method.</p>
                <p> </p>
                <p> 
                    <italic>We extracted 50 principal components for the methods that performed an additional dimension reduction (by t-SNE), and 30 principal components for methods where the clustering was done in the principal component space. The only exception was FlowSOM; this was unintentional and has been harmonized in the revised version to use the same number of PCs as the rest of the methods. </italic>
                </p>
            </body>
        </sub-article>
    </sub-article>
</article>
