An unsupervised learning method for reconstructing cell spatial organization with application to the DREAM Single Cell Transcriptomics Challenge

Yang Chen; Disheng Mao; Yuping Zhang; Zhengqing Ouyang

doi:10.12688/f1000research.20446.1


Method Article


[version 1; peer review: 2 not approved]

Yang Chen¹, Disheng Mao², Yuping Zhang², Zhengqing Ouyang ¹

PUBLISHED 19 Feb 2020


¹ Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, MA, USA
² Department of Statistics, University of Connecticut, Storrs, CT, USA

Yang Chen
Roles: Data Curation, Formal Analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – Original Draft Preparation

Disheng Mao
Roles: Investigation, Methodology

Yuping Zhang
Roles: Funding Acquisition, Investigation, Methodology, Resources, Supervision, Writing – Review & Editing

Zhengqing Ouyang
Roles: Conceptualization, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project Administration, Resources, Supervision, Writing – Review & Editing


Abstract

Single cell RNA sequencing (scRNA-seq) data analysis is important for building a global transcription landscape of all cell types in tissues, tracing cell lineages, and reconstructing the spatial organization of cells. In this article, we propose an unsupervised learning method to predict the spatial positions and gene expression of individual cells in Drosophila embryos using a small number of driver genes. Specifically, we develop a two-stage clustering approach and compute a probability matrix of the spatial positions of single cells. We apply this method to the dataset of the DREAM Single Cell Transcriptomics Challenge. The comparison with the “gold standard” suggests that our method is effective in reconstructing cell positions and gene expression patterns in spatial tissues.

Keywords

Spatial organization, single cell RNA-seq, Drosophila embryo, clustering, DREAM challenge

Corresponding author: Zhengqing Ouyang

Competing interests: No competing interests were disclosed.

Grant information: This work was partially supported by the Faculty Research Excellence Program Award at UConn (to YZ).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2020 Chen Y et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Chen Y, Mao D, Zhang Y and Ouyang Z. An unsupervised learning method for reconstructing cell spatial organization with application to the DREAM Single Cell Transcriptomics Challenge [version 1; peer review: 2 not approved]. F1000Research 2020, 9:124 (https://doi.org/10.12688/f1000research.20446.1) First published: 19 Feb 2020, 9:124 (https://doi.org/10.12688/f1000research.20446.1) Latest published: 07 Jun 2021, 9:124 (https://doi.org/10.12688/f1000research.20446.2)

Introduction

The development of single cell RNA sequencing (scRNA-seq) has provided a powerful solution for building a global transcription landscape of all cell types in tissues, finding new cell types, tracing cell lineages, reconstructing spatial organization, and integrating with other omics data^1–4. Because scRNA-seq is performed on cells dissociated from tissues, the spatial origin of each cell is lost and the spatial gene expression pattern is unknown. In situ hybridization (ISH) and its variants can detect the spatial distribution of mRNA transcripts and produce gene expression reference atlases. With enough marker genes, the spatial positions of cells profiled by scRNA-seq can be reconstructed by combining them with an ISH reference atlas^3–7. Other studies have combined sequential fluorescence in situ hybridization (seqFISH) and multiplexed error-robust fluorescence in situ hybridization (MERFISH) with scRNA-seq data to map cell types to the reference atlas^8–10.

Recent methods have successfully mapped cells from scRNA-seq data to spatial positions using dozens of landmark genes^3–7. Karaiskos et al. developed the DistMap method, which maps ~1300 Drosophila embryo cells onto ~3000 spatial bins using 84 marker genes³. Satija et al.⁴ mapped 851 zebrafish embryo cells onto 64 spatial bins using 47 genes. Achim et al. computed correspondence scores to map 139 cells onto the Platynereis dumerilii brain using a set of 98 genes⁵. Moor et al. reconstructed the positions of 1383 single enterocytes along the villus axis in 1-D space using 50 landmark genes from the bottom and top of the villus⁶. Nitzan et al. proposed novoSpaRc for mapping scRNA-seq cells onto an existing reference atlas to infer spatial gene expression⁷. In these methods, the dimension and resolution of the spatial region, as well as the number of marker genes, are key factors affecting how well spatial positions can be recovered.

The DREAM Single Cell Transcriptomics Challenge aims to develop new algorithms for reconstructing the spatial organization of the embryo. Participating teams are asked to predict the positions of 1297 cells in the embryo using the in situ hybridization expression patterns of 60 (sub-challenge 1), 40 (sub-challenge 2), and 20 (sub-challenge 3) driver genes. The challenge differs from the published methods in that it seeks to infer the spatial locations of cells from fewer marker genes.

In this article, we introduce an unsupervised learning approach for the three sub-challenges and validate the results against the “gold standard” derived from DistMap using all 84 genes. The paper is organized as follows. In Methods, we briefly describe our solutions for the three sub-challenges; in Results, we present their performance on the data of the DREAM Single Cell Transcriptomics Challenge; finally, we discuss our results and summarize our work.

Methods

Dataset

The dataset is from Drosophila embryos. The scRNA-seq data were generated from ~1000 handpicked stage 6 fly embryos using Drop-seq¹¹ and contain both raw and normalized UMI counts for 1297 cells and 8924 genes. The in situ hybridization expression patterns of the 84 driver genes are from the Berkeley Drosophila Transcription Network Project (BDTNP), and this reference atlas is binarized. One half of the embryo is divided into 3039 bins, whose spatial coordinates are also provided. The dataset files can be downloaded free of charge from the DREAM Single Cell Transcriptomics Challenge after registration with Synapse (https://www.synapse.org/#!Synapse:syn16782360).

We directly use the normalized scRNA-seq data, the in situ matrix and the geometry of the embryo. The gene names “E.spl.m5.HLH” and “Blimp.1” are replaced by “E(spl)m5-HLH” and “Blimp-1”.

Gene selection

We use a hierarchical clustering method to select 60, 40, and 20 driver genes from the 84 genes based on the normalized scRNA-seq data.

Assuming that the driver genes show similar expression patterns in the scRNA-seq data and in the in situ matrix, we select the essential driver genes using the information provided by the scRNA-seq data. Namely, if two genes are highly correlated in the scRNA-seq data, we assume that the same holds in the in situ matrix, so keeping only one of them loses little information. To find correlated genes, we perform hierarchical clustering of the 84 genes on the normalized scRNA-seq data, using the Euclidean distance and McQuitty linkage, and separate them into 60 (sub-challenge 1) or 40 (sub-challenge 2) clusters. When computing the distance between two clusters, McQuitty linkage gives more weight to objects in small clusters than to those in large clusters, which makes it suitable for situations with many small clusters. Since the numbers of clusters are fairly large in sub-challenges 1 and 2, we use McQuitty linkage there. In sub-challenge 3, the number of clusters shrinks to 20, so we use Ward linkage instead to obtain larger clusters. After this step, the gene selection process is the same as in sub-challenges 1 and 2.

After clustering, we pick the most representative gene of each cluster, namely the member gene with the smallest Euclidean distance to the cluster center.
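The gene-selection step can be sketched in Python as follows. This is a minimal, dependency-free illustration, not the authors' R implementation: the function names, the gene-major layout (`expr[g]` holds the normalized expression of gene g across cells), and the tie-breaking between equally distant clusters are assumptions. It applies McQuitty (WPGMA) linkage, in which the distance from a newly merged cluster to any other cluster is the plain average of the two previous distances, and then keeps the member gene closest to each cluster centroid. The Ward linkage used in sub-challenge 3 would only change the distance update rule and is not shown.

```python
import math
from itertools import combinations


def mcquitty_clusters(vectors, k):
    """Agglomerate vectors into k clusters using McQuitty (WPGMA) linkage."""
    clusters = {i: [i] for i in range(len(vectors))}
    dist = {frozenset((i, j)): math.dist(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}
    next_id = len(vectors)
    while len(clusters) > k:
        pair = min(dist, key=dist.get)  # closest pair of current clusters
        a, b = tuple(pair)
        merged = clusters.pop(a) + clusters.pop(b)
        del dist[pair]
        for c in clusters:
            # McQuitty update: unweighted mean of the two old distances
            d_new = (dist.pop(frozenset((a, c))) + dist.pop(frozenset((b, c)))) / 2
            dist[frozenset((next_id, c))] = d_new
        clusters[next_id] = merged
        next_id += 1
    return list(clusters.values())


def representative_genes(expr, gene_names, k):
    """Return one gene per cluster: the member closest to the cluster centroid."""
    selected = []
    for members in mcquitty_clusters(expr, k):
        n_cells = len(expr[members[0]])
        centroid = [sum(expr[g][c] for g in members) / len(members)
                    for c in range(n_cells)]
        best = min(members, key=lambda g: math.dist(expr[g], centroid))
        selected.append(gene_names[best])
    return selected
```

R's `hclust()` provides the same linkage as `method = "mcquitty"`.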

Binarization of scRNA-seq data

We binarize the normalized scRNA-seq data for the selected genes following the “binarizeSingleCellData()” function in DistMap (https://github.com/rajewsky-lab/distmap). The details of binarization are as follows. For each candidate quantile threshold, we binarize the scRNA-seq data gene by gene: an expression value is set to 1 if it is larger than that gene's quantile expression value, and to 0 otherwise. We then compute the root-mean-square error (RMSE) between the correlation matrix of the binarized scRNA-seq data and the correlation matrix of the in situ matrix. Finally, we binarize the scRNA-seq data using the quantile threshold with the smallest RMSE.
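A sketch of the quantile-threshold search described above, assuming gene-major lists (`expr[g]` across cells, `insitu[g]` across bins) and a simple lower-rank quantile. It follows the description in the text but is not the code of DistMap's `binarizeSingleCellData()`, whose quantile definition and candidate grid may differ. Treating the correlation of a constant (all-0 or all-1) vector as 0 is also an assumption made here to avoid division by zero.

```python
import math


def pearson(u, v):
    """Pearson correlation; 0.0 if either vector is constant (assumption)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def corr_matrix(rows):
    """Gene-by-gene correlation matrix of gene-major rows."""
    return [[pearson(r, s) for s in rows] for r in rows]


def quantile(values, q):
    """Simple lower-rank empirical quantile (an assumed definition)."""
    s = sorted(values)
    return s[int(q * (len(s) - 1))]


def binarize(expr, q):
    """Set a value to 1 if it exceeds its gene's q-quantile, else 0."""
    out = []
    for row in expr:
        thr = quantile(row, q)
        out.append([1 if x > thr else 0 for x in row])
    return out


def rmse(a, b):
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return math.sqrt(sum(diffs) / len(diffs))


def best_quantile(expr, insitu, candidates):
    """Quantile whose binarization best reproduces the in situ correlations."""
    target = corr_matrix(insitu)
    scored = [(rmse(corr_matrix(binarize(expr, q)), target), q) for q in candidates]
    return min(scored)[1]
```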

Compute the probability matrix between cells and bins

Given the binarized scRNA-seq data and the in situ matrix, we calculate the probability matrix between cells and bins based on the selected driver genes. Let $n_g$ denote the number of selected driver genes. The probability $p_{ij}$ that cell $c_i$ ($i \in [1, 1297]$) originates from bin $b_j$ ($j \in [1, 3039]$) is

$$p_{ij} = \frac{n_s^{ij}}{n_g} \qquad (1)$$

where $n_s^{ij}$ is the number of selected driver genes with the same binarized value (0 or 1) in the scRNA-seq vector of cell $c_i$ and the in situ vector of bin $b_j$.
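Equation (1) is the fraction of selected driver genes on which a cell and a bin agree; for example, with $n_g = 4$ and three matching genes, $p_{ij} = 3/4 = 0.75$. A minimal sketch, assuming cell-major and bin-major lists of 0/1 values over the same selected genes (a hypothetical layout):

```python
def probability_matrix(sc_bin, insitu_bin):
    """p[i][j] = fraction of selected genes where cell i and bin j agree.

    sc_bin[i][g]: binarized scRNA-seq value of gene g in cell i
    insitu_bin[j][g]: binarized in situ value of gene g in bin j
    """
    n_g = len(sc_bin[0])
    return [
        [sum(1 for a, b in zip(cell, bin_) if a == b) / n_g for bin_ in insitu_bin]
        for cell in sc_bin
    ]
```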

Select top bins based on the probability matrix

The probability of a cell originating from a bin is determined by the agreement between the driver gene expression in the cell and in the bin, so using more genes can improve the prediction of cell position. Bins with higher probabilities are more likely to be the cell's true position. For sub-challenges 1–3, we follow the same process, shown in Figure 1. To make the results more stable, we first select a sufficient number of bins based on their probability values (see below) and then use clustering to determine the cell position.

Figure 1. Workflow of our spatial position prediction method for scRNA-seq.

To select candidate bins for predicting cell positions, we examine the distribution of the maximum probability values of the bins in the probability matrix (Figure 2). We then use the third quartile of the probability values as the threshold for selecting the top bins in sub-challenge 1 (60 driver genes) and the first quartile in sub-challenge 2 (40 driver genes), and we use all bins in sub-challenge 3 (20 driver genes). If no bin is selected, the bin with the maximum probability is taken as the predicted position. If more than 100 bins are selected, only the top 100 bins by probability are kept.
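The selection rule for one cell can be sketched as below. The function name and the inclusive comparison (`>=`) against the threshold are assumptions; the text does not specify whether bins exactly at the threshold are kept. Passing `threshold=0.0` reproduces the "use all bins" setting of sub-challenge 3, after which the 100-bin cap still applies.

```python
def select_top_bins(probs, threshold, max_bins=100):
    """Candidate bins for one cell, where probs[j] = p_ij.

    Keeps bins with probability >= threshold (inclusive cut is an assumption),
    falls back to the single most probable bin if none pass, and keeps at most
    max_bins bins in decreasing order of probability.
    """
    ranked = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
    selected = [j for j in ranked if probs[j] >= threshold]
    if not selected:
        return ranked[:1]  # no bin passes: use the maximum-probability bin
    return selected[:max_bins]
```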

Figure 2. The distribution of the maximum probability values of the bins in the probability matrix for the 60, 40, and 20 driver gene scenarios.

To assess the effect of the threshold on the prediction results, we test our method under different thresholds, as shown in Figure 4(b)–(d).

Silhouette score to determine the number of hierarchical clusters

We cluster the selected high-probability bins and take the cluster with the maximum sum of probabilities as the cell position. Here, we use hierarchical clustering on the selected bins, and the number of clusters is determined by the silhouette score. For each point, the silhouette score compares the average distance to the other points in its own cluster with the smallest average distance to the points of any other cluster. It ranges from -1 to +1; the higher the score, the closer the point is to its own cluster and the farther it is from other clusters.
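Choosing the number of clusters by the average silhouette score can be sketched as follows. This plain-Python stand-in uses centroid linkage on bin coordinates (the method named in the next paragraph) and the standard silhouette $s(i) = (b(i) - a(i)) / \max\{a(i), b(i)\}$; it is not NbClust's implementation. The Euclidean distance, the candidate range 2 to `k_max`, and the common convention $s(i) = 0$ for singleton clusters are assumptions.

```python
import math


def centroid_linkage_labels(points, k):
    """Merge the two clusters with the closest centroids until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    dim = len(points[0])

    def centroid(members):
        return [sum(points[m][d] for m in members) / len(members) for d in range(dim)]

    while len(clusters) > k:
        cents = [centroid(c) for c in clusters]
        _, a, b = min(
            (math.dist(cents[a], cents[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i], [])
        if not own:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, k_max):
    """Cluster count in 2..k_max with the highest mean silhouette (needs >= 3 points)."""
    k_max = min(k_max, len(points) - 1)
    return max(
        range(2, k_max + 1),
        key=lambda k: mean_silhouette(points, centroid_linkage_labels(points, k)),
    )
```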

Predict cell positions based on the clustering result

We use the NbClust package¹² to perform hierarchical clustering with the “centroid” method and choose the number of clusters with the highest average silhouette score across all points. We then compare the sums of probabilities of all clusters and select the cluster with the maximum sum. Using the center of the selected cluster as a reference point, we take the 10 nearest bins as the top 10 most likely cell positions.
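Given cluster labels for the selected bins (for example from the clustering step above), the final choice of positions can be sketched as below. The function name is hypothetical, and two details are assumptions not stated in the text: the cluster center is the unweighted mean of its bins' coordinates, and the 10 nearest bins are searched among all bins rather than only the selected ones.

```python
import math


def top_positions(bins, labels, probs, coords, n_top=10):
    """Bins nearest to the centre of the cluster with the largest probability sum.

    bins: indices of the selected candidate bins for one cell
    labels: cluster label of each selected bin (same order as bins)
    probs: probs[j] = p_ij for this cell, over all bins
    coords: coords[j] = spatial coordinates of bin j
    """
    totals = {}
    for j, lab in zip(bins, labels):
        totals[lab] = totals.get(lab, 0.0) + probs[j]
    best = max(totals, key=totals.get)  # cluster with the largest probability sum
    members = [j for j, lab in zip(bins, labels) if lab == best]
    dim = len(coords[0])
    center = [sum(coords[j][d] for j in members) / len(members) for d in range(dim)]
    # Assumption: rank all bins by distance to the centre of the chosen cluster.
    ranked = sorted(range(len(coords)), key=lambda j: math.dist(coords[j], center))
    return ranked[:n_top]
```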

Performance evaluation

To evaluate the performance of our method, we use the three performance scores of the DREAM challenge (https://www.synapse.org/#!Synapse:syn17091286, https://github.com/dream-sctc/Scoring/blob/master/dream_scoring_clean.R). The first score is the primary metric and estimates the precision of the assignment of single cells. The second score is the average of the relative assignment metrics over all single cells and is used to break ties when two methods have equal first scores. The third score evaluates the predicted spatial gene expression patterns.

Ambiguous cells: if the top 1 and top 2 predicted positions of a cell in the DistMap results are the same, the cell's position is considered ambiguous, and the cell is excluded from the computation of scores 1–3. In this challenge, DistMap yields 287 ambiguous cells.

Results

Selecting genes

For each of the 3039 bins, we calculated the sum of the in situ expression values of the selected driver genes. As Figure 3(a) shows, every bin has at least one selected gene expressed in the in situ matrix, indicating that the selected driver genes cover all bins. As the number of genes decreases, the frequency of gene expression in each bin decreases. We also compared the overlaps among the selected gene sets (Figure 3(b)). Among the 40 driver genes of sub-challenge 2, only one is not among the 60 driver genes selected for sub-challenge 1. Similarly, only 2 of the driver genes of sub-challenge 3 are not among the 60 selected driver genes. This suggests that our method is consistent when selecting different numbers of driver genes.
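The two checks behind Figure 3 can be sketched as follows, assuming a bin-major 0/1 in situ matrix indexed by gene position; the function names are hypothetical.

```python
def per_bin_counts(insitu_bin, genes):
    """Number of selected driver genes expressed (value 1) in each bin.

    insitu_bin[j][g] is the binarized in situ value of gene g in bin j.
    """
    return [sum(row[g] for g in genes) for row in insitu_bin]


def all_bins_covered(insitu_bin, genes):
    """True if every bin expresses at least one selected driver gene."""
    return all(count >= 1 for count in per_bin_counts(insitu_bin, genes))


def genes_not_in(subset, superset):
    """Genes of one selected set that are absent from another set."""
    return sorted(set(subset) - set(superset))
```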

Figure 3.

(a) Frequency of gene expression in each bin of the in situ matrix under the different driver gene selection scenarios. (b) Overlaps among the driver genes selected for the three sub-challenges.

Compare the predicted position and spatial gene expression

We used scores 1–3 to evaluate our method under the different driver gene scenarios. Figure 4(a) shows the scores of our submitted results for sub-challenges 1, 2, and 3; the blue bars show the scores of the gold standard method, DistMap with 84 driver genes. For score 1, our method is close to the gold standard in sub-challenge 1 (60 driver genes), whereas the differences are larger in sub-challenges 2 and 3. For score 2, our method achieves high scores with 60 and 40 driver genes; because score 2 is the average relative precision over all cells, this suggests that our method is robust in predicting the correct position. Score 3 of our method differs only slightly from the gold standard with 60 and 40 driver genes. Figure 4(b)–(d) shows that scores 1–3 are consistent over a range of thresholds in all driver gene scenarios. Overall, Figure 4 shows that our method achieves performance close to the gold standard when using 60 driver genes.

Figure 4.

(a) Comparison of scores 1, 2, and 3 for sub-challenges 1 (60 driver genes), 2 (40 driver genes), and 3 (20 driver genes) with the gold standard (84 driver genes). (b–d) Comparison of scores 1, 2, and 3 under different thresholds for (b) sub-challenge 1, (c) sub-challenge 2, and (d) sub-challenge 3. (Numbers are rounded for visualization.)

As shown in Figure 5, the accuracy of spatial gene expression prediction is measured for each driver gene by the Matthews correlation coefficient (MCC) between the in situ expression at the predicted cell positions and the binarized scRNA-seq data. Score 3 is based on the MCC of each driver gene used in each sub-challenge. Consistent with Figure 4(a), the per-gene MCCs of DistMap (84 genes) and our method (60 or 40 genes) are very close in sub-challenges 1 and 2. In sub-challenge 3, the MCCs of the genes “dpn”, “erm”, “ftz”, and “h” from our method are much lower than those from DistMap, which is consistent with the lower score 3 in sub-challenge 3.
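The per-gene MCC can be sketched as below. It is the standard Matthews correlation coefficient applied to the comparison described above: the in situ value at each cell's predicted bin versus the cell's binarized scRNA-seq value. The function names and data layout are hypothetical, returning 0.0 for a zero denominator is an assumed convention, and the official scoring script (dream_scoring_clean.R) may differ in details such as the handling of ambiguous cells.

```python
import math


def mcc(pred, obs):
    """Matthews correlation coefficient between two binary (0/1) vectors."""
    tp = sum(1 for p, o in zip(pred, obs) if p == 1 and o == 1)
    tn = sum(1 for p, o in zip(pred, obs) if p == 0 and o == 0)
    fp = sum(1 for p, o in zip(pred, obs) if p == 1 and o == 0)
    fn = sum(1 for p, o in zip(pred, obs) if p == 0 and o == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


def gene_mcc(gene, predicted_bin, insitu_bin, sc_bin):
    """MCC for one driver gene.

    predicted_bin[i]: bin assigned to cell i
    insitu_bin[j][g]: binarized in situ value of gene g in bin j
    sc_bin[i][g]: binarized scRNA-seq value of gene g in cell i
    """
    pred = [insitu_bin[predicted_bin[i]][gene] for i in range(len(sc_bin))]
    obs = [row[gene] for row in sc_bin]
    return mcc(pred, obs)
```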

Figure 5.

Comparison of the spatial gene expression between DistMap (84 genes) and our method using (a) 60 genes in sub-challenge 1; (b) 40 genes in sub-challenge 2; (c) 20 genes in sub-challenge 3.

Conclusion

We described our method and evaluated its performance using 60, 40, and 20 driver genes against the gold standard (DistMap results). In sub-challenge 1, our results show performance close to the gold standard. In sub-challenges 2 and 3, with 40 and 20 driver genes, score 1 decreases while score 3 remains close to the gold standard. This suggests that our method can predict cell positions using 60 genes and gene expression patterns using fewer genes. Our tests of the threshold for selecting top bins (Figure 4(b)–(d)) suggest that our method can achieve even better results when using the maximum threshold in sub-challenges 1, 2, and 3.

Data availability

The dataset associated with the DREAM Single Cell Transcriptomics Challenge is available to registered participants at https://www.synapse.org/#!Synapse:syn16782360 and https://www.synapse.org/#!Synapse:syn18632189. Under the Synapse data-sharing protocol, users must register with Synapse (free of charge; https://www.synapse.org/) using their email address and agree to the dataset conditions of use. Once registered, users can download the files.

Synapse: SCTC Challenge zho_team Submission. Code and results underlying this article, https://doi.org/10.7303/syn17056435¹³.

Software availability

Source code implementation for the method presented in this article and used in the DREAM Single Cell Transcriptomics Challenge is available from: https://github.com/ouyang-lab/SCTC-Challenge-zho_team. Scoring scripts are available from: https://github.com/dream-sctc/Scoring/blob/master/dream_scoring_clean.R.

Archived source code as at time of publication: https://doi.org/10.5281/zenodo.3592532¹⁴

License: GPL 3.0

Acknowledgements

Data used in this publication were generated by Prof. Dr. Nikolaus Rajewsky at the Max Delbrück Center for Molecular Medicine, and these results were obtained as part of the DREAM Single Cell Transcriptomics Challenge project through Synapse ID (syn15665609). We thank Pablo Meyer and Jovan Tanevski for providing the documents, code, and suggestions for scores 1–3.


References

1. Stuart T, Satija R: Integrative single-cell analysis. Nat Rev Genet. 2019; 20(5): 257–272.
2. Farrell JA, Wang Y, Riesenfeld SJ, et al.: Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis. Science. 2018; 360(6392): pii: eaar3131.
3. Karaiskos N, Wahle P, Alles J, et al.: The Drosophila embryo at single-cell transcriptome resolution. Science. 2017; 358(6360): 194–199.
4. Satija R, Farrell JA, Gennert D, et al.: Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015; 33(5): 495–502.
5. Achim K, Pettit JB, Saraiva LR, et al.: High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat Biotechnol. 2015; 33(5): 503–9.
6. Moor AE, Harnik Y, Ben-Moshe S, et al.: Spatial reconstruction of single enterocytes uncovers broad zonation along the intestinal villus axis. Cell. 2018; 175(4): 1156–1167.e15.
7. Nitzan M, Karaiskos N, Friedman N, et al.: Charting a tissue from single-cell transcriptomes. bioRxiv. 2018; 456350.
8. Shah S, Lubeck E, Zhou W, et al.: In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron. 2016; 92(2): 342–357.
9. Zhu Q, Shah S, Dries R, et al.: Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data. Nat Biotechnol. 2018; 36: 1183–1190.
10. Moffitt JR, Bambah-Mukku D, Eichhorn SW, et al.: Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science. 2018; 362(6416): pii: eaau5324.
11. Macosko EZ, Basu A, Satija R, et al.: Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. 2015; 161(5): 1202–1214.
12. Charrad M, Ghazzali N, Boiteau V, et al.: NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. J Stat Softw. 2014; 61(6): 1–36.
13. Chen Y, Mao D, Zhang Y, et al.: SCTC Challenge zho_team Submission. Synapse [dataset]. 2019. http://www.doi.org/10.7303/syn17056435
14. yancychy: ouyang-lab/SCTC-Challenge-zho_team v0.1 (Version v0.1). Zenodo. 2019. http://www.doi.org/10.5281/zenodo.3592532


Article Versions (2)

version 2 (revised): published 07 Jun 2021, 9:124, https://doi.org/10.12688/f1000research.20446.2
version 1: published 19 Feb 2020, 9:124, https://doi.org/10.12688/f1000research.20446.1


Open Peer Review

Version 1


Reviewer Report 23 Sep 2020

Mark S. Cembrowski, Dept. of Cellular and Physiological Sciences, University of British Columbia, Vancouver, BC, Canada

Larissa Kraus, Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, BC, Canada

Not Approved

https://doi.org/10.5256/f1000research.22477.r70705

In this manuscript, the authors describe an approach to infer spatial gene expression using scRNA-seq data. The authors aimed to identify spatial expression of cell clusters using 20, 40 or 60 marker genes, and compared to the “gold standard” Distmap method using 84 marker genes.

The authors begin to explain the rationale of the manuscript by describing the concept of the DREAM challenge, which aims to introduce novel computational techniques to map spatial gene expression using transcriptomic data. Since the DREAM challenge aims to implement novel methods, it is imperative to introduce and carefully compare common methods with the novel proposed approach, which was underdeveloped by the authors. As an example, the “gold standard” method is mentioned several times, though it is not clear what exactly the “gold standard” is or how well-controlled it is. As another key example, a discussion of any kind is completely missing.

The design of figures for a manuscript should be chosen to clearly emphasize the main points of the data. Here, the figures are difficult to interpret because key information and description is missing (e.g., legends are too brief, axis labels are missing, subplot headings are redundant, figure design is inconsistent). In addition, the design and presentation of the figures seem unmotivated and thereby difficult for the reader to see the importance of any particular data or easily draw conclusions from the results. In addition, consistent colour schemes and designs would help the reader to follow the manuscript and understand the results.

Is the rationale for developing the new method (or application) clearly explained?

Yes
Is the description of the method technically sound?

Yes
Are sufficient details provided to allow replication of the method development and its use by others?

Partly
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

No

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Transcriptomics

We confirm that we have read this submission and believe that we have an appropriate level of expertise to state that we do not consider it to be of an acceptable scientific standard, for reasons outlined above.


Author Response 07 Jun 2021

Zhengqing Ouyang, Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, USA


Point-by-point response to the comments of Reviewer 2


Reviewer 2

In this manuscript, the authors describe an approach to infer spatial gene expression using scRNA-seq data. The authors aimed to identify spatial expression of cell clusters using 20, 40 or 60 marker genes, and compared to the “gold standard” Distmap method using 84 marker genes.

The authors begin to explain the rationale of the manuscript by describing the concept of the DREAM challenge, which aims to introduce novel computational techniques to map spatial gene expression using transcriptomic data.
Since the DREAM challenge aims to implement novel methods, it is imperative to introduce and carefully compare common methods with the novel proposed approach, which was underdeveloped by the authors. As an example, the “gold standard” method is mentioned several times, though it is not clear what exactly the “gold standard” is or how well-controlled it is. As another key example, a discussion of any kind is completely missing.

We thank the reviewer for the constructive comments. The “gold standard” is generated by DistMap, which is the best available single cell position reference used in the DREAM Challenge. DistMap determines single cell positions by searching for the maximum similarity between cells and spatial bins using the in situ expression of 84 driver genes in the Berkeley Drosophila Transcription Network Project (BDTNP). It has been shown that the 84 driver genes are sufficient to uniquely and individually label most of the 1,297 cells (Karaiskos, et al. 2017). Thus, it is used in the DREAM Challenge to assess the performance of methods using smaller numbers (60, 40, and 20) of driver genes. In the revised manuscript, we changed “gold standard” to “silver standard”, as the cell positions generated by DistMap are not experimentally validated. We added the description of the DistMap “silver standard” in the Method section.

We also expanded the Introduction section with more on the common methods for predicting cell positions from single cell transcriptome data. We described the limitations of current methods and explained the motivation for studying how driver gene selection affects the prediction of cell spatial positions.

The comparisons of our methods (Zho_team) and other top performing methods have been described in the DREAM Single Cell Transcriptomics Challenge consortium paper (Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data, Life Sci Alliance. 2020 Nov; 3(11): e202000867). Our manuscript is focused on describing the details of our method used in the DREAM Challenge.

We also expanded the Discussion section of our manuscript. In addition to summarizing our method and results, we pointed out that gene-gene correlation and spatial correlation are useful information for predicting spatial cell positions and gene expression.

The design of figures for a manuscript should be chosen to clearly emphasize the main points of the data. Here, the figures are difficult to interpret because key information and description is missing (e.g., legends are too brief, axis labels are missing, subplot headings are redundant, figure design is inconsistent). In addition, the design and presentation of the figures seem unmotivated and thereby difficult for the reader to see the importance of any particular data or easily draw conclusions from the results. In addition, consistent colour schemes and designs would help the reader to follow the manuscript and understand the results.

We thank the reviewer for the constructive comments. In the revised manuscript, we improved all figures and legends and enlarged the font sizes for all figures. In Figure 1, we clarified the whole workflow to display the motivation of our method. In Figure 2, we changed it to display the number of bins with high similarity values (> 0.8) in the similarity matrix for each cell when the number of driver genes is 60, 40, or 20. In Figure 3, we enlarged the font size and added more information in the figure labels. In Figure 4, we enlarged the font size. In Figure 5, we changed it to the comparison of the Pearson correlation of predicted gene expression from DistMap (using 84 driver genes) and our method (using 60, 40, or 20 genes).
Competing Interests: No competing interests were disclosed.

Reviewer Report 02 Sep 2020

Xianwen Ren, Biomedical Pioneering Innovation Center, Peking University, Beijing, China

Not Approved

https://doi.org/10.5256/f1000research.22477.r69520

Reconstructing the spatial information of single cells from single-cell RNA-seq data is a pivotal question to further release the revolutionary power of the scRNA-seq technology. Here the authors propose a computational method to infer the spatial positions of single cells of Drosophila embryos based on scRNA-seq data and a reference spatial map based on in situ hybridization of 84 driver genes. While the method is finely tuned for this specific scientific question, some concerns exist.

First, the title is misleading. The authors claim "an unsupervised learning method". This is not valid because the method uses the reference spatial map of Drosophila embryos. Unless the method uses only the scRNA-seq data, it cannot be described as "an unsupervised learning method".

Second, the current method has only been evaluated on one dataset. More independent evaluations should be added to demonstrate the generalizing capacity of this method.

Overall, this manuscript addresses an important scientific question. However, the method needs more justification and evidence to demonstrate its novelty and power.

Is the rationale for developing the new method (or application) clearly explained?

Yes
Is the description of the method technically sound?

Yes
Are sufficient details provided to allow replication of the method development and its use by others?

Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

No source data required
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Partly

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Bioinformatics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Author Response 07 Jun 2021

Zhengqing Ouyang, Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, USA

Point-by-point response to the comments of Reviewer 1

Reviewer 1

Reconstructing the spatial information of single cells from single-cell RNA-seq data is a pivotal question to further release the revolutionary power of the scRNA-seq technology. Here the authors propose a computational method to infer the spatial positions of single cells of Drosophila embryos based on scRNA-seq data and a reference spatial map based on in situ hybridization of 84 driver genes. While the method is finely tuned for this specific scientific question, some concerns exist.

1. First, the title is misleading. The authors claim "an unsupervised learning method". This is not valid because the method uses the reference spatial map of Drosophila embryos. Unless the method uses only the scRNA-seq data, it cannot be described as "an unsupervised learning method".

We thank the reviewer for the comments. To clarify, we used unsupervised hierarchical clustering for gene selection. Based on the selected genes, we predicted the spatial position of each individual cell. Methods in the DREAM Single Cell Transcriptomics Challenge used both unsupervised and supervised gene selection approaches to predict cell spatial positions. Table S2 of the consortium paper (Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data, Life Sci Alliance. 2020 Nov; 3(11): e202000867) lists the top performing methods (including ours) that used either unsupervised or supervised gene selection.
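Gene selection by unsupervised hierarchical clustering, as described in this response, can be illustrated with a small sketch. It is an illustration under assumptions rather than the authors' implementation: it uses average linkage with 1 minus Pearson correlation as the distance between genes, cuts the tree at k clusters, and keeps one representative (medoid) gene per cluster. The linkage, the distance, the representative rule, and the function names are all choices made here for illustration only.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / den if den > 0 else 0.0


def select_genes(expr, k):
    """Pick k representative genes via average-linkage hierarchical clustering.

    expr: hypothetical input mapping gene -> expression values across the same cells.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    genes = list(expr)
    # Distance between genes: 1 - Pearson correlation (0 means identical pattern).
    dist = {(g, h): 1.0 - pearson(expr[g], expr[h]) for g in genes for h in genes}
    clusters = [[g] for g in genes]
    # Agglomerate: repeatedly merge the two clusters with the smallest
    # average pairwise distance until k clusters remain.
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(dist[(g, h)] for g in clusters[i] for h in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    # Representative per cluster: the medoid (smallest total distance within its cluster).
    return sorted(min(c, key=lambda g: sum(dist[(g, h)] for h in c)) for c in clusters)


if __name__ == "__main__":
    expr = {
        "a1": [1, 2, 3, 4], "a2": [2, 4, 6, 8], "a3": [1, 2, 3, 5],  # co-varying group A
        "b1": [4, 3, 2, 1], "b2": [8, 6, 4, 2],                      # co-varying group B
    }
    print(select_genes(expr, 2))  # one gene from group A, one from group B
```

Keeping one gene per co-expression cluster is one way to reduce redundancy among driver genes when only 60, 40, or 20 genes are allowed; other linkage or representative rules would be equally plausible.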

2. Second, the current method has only been evaluated on one dataset. More independent evaluations should be added to demonstrate the generalizing capacity of this method.

We thank the reviewer for the constructive comments. The results of our method (Zho_team) and other top performing methods have been systematically and independently evaluated on the Drosophila embryo dataset in the DREAM Single Cell Transcriptomics Challenge (https://www.synapse.org/#!Synapse:syn15665609/wiki/583234) and in the aforementioned consortium paper.

Our manuscript, collected in the F1000Research DREAM Challenges Gateway (https://f1000research.com/gateways/dreamchallenges/singlecelltranscriptomics), is focused on describing the details of our method used in the DREAM Challenge. To avoid ambiguity, in the revised manuscript, we changed the title of our manuscript to “Unsupervised gene selection for predicting cell spatial positions in the Drosophila embryo”.

Overall, this manuscript addresses an important scientific question. However, the method needs more justification and evidence to demonstrate its novelty and power.

We thank the reviewer for the comments. As described in Table S2 of the DREAM Challenge consortium paper mentioned above, our method is the only one among the top performing methods that uses unsupervised hierarchical clustering for gene selection. Its power has been systematically demonstrated in our manuscript and in the consortium paper using independent measurements and comprehensive comparisons.
Competing Interests: No competing interests were disclosed.
Reviewer Response 17 Jun 2021

Xianwen Ren, Biomedical Pioneering Innovation Center, Peking University, Beijing, China

The authors have clarified my concerns.
Competing Interests: NA

VERSION 2 PUBLISHED 19 Feb 2020

Open Peer Review

Reviewer Status

Reviewer Reports

Invited Reviewers: 1, 2, 3, 4, 5
Version 2 (revision), 07 Jun 21: reports by reviewers 1, 3, 4 and 5
Version 1, 19 Feb 20: reports by reviewers 1 and 2

Xianwen Ren, Peking University, Beijing, China
Mark S. Cembrowski, University of British Columbia, Vancouver, Canada

Larissa Kraus, University of British Columbia, Vancouver, Canada
Yen-Chung Chen, New York University, New York, USA
Saumitra Dey Choudhury, All India Institute of Medical Sciences, New Delhi, India
Komal Kumar Bollepogu Raja, Baylor College of Medicine, Houston, USA

Reviewer Report

28 Jun 2024 | for Version 2

Komal Kumar Bollepogu Raja, Baylor College of Medicine, Houston, TX, USA

Approved With Reservations

In this manuscript, Chen et al. develop an unsupervised gene selection model that predicts cell positions. They used a published single-cell RNA-seq dataset from the Drosophila embryo, and hierarchical clustering was used to select a set of driver genes to predict cell positions. The authors compare their model with DistMap and show that their model performs well with 60 genes. The manuscript is well written and the data are presented well. Spatial reconstruction of tissues using single-cell and other omics approaches is an emerging field that allows the identification of dynamic gene expression programs across tissues. Therefore, the current study can be a valuable resource. However, the authors have evaluated their model on only a single dataset. It would be worthwhile to evaluate this method on at least one more independent dataset to assess its consistency. As of now, the method cannot be generalized to a wide range of tissues.

Minor comment:
1. The last line of the Abstract should be changed: "The comparison with the “silver standard” suggests that our method is effective in reconstructing the cell spatial positions and gene expression patterns in tissues." Instead of "tissues", I suggest "Drosophila embryo".

Is the rationale for developing the new method (or application) clearly explained?

Yes
Is the description of the method technically sound?

Yes
Are sufficient details provided to allow replication of the method development and its use by others?

Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Single cell omics, Genomics and Bioinformatics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report

28 Jun 2024 | for Version 2

Saumitra Dey Choudhury, All India Institute of Medical Sciences, New Delhi, India


Not Approved

In this manuscript, titled "Unsupervised gene selection for predicting cell spatial positions in the Drosophila embryo", the authors attempt to computationally deduce the spatial positions of individual cells in Drosophila embryos from their Drop-seq data, with reference to the in situ hybridization expression patterns of 84 driver genes from the BDTNP. Although the method is well defined and could be useful in general for mapping spatial gene expression using transcriptomic data, some concerns exist:

First, the Methods section does not explain the wet-lab procedures for homogenizing the embryos, making the single-cell suspension and preparing the cDNA library. Second, the quality of the Drop-seq data cannot be established from the manuscript. Third, the word "Unsupervised" in the title is misleading. Fourth, and most importantly, experimental validation of the method's performance is required, at least for a few of the driver genes, using antibodies/immunofluorescence.

In general, this manuscript brings forth a strong computational method to extrapolate spatial cell positions but lacks the experimental validation to support it.

Is the rationale for developing the new method (or application) clearly explained?

Yes
Is the description of the method technically sound?

Yes
Are sufficient details provided to allow replication of the method development and its use by others?

Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

No source data required
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

No

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Drosophila Neurogenetics, Single-cell RNA Transcriptomics, Confocal and Super-resolution Microscopy

Reviewer Report

27 Jun 2024 | for Version 2

Yen-Chung Chen, New York University, New York, New York, USA


Not Approved

Chen et al. describe a novel feature selection method for spatially mapping single-cell transcriptomic data. Originally a leading solution in the DREAM Single-Cell Transcriptomics Challenge [1], this method demonstrated its relative edge in the challenge and superior performance in certain aspects against DistMap [2]. While it is indeed a methodological advance, the work leaves several aspects to be desired with regard to generalizing and addressing feature selection in spatial mapping.

First, the original 84-gene pool was pre-selected based on prior knowledge of importance and spatial expression pattern in developing fly embryos [3], whereas most single-cell transcriptomic atlases are, and will be, generated from samples lacking prior knowledge of such breadth and resolution. It therefore remains open whether the proposed method will remain efficient when feature selection has to be performed on all genes or on all variable genes. To address this issue, the method should be tested on more recent spatial and classic single-cell transcriptomic datasets to benchmark whether it performs favorably when selecting features from single-cell RNA-seq data without supervision from prior knowledge.

Second, since assigning a cell to a bin can be considered as transferring the label of the bin to the cell of interest, potential users would be curious how the method holds up against the broad selection of existing label transfer methods, including Seurat [4], scVI [5], and classic classifiers such as multinomial logistic regression, random forest classifiers and support vector machines, for predicting which in situ bin a cell will match. It may be of interest that a previous iteration of Seurat was used to demonstrate a similar spatial mapping task [6], which would make a great benchmarking dataset.

Finally, several minor methodological decisions are reported for which a discussion of the rationale could help potential users and future method development. Specifically, in the section Gene selection, the choice between Ward and McQuitty linkage appeared arbitrary to me and made me wonder how a user working on a different dataset should make this decision (e.g., with regard to a fixed total number of clusters?). In the section Compute the similarity matrix between cells and bins, it is unclear why p_ij was chosen as the scoring scheme instead of a correlation coefficient or mutual information.
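To make the linkage question concrete, the following is a minimal pure-Python sketch of agglomerative clustering of gene expression profiles using the Lance-Williams update for the two linkages the reviewer mentions. McQuitty's method is the weighted-average (WPGMA) update, and Ward's update is applied to squared Euclidean distances. The toy profiles, the value of k and the medoid rule for keeping one representative driver gene per cluster are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def sq_euclid(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lance_williams(method, d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Distance from cluster k to the merge of clusters i and j."""
    if method == "mcquitty":
        # WPGMA: plain average of the two old distances, ignoring cluster sizes.
        return 0.5 * d_ki + 0.5 * d_kj
    if method == "ward":
        # Ward's update; valid when the stored distances are squared Euclidean.
        n = n_i + n_j + n_k
        return ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij) / n
    raise ValueError(f"unknown linkage: {method}")


def agglomerate(profiles, k, method):
    """Greedily merge the closest pair until k clusters remain (k >= 1)."""
    clusters = {i: {i} for i in range(len(profiles))}
    # Keys are (a, b) with a < b; new cluster ids always exceed existing ones.
    dist = {(a, b): sq_euclid(profiles[a], profiles[b])
            for a, b in combinations(range(len(profiles)), 2)}
    next_id = len(profiles)
    while len(clusters) > k:
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        n_i, n_j = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) | clusters.pop(j)
        del dist[(i, j)]
        for m in clusters:
            d_mi = dist.pop((min(i, m), max(i, m)))
            d_mj = dist.pop((min(j, m), max(j, m)))
            dist[(m, next_id)] = lance_williams(
                method, d_mi, d_mj, d_ij, n_i, n_j, len(clusters[m]))
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(c) for c in clusters.values())


def medoids(profiles, clusters):
    """Hypothetical selection rule: keep the most central gene of each cluster."""
    return [min(c, key=lambda g: sum(sq_euclid(profiles[g], profiles[h]) for h in c))
            for c in clusters]


genes = [[0, 0], [0, 1], [10, 10], [10, 11], [20, 0]]  # toy gene profiles
for method in ("ward", "mcquitty"):
    groups = agglomerate(genes, 3, method)
    # Both linkages print [[0, 1], [2, 3], [4]] [0, 2, 4] on this toy input.
    print(method, groups, medoids(genes, groups))
```

On this toy input the two linkages agree; they can diverge when clusters differ in size, because Ward's update weights by cluster sizes while McQuitty's does not. A label-free way to choose between them, or to choose k, is to compare the resulting partitions with an internal criterion such as the silhouette score.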

In sum, the work represents a novel and efficient method addressing the pressing need to systematically supplement existing single-cell transcriptomic atlases with spatial information, and the authors have demonstrated its power in answering a specific question. However, its general applicability remains to be tested on available spatial transcriptomic datasets collected in other contexts and tissues (e.g., [7-9]).

Is the rationale for developing the new method (or application) clearly explained?

Yes
Is the description of the method technically sound?

Partly
Are sufficient details provided to allow replication of the method development and its use by others?

Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

No source data required
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Partly

References

1. Tanevski J, Nguyen T, Truong B, Karaiskos N, et al.: Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data. Life Sci Alliance. 2020; 3(11).
2. Karaiskos N, Wahle P, Alles J, Boltengagen A, et al.: The Drosophila embryo at single-cell transcriptome resolution. Science. 2017; 358(6360): 194–199.
3. Fowlkes CC, Hendriks CL, Keränen SV, Weber GH, et al.: A quantitative spatiotemporal atlas of gene expression in the Drosophila blastoderm. Cell. 2008; 133(2): 364–74.
4. Hao Y, Stuart T, Kowalski MH, Choudhary S, et al.: Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol. 2024; 42(2): 293–304.
5. Gayoso A, Lopez R, Xing G, Boyeau P, et al.: A Python library for probabilistic analysis of single-cell omics data. Nat Biotechnol. 2022; 40(2): 163–166.
6. Mayer C, Hafemeister C, Bandler RC, Machold R, et al.: Developmental diversification of cortical inhibitory interneurons. Nature. 2018; 555(7697): 457–462.
7. Maniatis S, Äijö T, Vickovic S, Braine C, et al.: Spatiotemporal dynamics of molecular pathology in amyotrophic lateral sclerosis. Science. 2019; 364(6435): 89–93.
8. Langlieb J, Sachdev NS, Balderrama KS, Nadaf NM, et al.: The molecular cytoarchitecture of the adult mouse brain. Nature. 2023; 624(7991): 333–342.
9. Wang M, Hu Q, Lv T, Wang Y, et al.: High-resolution 3D spatiotemporal transcriptomic maps of developing Drosophila embryos and larvae. Dev Cell. 2022; 57(10): 1271–1283.e4.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Genomics, developmental biology, biostatistics

Reviewer Report

17 Jun 2021 | for Version 2

Xianwen Ren, Biomedical Pioneering Innovation Center, Peking University, Beijing, China


Approved

The authors have clarified my concerns.

Competing Interests

No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

23 Sep 2020 | for Version 1

Mark S. Cembrowski, Dept. of Cellular and Physiological Sciences, University of British Columbia, Vancouver, BC, Canada

Larissa Kraus, Department of Cellular and Physiological Sciences, University of British Columbia, Vancouver, BC, Canada


Not Approved

Is the rationale for developing the new method (or application) clearly explained?

Yes
Is the description of the method technically sound?

Yes
Are sufficient details provided to allow replication of the method development and its use by others?

Partly
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

Yes
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

No

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Transcriptomics


Author Response

07 Jun 2021

Zhengqing Ouyang, Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, USA

Point-by-point response to the comments of Reviewer 2

Reviewer 2

In this manuscript, the authors describe an approach to infer spatial gene expression using scRNA-seq data. The authors aimed to identify spatial expression of cell clusters using 20, 40 or 60 marker genes, and compared it to the “gold standard” DistMap method using 84 marker genes.

The authors begin to explain the rationale of the manuscript by describing the concept of the DREAM challenge, which aims to introduce novel computational techniques to map spatial gene expression using transcriptomic data.
Since the DREAM challenge aims to implement novel methods, it is imperative to introduce and carefully compare common methods with the novel proposed approach, which was underdeveloped by the authors. As an example, the “gold standard” method is mentioned several times, though it is not clear what exactly the “gold standard” is or how well-controlled it is. As another key example, a discussion of any kind is completely missing.

We thank the reviewer for the constructive comments. The “gold standard” is generated by DistMap, which is the best available single-cell position reference and was used in the DREAM Challenge. DistMap determines the position of each single cell by searching for the maximum similarity between cells and spatial bins, using the in situ expression of 84 driver genes from the Berkeley Drosophila Transcription Network Project (BDTNP). Karaiskos et al. (2017) showed that these 84 driver genes are sufficient to uniquely and individually label most of the 1,297 cells. The DistMap positions were therefore used in the DREAM Challenge to assess the performance of methods using smaller numbers (60, 40, and 20) of driver genes. In the revised manuscript, we changed “gold standard” to “silver standard”, as the cell positions generated by DistMap are not experimentally validated. We added a description of the DistMap “silver standard” to the Methods section.
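The cell-to-bin matching described in this response can be illustrated with a short sketch. It assumes binarized (0/1) driver-gene vectors for every cell and every bin and scores each cell-bin pair with the Matthews correlation coefficient (MCC). Our understanding is that DistMap uses an MCC-based score, but the exact scoring, the toy data and the function names here are assumptions, not DistMap's code.

```python
import math


def mcc(cell_bits, bin_bits):
    """Matthews correlation of two equal-length 0/1 vectors (0.0 if undefined)."""
    pairs = list(zip(cell_bits, bin_bits))
    tp = sum(1 for c, b in pairs if c and b)
    tn = sum(1 for c, b in pairs if not c and not b)
    fp = sum(1 for c, b in pairs if c and not b)
    fn = sum(1 for c, b in pairs if not c and b)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def similarity_matrix(cells, bins):
    """Rows are cells, columns are bins."""
    return [[mcc(cell, b) for b in bins] for cell in cells]


def best_bins(sim, tol=1e-12):
    """For each cell, every bin index that reaches that cell's maximum score."""
    out = []
    for row in sim:
        top = max(row)
        out.append([j for j, s in enumerate(row) if top - s <= tol])
    return out


bins = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]   # toy in situ patterns (hypothetical)
cells = [[1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]  # toy binarized cells (hypothetical)
print(best_bins(similarity_matrix(cells, bins)))  # [[0], [2], [1, 2]]
```

The third toy cell ties between two bins, which illustrates why a method may report several candidate bins (or a score over all bins) per cell rather than a single position.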

We also added more background on common methods for predicting cell positions from single-cell transcriptome data to the Introduction section. We describe the limitations of current methods and explain our motivation for studying how driver gene selection affects the prediction of cell spatial positions.

The comparisons of our method (Zho_team) with other top-performing methods have been described in the DREAM Single Cell Transcriptomics Challenge consortium paper (Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data, Life Sci Alliance. 2020 Nov; 3(11): e202000867). Our manuscript focuses on describing the details of the method we used in the DREAM Challenge.

We also expanded the Discussion section of our manuscript. In addition to summarizing our method and results, we point out that gene-gene correlation and spatial correlation are useful information for predicting spatial cell positions and gene expression.

The design of figures for a manuscript should be chosen to clearly emphasize the main points of the data. Here, the figures are difficult to interpret because key information and description is missing (e.g., legends are too brief, axis labels are missing, subplot headings are redundant, figure design is inconsistent). In addition, the design and presentation of the figures seem unmotivated and thereby difficult for the reader to see the importance of any particular data or easily draw conclusions from the results.

In addition, consistent colour schemes and designs would help the reader to follow the manuscript and understand the results.

We thank the reviewer for the constructive comments. In the revised manuscript, we improved all figures and legends and enlarged the font sizes in all figures. In Figure 1, we clarified the overall workflow to convey the motivation of our method. In Figure 2, we now display, for each cell, the number of bins with high similarity values (> 0.8) in the similarity matrix when the number of driver genes is 60, 40, or 20. In Figure 3, we enlarged the font size and added more information to the figure labels. In Figure 4, we enlarged the font size. In Figure 5, we now show a comparison of the Pearson correlation of the gene expression predicted by DistMap (using 84 driver genes) and by our method (using 60, 40, or 20 genes).
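The two summaries mentioned in this response can be sketched as follows: counting, for each cell, the bins whose similarity exceeds 0.8 (the Figure 2 summary), and computing the Pearson correlation between two predicted expression profiles of one gene across bins (one plausible reading of the Figure 5 comparison). The cells-by-bins matrix layout, the toy numbers and the function names are assumptions for illustration only.

```python
import math


def high_similarity_counts(sim, threshold=0.8):
    """Number of bins per cell whose similarity is strictly above `threshold`."""
    return [sum(1 for s in row if s > threshold) for row in sim]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


sim = [[0.95, 0.85, 0.10], [0.40, 0.81, 0.79]]  # 2 cells x 3 bins (toy values)
print(high_similarity_counts(sim))  # [2, 1]

distmap_pred = [0.1, 0.4, 0.9, 0.3]  # one gene across 4 bins (toy values)
our_pred = [0.2, 0.5, 1.0, 0.4]      # a shifted copy, so the correlation is 1
print(round(pearson(distmap_pred, our_pred), 6))  # 1.0
```

Pearson correlation ignores shifts and positive rescaling, so it measures agreement in the spatial pattern of a gene rather than in absolute expression levels.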


Competing Interests

No competing interests were disclosed.

Reviewer Report

02 Sep 2020 | for Version 1

Xianwen Ren, Biomedical Pioneering Innovation Center, Peking University, Beijing, China


Not Approved

Is the rationale for developing the new method (or application) clearly explained?

Yes
Is the description of the method technically sound?

Yes
Are sufficient details provided to allow replication of the method development and its use by others?

Yes
If any results are presented, are all the source data underlying the results available to ensure full reproducibility?

No source data required
Are the conclusions about the method and its performance adequately supported by the findings presented in the article?

Partly

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Bioinformatics


Author Response

07 Jun 2021

Zhengqing Ouyang, Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst, USA

Point-by-point response to the comments of Reviewer 1

Reviewer 1

Reconstructing the spatial information of single cells from single-cell RNA-seq data is a pivotal question to further release the revolutionary power of the scRNA-seq technology. Here the authors propose a computational method to infer the spatial positions of single cells of Drosophila embryos based on scRNA-seq data and a reference spatial map based on in situ hybridization of 84 driver genes. While the method is finely tuned for this specific scientific question, some concerns exist.

1. First, the title is misleading. The authors claimed "an unsupervised learning method". This is not valid because this method uses the reference spatial map of Drosophila embryos. Unless the method only uses the scRNA-seq data, it cannot be claimed "an unsupervised learning method".

We thank the reviewer for the comments. To clarify, we used unsupervised hierarchical clustering for gene selection. Based on the selected genes, we predicted the spatial position of each individual cell. Both unsupervised and supervised gene selection approaches were used by methods in the DREAM Single Cell Transcriptomics Challenge to predict cell spatial positions. Table S2 of the consortium paper (Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data, Life Sci Alliance. 2020 Nov; 3(11): e202000867) lists the top-performing methods (including ours), which use either unsupervised or supervised gene selection.
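Choosing the number of gene clusters without labels is part of what keeps a clustering-based gene selection step unsupervised; the article outline lists a silhouette score for determining the hierarchical clustering number. Below is a minimal pure-Python sketch of the mean silhouette score for a given partition, assuming Euclidean distance between toy gene profiles; the authors' distance measure and implementation may differ.

```python
import math


def euclid(a, b):
    """Euclidean distance between two profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette(points, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the rest of the point's own cluster.
        a = sum(euclid(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to any other cluster.
        b = min(sum(euclid(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


genes = [[0, 0], [0, 1], [10, 10], [10, 11]]  # toy gene profiles
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]        # two candidate partitions
print(silhouette(genes, good) > silhouette(genes, bad))  # True
```

In practice one would compute this score for each candidate number of clusters (and, if desired, each linkage) and keep the partition with the highest mean silhouette.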

2. Second, the current method has only been evaluated on one dataset. More independent evaluations should be added to demonstrate the generalizing capacity of this method.

We thank the reviewer for the constructive comments. The results of our method (Zho_team) and other top performing methods have been systematically and independently evaluated on the Drosophila embryo dataset in the DREAM Single Cell Transcriptomics Challenge (https://www.synapse.org/#!Synapse:syn15665609/wiki/583234) and in the consortium paper aforementioned.

Our manuscript, collected in the F1000Research DREAM Challenges Gateway (https://f1000research.com/gateways/dreamchallenges/singlecelltranscriptomics), is focused on describing the details of our method used in the DREAM Challenge. To avoid ambiguity, in the revised manuscript, we changed the title of our manuscript to “Unsupervised gene selection for predicting cell spatial positions in the Drosophila embryo”.

Overall, this manuscript addresses an important scientific question. But the method needs more justifications and evidence to demonstrate its novelty and power.

We thank the reviewer for the comments. Our method is the only one that uses unsupervised hierarchical clustering for gene selection among the top performing methods as described in Table S2 of the DREAM Challenge consortium paper mentioned above. The power has been systematically demonstrated in our manuscript as well as the consortium paper using independent measurements and comprehensive comparisons.


Competing Interests

No competing interests were disclosed.

Alongside their report, reviewers assign a status to the article:

Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested

Approved with reservations - A number of small changes, sometimes more significant revisions are required to address specific details and improve the paper's academic merit.

Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions

[1] Stuart T, Satija R: Integrative single-cell analysis. Nat Rev Genet. 2019; 20(5): 257–272.

[2] Farrell JA, Wang Y, Riesenfeld SJ, et al.: Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis. Science. 2018; 360(6392): pii: eaar3131.

[3] Karaiskos N, Wahle P, Alles J, et al.: The Drosophila embryo at single-cell transcriptome resolution. Science. 2017; 358(6360): 194–199.

[4] Satija R, Farrell JA, Gennert D, et al.: Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015; 33(5): 495–502.

[5] Achim K, Pettit JB, Saraiva LR, et al.: High-throughput spatial mapping of single-cell RNA-seq data to tissue of origin. Nat Biotechnol. 2015; 33(5): 503–9.

[6] Moor AE, Harnik Y, Ben-Moshe S, et al.: Spatial reconstruction of single enterocytes uncovers broad zonation along the intestinal villus axis. Cell. 2018; 175(4): 1156–1167.e15.

[7] Nitzan M, Karaiskos N, Friedman N, et al.: Charting a tissue from single-cell transcriptomes. bioRxiv. 2018; 456350.

[8] Shah S, Lubeck E, Zhou W, et al.: In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron. 2016; 92(2): 342–357.

[9] Zhu Q, Shah S, Dries R, et al.: Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data. Nat Biotechnol. 2018; 36: 1183–1190.

[10] Moffitt JR, Bambah-Mukku D, Eichhorn SW, et al.: Molecular, spatial, and functional single-cell profiling of the hypothalamic preoptic region. Science. 2018; 362(6416): pii: eaau5324.

[11] Macosko EZ, Basu A, Satija R, et al.: Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell. 2015; 161(5): 1202–1214.

[12] Charrad M, Ghazzali N, Boiteau V, et al.: NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set. J Stat Softw. 2014; 61(6): 1–36.

[13] Chen Y, Mao D, Zhang Y, et al.: SCTC Challenge zho_team Submission. Synapse [dataset]. 2019. http://www.doi.org/10.7303/syn17056435

[14] yancychy: ouyang-lab/SCTC-Challenge-zho_team v0.1 (Version v0.1). Zenodo. 2019. http://www.doi.org/10.5281/zenodo.3592532


Abstract

Keywords

Introduction

Methods

Dataset

Gene selection

Binarization of scRNA-seq data

Compute the probability matrix between cells and bins

Select top bins based on the probability matrix

Figure 1. Workflow of our spatial position prediction method for scRNA-seq.

Figure 2. The distribution of the maximum probability values in the probability matrix for the bins for the 60, 40 and 20 driver genes scenarios.

Silhouette score to determine the hierarchical clustering number

Predict cell positions based on the clustering result

Performance evaluation

Results

Selecting genes

Figure 3.

Compare the predicted position and spatial gene expression

Figure 4.

Figure 5.

Conclusion

Data availability

Software availability

Acknowledgements

References
