scRICA: An R package for multiple-sample single-cell RNA-seq data integrative comparative analysis

Yan Li; Jason Shapiro; Qiaoshan Lin; Qing Gong; Michiko Ryu; Mengjie Chen

doi:10.12688/f1000research.153698.1

Software Tool Article

[version 1; peer review: 1 approved with reservations, 1 not approved]

Yan Li¹, Jason Shapiro¹, Qiaoshan Lin¹, Qing Gong², Michiko Ryu³, Mengjie Chen¹,⁴

PUBLISHED 20 Jan 2025

Author details

¹ Center for Research Informatics, University of Chicago Division of the Biological Sciences, Chicago, Illinois, 60637, USA
² Department of Mathematics and Statistics, Loyola University Chicago, Chicago, Illinois, 60637, USA
³ Department of Physics, The University of Chicago, Chicago, Illinois, 60637, USA
⁴ Department of Medicine, Section of Genetic Medicine, The University of Chicago, Chicago, Illinois, 60637, USA

Yan Li
Roles: Conceptualization, Formal Analysis, Investigation, Resources, Software, Supervision, Writing – Original Draft Preparation, Writing – Review & Editing

Jason Shapiro
Roles: Conceptualization, Methodology, Software, Writing – Original Draft Preparation, Writing – Review & Editing

Qiaoshan Lin
Roles: Conceptualization, Software, Validation, Writing – Review & Editing

Qing Gong
Roles: Validation, Writing – Review & Editing

Michiko Ryu
Roles: Software, Validation, Writing – Review & Editing

Mengjie Chen
Roles: Funding Acquisition, Investigation, Methodology, Software, Writing – Original Draft Preparation, Writing – Review & Editing

This article is included in the Bioinformatics gateway.

Abstract

Single-cell sequencing technologies offer unprecedented resolution to inspect transcriptomes and generate critical biological insights. As the number of cells and cell types in single-cell studies increases, the effort required to analyze the data grows dramatically, especially when comparative explorations need to be performed on large datasets with different cell types and various sample attributes, such as clinical samples from different age and ancestry groups. Because single-cell data analysis is sequential, many steps involving multiple method choices and parameter options need to be considered. The computational skills required for integrative and comparative analyses of large datasets with various sample attributes represent a substantial obstacle for many researchers. To address this challenge, we have developed scRICA, a systematic workflow tailored for integrative and comparative single-cell RNA sequencing (scRNA-seq) analysis. This approach streamlines the analytical process, supporting efficient use of computational resources and scalability to large datasets. scRICA allows users to input various parameter options from a metadata table, and these options are inherited throughout the entire analysis workflow. This functionality greatly reduces the programming effort needed for comparative analyses involving multiple sample attributes. As an R package, scRICA provides a user-friendly interface within the R environment, making it accessible to researchers familiar with R programming. Additionally, scRICA offers a command-line execution option, allowing users to integrate it into their computational pipelines or execute analyses on High-Performance Computing (HPC) systems. This combination of features provides flexibility, ease of use, and scalability, making scRICA a valuable tool for comprehensive and efficient scRNA-seq analysis.

Keywords

single-cell RNA-seq; workflow; integrative and comparative analysis; data visualization

Corresponding author: Yan Li

Competing interests: No competing interests were disclosed.

Grant information: This work has been supported by the Biological Sciences Division at the University of Chicago with additional funding provided by the Institute for Translational Medicine, CTSA grant number UL1TR000430 from the National Institutes of Health.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2025 Li Y et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Li Y, Shapiro J, Lin Q et al. scRICA: An R package for multiple-sample single-cell RNA-seq data integrative comparative analysis [version 1; peer review: 1 approved with reservations, 1 not approved]. F1000Research 2025, 14:108 (https://doi.org/10.12688/f1000research.153698.1) First published: 20 Jan 2025, 14:108 (https://doi.org/10.12688/f1000research.153698.1) Latest published: 20 Jan 2025, 14:108 (https://doi.org/10.12688/f1000research.153698.1)

Introduction

In the past decade, rapid development of single-cell sequencing technologies has revolutionized our ability to characterize the cells that comprise complex tissues. With these tools at hand, many large-scale studies have been launched to create comprehensive reference maps of cell types from various tissues under different physiological and pathological conditions.^1–3 A notable example is the Human Cell Atlas,⁴ an international collaborative project that aims to create a blueprint of all normal cell types in humans. As the number of samples in single-cell studies increases, the effort required to analyze this type of data grows dramatically, especially when comparative explorations need to be performed on data with different cell types and various sample attributes.⁵ For example, in a study with samples from patients with type 2 diabetes and healthy controls, researchers are interested in characterizing cell types not only by disease phenotype, but also by age, gender and ancestry. These clinical or demographic attributes are often nested within the phenotype data, which makes programming the analysis of each sub-category tedious. Therefore, a thorough comparative exploration of scRNA-seq data across different groups of sample attributes presents a substantial computational obstacle.

In fact, the inherent complexity of multi-sample scRNA-seq data necessitates a multi-step analysis process using advanced statistical and computational methods. For example, the best-practice pipeline implemented in Seurat includes the identification of an anchor gene set of highly variable genes, data integration using the anchor gene set, data centering and scaling, nearest-neighbor graph construction, cluster detection, etc. Because the analysis is sequential, users may need to consider combinations of parameter options across multiple steps. This poses challenges for both programming efficiency and research reproducibility.

Here we introduce scRICA, an R package that greatly reduces the programming effort required for multi-sample scRNA-seq data analysis. scRICA allows parameter options to be inherited through the entire analysis workflow for each attribute group or sub-category of samples, so that users can efficiently conduct a systematic exploration across attribute groups. Furthermore, scRICA produces detailed reports and publication-quality visualizations for each attribute group, which supports reproducibility and interpretation. In addition to the regular R functions, scRICA provides a command-line tool to simplify use in a high-performance computing environment. With scRICA, we aim to reduce the obstacles in programming and improve analysis efficiency and reproducibility for multiple-sample scRNA-seq data. The package includes comprehensive documentation and example workflows, available at https://rpubs.com/yli_cri/1025790, to facilitate replication of the software development and its use by others. By enabling efficient multiple comparisons across various experimental conditions and providing the information and visualizations needed to interpret the output, scRICA has the potential to advance the field of single-cell data analysis.

Methods

Operation

scRICA divides the workflow of integrative and comparative analysis of multiple-sample scRNA-seq data into four steps (Figure 1A): ‘Step 1’ for pre-processing and quality control; ‘Step 2’ for multi-sample integration; ‘Step 3’ for visualization by attribute groups; and ‘Step 4’ for downstream analysis including differential expression (DE) analysis, pseudo-time trajectory analysis, and cell cluster identification. All steps are implemented as functions in an R package. scRICA also offers a command-line execution option to simplify use on High-Performance Computing (HPC) systems. Instructions for installing prerequisite packages are available at https://github.com/yan-cri/scRICA. Users can follow these instructions to set up the appropriate analysis environment in R.

Figure 1. scRICA schematic overview.

A. scRICA workflow analysis steps; B. Outline of scRICA analysis results; C&D. Stacked bar plots of cellular composition percentages by sample attribute group; E&F. Dot plots of marker genes by sample attribute group; G&H. Heat maps of genes of interest for the selected sample attribute groups and cell types.

Workflow implementation

‘Step 1’ performs pre-processing and quality control checks for each individual sample listed in the input metadata table. This step is initiated by the function ‘processQC()’, whose main parameters are taken from a user-provided metadata table (see ‘Input and output structures’). ‘Step 1’ performs: 1) import of counts into a SeuratObject; 2) doublet/multiplet detection and removal using DoubletDecon⁶; 3) visualization of quality control results; 4) summarization of mitochondrial and ribosomal content; and 5) summarization of cell numbers. Additional options include: 1) ‘genomeSpecies’, which allows users to select the species of the reference genome; 2) ‘mtPerCutoff’, the cut-off for mitochondrial content, where values above the cut-off indicate cells with low viability; 3) ‘extraFilter’, which allows users to eliminate further cells from the analysis, where these cells need to be specified in a separate file for each sample and the full path of each file should be provided in the metadata table column ‘filterFname’; and 4) ‘multi-omics’, which specifies the format of the input count matrices.
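To make the role of a mitochondrial cut-off such as ‘mtPerCutoff’ concrete, the following minimal sketch filters toy cells by the percentage of their counts that come from mitochondrial genes. It is an illustration only, written in plain Python rather than R, and it is not scRICA code: the function names, barcodes, and counts are hypothetical, the "MT-" prefix assumes human gene symbols, and treating a cell exactly at the cut-off as retained is an assumption.

```python
# Illustrative sketch only: not scRICA code. Barcodes and counts are made up.

def percent_mito(counts, mito_prefix="MT-"):
    """Percentage of a cell's counts that come from mitochondrial genes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items() if gene.startswith(mito_prefix))
    return 100.0 * mito / total


def filter_cells(cells, mt_per_cutoff):
    """Keep cells whose mitochondrial percentage does not exceed the cut-off."""
    return {
        barcode: counts
        for barcode, counts in cells.items()
        if percent_mito(counts) <= mt_per_cutoff
    }


# Toy per-cell gene counts (hypothetical).
cells = {
    "AAAC": {"MT-CO1": 5, "GAPDH": 60, "ACTB": 35},   # 5% mitochondrial
    "AAAG": {"MT-CO1": 40, "GAPDH": 30, "ACTB": 30},  # 40% mitochondrial
    "AAAT": {"MT-ND1": 10, "MT-CO1": 5, "ACTB": 85},  # 15% mitochondrial
}

kept = filter_cells(cells, mt_per_cutoff=20)
print(sorted(kept))
```

With a cut-off of 20, the cell with 40% mitochondrial content is dropped and the other two are kept.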

‘Step 2’ integrates all samples listed in the input metadata table using two popular integration methods: the CCA and RPCA algorithms in Seurat.^7,8 This step is performed via the function ‘getClusterMarkers()’ on cells that pass the quality control in ‘Step 1’. In ‘Step 2’, the number of gene features used for integration can be specified by the option ‘nfeatures’. Additionally, users can exclude ribosomal genes from the gene set by setting ‘ribo.removal = TRUE’. By default, the identified cell clusters are labeled with numbers (i.e., 0, 1, 2, etc.). scRICA allows users to further annotate cell clusters via the option ‘newAnnotationRscriptName’ for downstream analysis.

‘Step 3’ provides various visualization techniques to inspect cellular compositions and gene expression patterns of different cell types with respect to the sample attributes specified in the option ‘expCondCheck’. For example, we can analyze fallopian tube data from the Human Cell Atlas by anatomic site (isthmus, ampulla, fimbriae, abbreviated as I, A and F) by specifying the anatomic site as the desired sample attribute. Four main functions are available: 1) Function ‘getClusterSummaryReplot()’ generates box plots to inspect cellular compositions across attribute groups. For example, users can visualize cellular compositions across anatomic sites (I, A and F) (Figure 1C) and across patients (D1-7) (Figure 1D). 2) Function ‘getGoiDotplot()’ generates dot plots for a pre-defined set of gene markers supplied in a separate file. This plot visualizes normalized, centered and scaled gene expression values across attribute groups, with dot color representing the centered and scaled average gene expression of cells within each group and dot size representing the percentage of cells with non-zero expression. Gene annotations can be specified and displayed on top as a legend bar (as shown in Figure 1E and 1F). 3) Function ‘getGoiHeatmap()’ generates heatmaps for a pre-defined set of gene markers, which can be specified either in a separate file or through the option ‘geneNames’. This heatmap can visualize gene expression at the single-cell level (Figure 1G) or the donor level (Figure 1H) across groups, complementing the dot plots, which display scaled average values. 4) Function ‘getScatterPlot()’ generates scatter plots for any two selected cell types or attribute groups via the option ‘selectedGroups’.
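The two quantities encoded in a dot plot, as described above, can be sketched for a single gene as follows. This is a plain-Python illustration under stated assumptions, not the implementation used by ‘getGoiDotplot()’: the function name, the toy expression values, and the choice of the sample standard deviation for scaling are all assumptions made for the example.

```python
# Illustrative sketch only: not scRICA code. Values are made up.
from statistics import mean, stdev


def dot_stats(expr_by_group):
    """For one gene, return per-group dot size and dot color values.

    expr_by_group maps an attribute group to the normalized expression
    values of that gene in the group's cells.
    """
    avg = {g: mean(v) for g, v in expr_by_group.items()}
    pct = {
        g: 100.0 * sum(x > 0 for x in v) / len(v)
        for g, v in expr_by_group.items()
    }
    averages = list(avg.values())
    centre = mean(averages)
    # Assumption: scale by the sample standard deviation across groups.
    spread = stdev(averages) if len(averages) > 1 else 0.0
    return {
        g: {
            "pct_expressed": pct[g],  # dot size
            "scaled_avg": (avg[g] - centre) / spread if spread > 0 else 0.0,  # dot color
        }
        for g in expr_by_group
    }


# Toy expression of one gene in cells from three anatomic sites (I, A, F).
expr = {
    "I": [0.0, 0.0, 2.0, 2.0],
    "A": [1.0, 3.0, 2.0, 2.0],
    "F": [0.0, 4.0, 4.0, 4.0],
}
stats = dot_stats(expr)
for group, values in stats.items():
    print(group, values)
```

In this toy input the group averages are 1, 2 and 3, so centering and scaling place group I below and group F above the across-group mean, while the percentage of expressing cells is computed independently of the average.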

‘Step 4’ can perform downstream analysis including: 1) differential expression (DE) analysis via the function ‘getclusterExpCondDe()’; 2) over-expressed gene identification via the function ‘getExpCondClusterMarkers()’; 3) pseudo-time trajectory analysis via the function ‘getExpCondClusterPseudotime()’; and 4) sub-cluster identification via the function ‘getHippoRes()’. All analyses can be applied to any specified cell types, any specified attribute groups, or a combination of cell types and attributes. DE analysis can be conducted at the single-cell level or the pseudo-bulk level. The options for single-cell-based DE include wilcox, MAST and t-test. For pseudo-bulk-based DE, expression levels are averaged and normalized for each donor with respect to the total number of cells, followed by a DE test using DESeq2.⁹ The pseudo-time trajectory analysis in scRICA is performed with slingshot.¹⁰ Users can select any subset of cell types from a specified attribute group for pseudo-time analysis. Sub-cluster identification is conducted by HIPPO,¹¹ which implements a zero-inflation test that runs iteratively to select heterogeneous features in order to refine and identify biologically important sub-cluster cell types.
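The pseudo-bulk aggregation mentioned above can be pictured with a small sketch that averages counts per donor over that donor's cells. It is a plain-Python illustration of the idea only: the function name, donors, genes, and counts are hypothetical, a gene missing from a cell is assumed to have zero counts, and the exact normalization applied by scRICA before DESeq2 is not reproduced here.

```python
# Illustrative sketch only: not scRICA code. Donors, genes and counts are made up.
from collections import defaultdict


def pseudo_bulk(cells):
    """Average per-gene counts over the cells of each donor.

    cells is an iterable of (donor, {gene: count}) pairs. A gene missing
    from a cell's dictionary is treated as having zero counts in that cell.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    for donor, counts in cells:
        n_cells[donor] += 1
        for gene, n in counts.items():
            sums[donor][gene] += n
    return {
        donor: {gene: total / n_cells[donor] for gene, total in genes.items()}
        for donor, genes in sums.items()
    }


# Toy single-cell counts for two donors.
cells = [
    ("D1", {"PAX8": 2, "FOXJ1": 0}),
    ("D1", {"PAX8": 4, "FOXJ1": 2}),
    ("D2", {"PAX8": 1, "FOXJ1": 3}),
]
bulk = pseudo_bulk(cells)
print(bulk)
```

Each donor then contributes one expression profile to the DE test, so the donor rather than the individual cell becomes the unit of replication.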

Input and output structures

scRICA’s input includes a metadata table and a processed count matrix. The processed count matrix can be provided in different formats, including a folder generated directly by the 10x Genomics Cell Ranger analysis tool (https://support.10xgenomics.com/single-cell-gene-expression/software/overview/welcome), or a count matrix stored in text or HDF5 format. The input metadata table includes two mandatory columns: the first column specifies all sample names, and the second column (‘path’) specifies the full path of the count matrix for the sample in that row. Optional columns include: 1) ‘doubletsRmMethod’, which specifies the doublet removal algorithm: medoids, centroids,⁶ or both (designated by OL); 2) ‘filterFname’, which provides the full path of Excel files that specify cells to be removed from the analysis; and 3) ‘expCond*’, which allows multiple sample attributes to be entered for each sample. These sample attributes are inherited through the entire analysis workflow when the option ‘expCondCheck’ is set accordingly.
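As a sketch of how such a metadata table might look and how an ‘expCond*’ column can drive grouping, the example below parses a small comma-separated table and groups samples by a chosen attribute column. It is an illustration in plain Python, not scRICA code: the file format (comma-separated text), the header name of the first column, the sample names, the paths, and the helper functions are all assumptions; only the column names ‘path’, ‘doubletsRmMethod’ and ‘expCond*’ and the value OL come from the text above.

```python
# Illustrative sketch only: not scRICA code. Table contents are made up.
import csv
import io
from collections import defaultdict

# Hypothetical metadata table: first column = sample names, second = 'path'.
METADATA_CSV = """sample,path,doubletsRmMethod,expCond1,expCond2
D1_I,/data/D1_I,OL,I,D1
D1_A,/data/D1_A,OL,A,D1
D2_I,/data/D2_I,OL,I,D2
D2_F,/data/D2_F,OL,F,D2
"""


def load_metadata(text):
    """Parse the table and check the mandatory 'path' column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    fields = reader.fieldnames or []
    if len(fields) < 2 or fields[1] != "path":
        raise ValueError("the second column must be named 'path'")
    return fields[0], rows


def group_samples(sample_col, rows, exp_cond_check):
    """Group sample names by the attribute column chosen for comparison."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[exp_cond_check]].append(row[sample_col])
    return dict(groups)


sample_col, rows = load_metadata(METADATA_CSV)
by_site = group_samples(sample_col, rows, "expCond1")   # anatomic site
by_donor = group_samples(sample_col, rows, "expCond2")  # donor
print(by_site)
print(by_donor)
```

Switching the comparison from anatomic site to donor only requires naming a different attribute column, which is the kind of reuse the inherited ‘expCondCheck’ option is meant to provide.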

scRICA writes all analysis results to a main folder, named ‘scRICA_results’ by default or as specified by the user via the option ‘resDirName’. Results from each step are organized into corresponding sub-folders (Figure 1B). The integration analysis results are saved as an RDS object. All visualizations are saved as PDF files. QC summary, DE analysis, and trajectory analysis results are saved as both text files and RData objects, making them easily accessible for additional analysis.

Conclusion

scRICA simplifies the integrative and comparative analysis of scRNA-seq data into a four-step workflow. It reduces the computational programming skills required to analyze complex datasets. The inheritance feature provides an efficient way to thoroughly explore data with nested groups of sample attributes and multiple cell types. The command-line execution option allows users to run analyses directly on high-performance computers. scRICA is intended to improve programming efficiency and reproducibility in the data analysis of large-scale single-cell studies.

Authors’ contributions

The source code of scRICA was developed by YL with contributions from JS, QL and MC. The package was tested and applied to various types of data by JS, QL, QG, and MR. YL and MC drafted the manuscript. All authors have edited, read and approved the final manuscript.

Ethics and consent

Ethical approval and consent were not required.

Data and software availability

Source code is available at https://github.com/yan-cri/scRICA. The archived version of this release is available at https://zenodo.org/doi/10.5281/zenodo.12786508. It is distributed under the CC0 1.0 Universal license.

Availability and requirements: R >= 4.0.0 with the prerequisite packages installed.

License: CC0 1.0 Universal

Any restrictions to use by non-academics: None

Demonstration example datasets: Two demonstration datasets are provided for scRICA: 1) count matrices for 6 samples; and 2) an integrated RDS file named ‘scRICA_demo.rds’. They are available at https://zenodo.org/records/13128908 (reference 12) under the terms of the Creative Commons Zero license.

Acknowledgments

We would like to thank Ernst Lengyel and Oni Basu’s research group for generation and sharing of scRNA-seq data for development and testing of this package. Additionally, we would like to thank Rohit Kulkarni’s research group for the generation and sharing of multi-omics data. We thank Jackie Roessier and Sarah Sumner for their editing support.

References

1. Svensson V, Vento-Tormo R, Teichmann SA: Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 2018; 13(4): 599–604.
2. Angerer P, Simon L, Tritschler S, et al.: Single cells make big data: New challenges and opportunities in transcriptomics. Curr. Opin. Syst. Biol. 2017; 4: 85–91.
3. Zappia L, Theis FJ: Over 1000 tools reveal trends in the single-cell RNA-seq analysis landscape. Genome Biol. 2021; 22(1): 301.
4. Regev A, Teichmann SA, Lander ES, et al.: Human Cell Atlas Meeting Participants: The Human Cell Atlas. eLife. 2017; 6: e27041.
5. Kiselev VY, Andrews TS, Hemberg M: Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet. 2019; 20(5): 273–282.
6. DePasquale EAK, Schnell DJ, Camp P-JV, et al.: DoubletDecon: Deconvoluting Doublets from Single-Cell RNA-Sequencing Data. Cell Rep. 2019; 29(6): 1718–1727.e8.
7. Stuart T, Butler A, Hoffman P, et al.: Comprehensive integration of single-cell data. Cell. 2019; 177: 1888–1902.e21.
8. Hao Y, Hao S, Andersen-Nissen E, et al.: Integrated analysis of multimodal single-cell data. Cell. 2021; 184: 3573–3587.e29.
9. Love MI, Huber W, Anders S: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. bioRxiv. 2014; 002832.
10. Street K, Risso D, Fletcher RB, et al.: Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics. 2018; 19(1): 477.
11. Kim T, Chen M: HIPPO: Heterogeneity-Induced Pre-Processing tOol. 2021. R package version 1.6.0.
12. Li Y: yan-cri/scRICA: scRICA v1 (Version v1). Zenodo. 2024.


Open Peer Review

Reviewer Report 24 Feb 2025

Sebastiaan Valkiers, University of Antwerp, Antwerp, Belgium

Not Approved

https://doi.org/10.5256/f1000research.168622.r364338

The authors present scRICA, a convenience wrapper for comparative analysis of scRNA-seq datasets that integrates different steps of the data analysis process including QC, multi-sample integration, differential expression analysis, pseudo-time trajectories, and cell clusters annotation. As the authors rightly highlight, performing these steps individually often requires significant computational expertise and careful consideration of parameters, which can be a barrier for many researchers. By providing a unified and user-friendly workflow, scRICA aims to reduce the complexity and programming burden associated with these analyses. The method is implemented as an R package, making it accessible to researchers familiar with the R environment. Additionally, the authors provide a well-documented example of the workflow, available at https://rpubs.com/yli_cri/1025790.

While the concept of simplifying complex workflows and integrating multiple tools is commendable, the manuscript currently lacks sufficient detail to demonstrate the unique value and practical advantages of scRICA over existing tools, particularly Seurat, which is widely adopted in the field.

Key comments below:

Fundamentally, scRICA does not introduce novel methodology. Rather, it consolidates commonly used methods for scRNA-seq data analysis, such as DoubletDecon, aspects of Seurat, HIPPO and slingshot. While certainly useful, the novelty of a pipeline like this is questionable. In recent years, a multitude of methods have attempted to create reproducible pipelines for analyzing scRNA-seq experiments (see Li K et al. (2023 [Ref-1]) https://doi.org/10.1186/s12864-023-09332-2, Nouri N et al. (2023 [Ref-2]) https://doi.org/10.1093/bioinformatics/btad760, Kubovčiak J et al. (2023 [Ref-3]) https://doi.org/10.1093/bioadv/vbad089), with limited adoption by the community. It is unclear to me what scRICA contributes beyond these existing workflows. (1) It would only be fair to acknowledge that similar methods already exist and (2) the manuscript would greatly benefit from pointing out in what way scRICA has made improvements over these existing methods.

The manuscript would be significantly strengthened by including a detailed use case that illustrates scenarios where scRICA provides clear advantages over standard approaches. For instance, demonstrating how scRICA handles complex metadata or integrates multiple tools in a way that Seurat does not would help clarify its unique contributions. Without such examples, it is difficult to assess the practical utility and added value of scRICA.

The authors claim improvements in efficiency and scalability with scRICA. However, they do not include examples or results to back up this statement. The inclusion of a benchmark test against state-of-the-art workflows to illustrate efficiency advantages would be greatly appreciated.

While I appreciate the step-by-step explanations in the vignette (https://rpubs.com/yli_cri/1025790), I find that the documentation lacks a detailed reference for function arguments (e.g., a dedicated wiki or manual in addition to the vignette). Given that the main goal of scRICA is to standardize and simplify analysis workflows, there is a risk that users may treat the pipeline as a ‘black box’, running it without fully understanding the choices made at each step. Providing more detailed function documentation would help mitigate this by allowing users to make informed parameter selections rather than simply relying on defaults. This would enhance both transparency and interpretability of the results.

Minor comments:

Figure 1.
A panel: I am a bit confused by this panel. In my understanding, assigning the cell types is part of the later stages of scRNA-seq data analysis, but here it is shown as one of the input features. Wouldn’t it make more sense to display the GEx count matrix here?
B panel: This is a bit redundant and potentially confusing as the information is already present in the panel A table.
C-H panels: What is exactly shown here / which dataset is used? Please include in the methods section. Also, how can the results be interpreted?

Is the rationale for developing the new software tool clearly explained?
Yes

Is the description of the software tool technically sound?
Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
No

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
No

References

1. Li K, Sun YH, Ouyang Z, Negi S, et al.: scRNASequest: an ecosystem of scRNA-seq analysis, visualization, and publishing. BMC Genomics. 2023; 24(1): 228.
2. Nouri N, Kurlovs AH, Gaglia G, de Rinaldis E, et al.: Scaling up single-cell RNA-seq data analysis with CellBridge workflow. Bioinformatics. 2023; 39(12).
3. Kubovčiak J, Kolář M, Novotný J: Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis. Bioinform Adv. 2023; 3(1): vbad089.

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Bioinformatics, single-cell sequencing, adaptive immunology, statistics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report 11 Feb 2025

Pedro L Baldoni, University of Pittsburgh, Pittsburgh, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.168622.r363052

This article presents scRICA, an R package developed to facilitate and streamline the analysis of multi-sample scRNA-seq experiments. The article provides a high-level overview of the main functionalities of scRICA. The authors list several implemented wrapper functions designed to perform various tasks typical of scRNA-seq analysis workflows, including data quality control (QC), data integration, data visualization, and downstream analyses such as differential expression, gene marker identification, pseudotime analysis, and sub-cluster identification.

One of the strengths of the scRICA package is that it serves as a one-stop shop for several packages developed for the analysis of scRNA-seq data, such as DoubletDecon, Seurat, DESeq2, Slingshot, and HIPPO. As the authors state, “The computational skills required for integrative and comparative analyses of large datasets with various sample attributes represent a substantial obstacle for many researchers,” a sentiment with which I fully agree. Although the authors present a scRICA workflow elsewhere, I feel that this article, as the landmark publication of scRICA, would be strengthened by including (1) a use case example of the package, complete with example code and a description of the produced output, and (2) a more detailed description of the various default options used in their wrapper function implementations.

I also include minor comments below.

Minor Comments

- The hyperlinks listed under “Data and software availability” include a semicolon, making it appear as though the links are broken.

- Figure 1A would be significantly improved if the metadata table were presented in the same format as one would pass it to scRICA. As currently displayed, the merged rows, different colors, and descriptive/qualitative entries make it difficult to understand how the table should be formatted.

- It would be very helpful if the authors elaborated on the creation of the input count matrix from popular upstream tools besides Cell Ranger. Since the package aims to make complex multi-sample scRNA-seq analyses more accessible to researchers, it is reasonable to assume that not all readers will know how to create, for example, a count matrix in hdf5 format. Either providing a function to read in the output from upstream tools or explicitly showing users how to do so themselves would be highly beneficial.

- Similarly, explaining how the various output files generated by scRICA can be used and interpreted by the end user would also be very helpful.

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Statistical bioinformatics, transcriptomics, epigenomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Open Peer Review

Pedro L Baldoni, University of Pittsburgh, Pittsburgh, USA
Sebastiaan Valkiers, University of Antwerp, Antwerp, Belgium

Reviewer Report

24 Feb 2025 | for Version 1

Sebastiaan Valkiers, University of Antwerp, Antwerp, Belgium

Not Approved

The authors present scRICA, a convenience wrapper for comparative analysis of scRNA-seq datasets that integrates different steps of the data analysis process, including QC, multi-sample integration, differential expression analysis, pseudotime trajectory inference, and cell cluster annotation. As the authors rightly highlight, performing these steps individually often requires significant computational expertise and careful consideration of parameters, which can be a barrier for many researchers. By providing a unified and user-friendly workflow, scRICA aims to reduce the complexity and programming burden associated with these analyses. The method is implemented as an R package, making it accessible to researchers familiar with the R environment. Additionally, the authors provide a well-documented example of the workflow, available at https://rpubs.com/yli_cri/1025790.

While the concept of simplifying complex workflows and integrating multiple tools is commendable, the manuscript currently lacks sufficient detail to demonstrate the unique value and practical advantages of scRICA over existing tools, particularly Seurat, which is widely adopted in the field.

Key comments below:

Fundamentally, scRICA does not introduce novel methodology. Rather, it consolidates commonly used methods for scRNA-seq data analysis, such as DoubletDecon, aspects of Seurat, HIPPO, and Slingshot. While certainly useful, the novelty of a pipeline like this is questionable. In recent years, a multitude of methods have attempted to create reproducible pipelines for analyzing scRNA-seq experiments (see Li K et al. (2023 [Ref-1]) https://doi.org/10.1186/s12864-023-09332-2, Nouri N et al. (2023 [Ref-2]) https://doi.org/10.1093/bioinformatics/btad760, and Kubovčiak J et al. (2023 [Ref-3]) https://doi.org/10.1093/bioadv/vbad089), with limited adoption by the community. It is unclear to me what scRICA contributes beyond these existing workflows. (1) It would only be fair to acknowledge that similar methods already exist, and (2) the manuscript would greatly benefit from pointing out in what way scRICA improves on these existing methods.

The manuscript would be significantly strengthened by including a detailed use case that illustrates scenarios where scRICA provides clear advantages over standard approaches. For instance, demonstrating how scRICA handles complex metadata or integrates multiple tools in a way that Seurat does not would help clarify its unique contributions. Without such examples, it is difficult to assess the practical utility and added value of scRICA.

The authors claim improvements in efficiency and scalability with scRICA. However, they do not include examples or results to back up this statement. The inclusion of a benchmark test against state-of-the-art workflows to illustrate efficiency advantages would be greatly appreciated.

While I appreciate the step-by-step explanations in the vignette (https://rpubs.com/yli_cri/1025790), I find that the documentation lacks a detailed reference for function arguments (e.g., a dedicated wiki or manual in addition to the vignette). Given that the main goal of scRICA is to standardize and simplify analysis workflows, there is a risk that users may treat the pipeline as a ‘black box’, running it without fully understanding the choices made at each step. Providing more detailed function documentation would help mitigate this by allowing users to make informed parameter selections rather than simply relying on defaults. This would enhance both transparency and interpretability of the results.

Minor comments:

Figure 1.
A panel: I am a bit confused by this panel. In my understanding, assigning the cell types is part of the later stages of scRNA-seq data analysis, but here it is shown as one of the input features. Wouldn’t it make more sense to display the GEx count matrix here?
B panel: This is a bit redundant and potentially confusing as the information is already present in the panel A table.
C-H panels: What exactly is shown here, and which dataset is used? Please include this in the Methods section. Also, how can the results be interpreted?

Is the rationale for developing the new software tool clearly explained?

Yes

Is the description of the software tool technically sound?

Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

No

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

No

References

1. Li K, Sun YH, Ouyang Z, Negi S, et al.: scRNASequest: an ecosystem of scRNA-seq analysis, visualization, and publishing. BMC Genomics. 2023; 24(1): 228.
2. Nouri N, Kurlovs AH, Gaglia G, de Rinaldis E, et al.: Scaling up single-cell RNA-seq data analysis with CellBridge workflow. Bioinformatics. 2023; 39(12).
3. Kubovčiak J, Kolář M, Novotný J: Scdrake: a reproducible and scalable pipeline for scRNA-seq data analysis. Bioinform Adv. 2023; 3(1): vbad089.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Bioinformatics, single-cell sequencing, adaptive immunology, statistics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.

Reviewer Report

11 Feb 2025 | for Version 1

Pedro L Baldoni, University of Pittsburgh, Pittsburgh, USA

Approved With Reservations

This article presents scRICA, an R package developed to facilitate and streamline the analysis of multi-sample scRNA-seq experiments. The article provides a high-level overview of the main functionalities of scRICA. The authors list several implemented wrapper functions designed to perform various tasks typical of scRNA-seq analysis workflows, including data quality control (QC), data integration, data visualization, and downstream analyses such as differential expression, gene marker identification, pseudotime analysis, and sub-cluster identification.

One of the strengths of the scRICA package is that it serves as a one-stop shop for several packages developed for the analysis of scRNA-seq data, such as DoubletDecon, Seurat, DESeq2, Slingshot, and HIPPO. As the authors state, “The computational skills required for integrative and comparative analyses of large datasets with various sample attributes represent a substantial obstacle for many researchers,” a sentiment with which I fully agree. Although the authors present a scRICA workflow elsewhere, I feel that this article, as the landmark publication of scRICA, would be strengthened by including (1) a use case example of the package, complete with example code and a description of the produced output, and (2) a more detailed description of the various default options used in their wrapper function implementations.

I also include minor comments below.

Minor Comments

- The hyperlinks listed under “Data and software availability” include a semicolon, making it appear as though the links are broken.

- Figure 1A would be significantly improved if the metadata table were presented in the same format as one would pass it to scRICA. As currently displayed, the merged rows, different colors, and descriptive/qualitative entries make it difficult to understand how the table should be formatted.

- It would be very helpful if the authors elaborated on the creation of the input count matrix from popular upstream tools besides Cell Ranger. Since the package aims to make complex multi-sample scRNA-seq analyses more accessible to researchers, it is reasonable to assume that not all readers will know how to create, for example, a count matrix in hdf5 format. Either providing a function to read in the output from upstream tools or explicitly showing users how to do so themselves would be highly beneficial.

- Similarly, explaining how the various output files generated by scRICA can be used and interpreted by the end user would also be very helpful.

Is the rationale for developing the new software tool clearly explained?

Yes

Is the description of the software tool technically sound?

Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Partly

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Yes

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Statistical bioinformatics, transcriptomics, epigenomics

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

1. Svensson V, Vento-Tormo R, Teichmann SA: Exponential scaling of single-cell RNA-seq in the past decade. Nat. Protoc. 2018; 13(4): 599–604.

2. Angerer P, Simon L, Tritschler S, et al.: Single cells make big data: New challenges and opportunities in transcriptomics. Curr. Opin. Syst. Biol. 2017; 4: 85–91.

3. Zappia L, Theis FJ: Over 1000 tools reveal trends in the single-cell RNA-seq analysis landscape. Genome Biol. 2021; 22(1): 301.

4. Regev A, Teichmann SA, Lander ES, et al.: Human Cell Atlas Meeting Participants: The Human Cell Atlas. eLife. 2017; 6: 27041.

5. Kiselev VY, Andrews TS, Hemberg M: Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet. 2019; 20(5): 273–282.

6. DePasquale EAK, Schnell DJ, Camp P-JV, et al.: DoubletDecon: Deconvoluting Doublets from Single-Cell RNA-Sequencing Data. Cell Rep. 2019; 29(6): 1718–1727.e8.

7. Stuart T, Butler A, Hoffman P, et al.: Comprehensive integration of single-cell data. Cell. 2019; 177: 1888–1902.e21.

8. Hao Y, Hao S, Andersen-Nissen E, et al.: Integrated analysis of multimodal single-cell data. Cell. 2021; 184: 3573–3587.e29.

9. Love MI, Huber W, Anders S: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. bioRxiv, 002832. 2014.

10. Street K, Risso D, Fletcher RB, et al.: Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics. 2018; 477.

11. Kim T, Chen M: HIPPO: Heterogeneity-Induced Pre-Processing tOol. 2021. R package version 1.6.0.

12. Li Y: yan-cri/scRICA: scRICA v1 (Version v1). Zenodo. 2024.
