Gene expression data visualization tool on the o²S²PARC platform

Hiba Ben Aribi; Mengyuan Ding; Anmol Kiran

doi:10.12688/f1000research.126840.2

Software Tool Article

Revised

[version 2; peer review: 2 approved]

Hiba Ben Aribi¹, Mengyuan Ding², Anmol Kiran³,⁴

PUBLISHED 06 Feb 2023

¹ Faculty of Sciences of Tunis, University of Tunis El Manar (UTM), Tunis, Tunisia
² Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
³ Clinical Research Program, Malawi-Liverpool Wellcome Trust, Blantyre, Malawi
⁴ Institute of Infection, Veterinary, and Ecological Sciences, University of Liverpool, Liverpool, UK

Hiba Ben Aribi
Roles: Conceptualization, Data Curation, Formal Analysis, Methodology, Project Administration, Supervision, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing

Mengyuan Ding
Roles: Writing – Original Draft Preparation, Writing – Review & Editing

Anmol Kiran
Roles: Data Curation, Methodology, Software, Validation, Writing – Original Draft Preparation, Writing – Review & Editing

This article is included in the Genomics and Genetics gateway.

This article is included in the Bioinformatics gateway.

Abstract

Background: The identification of differentially expressed genes and their associated biological processes, molecular functions, and cellular components is essential for genetic disease studies because these genes and ontologies represent potential biomarkers and therapeutic targets.
Methods: In this study, we developed an o²S²PARC template to instantiate an interactive pipeline for gene expression data visualization, ontological mapping, and statistical evaluation. To demonstrate the tool's usefulness, we performed a case study on a publicly available dataset.
Results: The tool enables users to identify the differentially expressed genes (DEGs) and visualize them in a volcano plot format. Ontologies associated with the DEGs are assigned and visualized in barplots.
Conclusions: The “Expression data visualization” template is publicly available on the o²S²PARC platform.

Keywords

Visualization, Gene expression, Ontology, o²S²PARC

Corresponding author: Mengyuan Ding

Competing interests: No competing interests were disclosed.

Grant information: The author(s) declared that no grants were involved in supporting this work.

Copyright: © 2023 Ben Aribi H et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Ben Aribi H, Ding M and Kiran A. Gene expression data visualization tool on the o²S²PARC platform [version 2; peer review: 2 approved]. F1000Research 2023, 11:1267 (https://doi.org/10.12688/f1000research.126840.2) First published: 07 Nov 2022, 11:1267 (https://doi.org/10.12688/f1000research.126840.1) Latest published: 06 Feb 2023, 11:1267 (https://doi.org/10.12688/f1000research.126840.2)

Amendments from Version 1

The article was revised according to the reviewers’ suggestions. More information is provided about the platforms, the tool (the function names of the cited Python packages are specified in the “Methods” – “Operation” section), and the browser extension (the location of the code and the installation notes are expanded in the “Methods” – “User guide extension” section). We also further explained how to produce and interpret the ontology barplots (in the “Results” section) and the pipeline validation.

See the authors' detailed response to the review by Esra Neufeld
See the authors' detailed response to the review by Joost B. M. Wagenaar

Introduction

Transcriptome data has been used to understand the local microenvironment, molecular signals, and cell-cell interaction in cells, tissues, and organs in multiple diseases, such as Alzheimer’s disease,¹ Parkinson’s disease,² and many more. In this study, we focus on gene expression data, particularly the differentially expressed genes (DEGs) and their associated ontologies: (i) the cellular component (CC), which describes the subcellular structures and macromolecular complexes and is often used to annotate the cellular locations of gene products; (ii) the biological process (BP), which describes the biological programs consisting of multiple molecular activities, such as DNA repair or signal transduction; and (iii) the molecular function (MF), which describes molecular-level activities performed by gene products, such as “catalysis” or “transport”.

This study was performed during the Stimulating Peripheral Activity to Relieve Conditions (SPARC) FAIR Codeathon in August 2022, organized by the National Institutes of Health (NIH) SPARC program.³ The SPARC program was initiated to advance the understanding of nerve-organ interactions and to expedite the development of innovative therapies and devices that modulate electrical activity in nerves to promote organ function. It has adopted the FAIR data sharing policy (which encompasses the principles of Findability, Accessibility, Interoperability, and Reusability), according to the SPARC Data Structure (SDS). Currently, multiple transcriptomic datasets are available on the SPARC Portal.⁴ They cover a wide range of species, including humans, pigs, mice, and rats; anatomical structures, including neurons of multiple organs and physiological systems; and analysis methods, including RNA sequencing, real-time PCR, and small molecule FISH (RNAscope) probes, among others.

We developed a gene expression data visualization tool to visualize the transcriptomics data available on the SPARC Portal and published it on the o²S²PARC (Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions) platform, a simulation and analysis platform designed to study peripheral nerve system neuromodulation/stimulation and its physiological impact on organs.⁵ The o²S²PARC platform provides simulations in animal and human anatomical models with emulated organ- and tissue-specific properties, allowing experiments to be conducted from the molecular level to the whole-body level.⁵ While the platform currently hosts tools for multiple biological and physiological analyses, it does not provide a tool for transcriptomics and gene expression data analysis or visualization.

In this article, we introduce a publicly available pipeline to visualize gene expression data and a Chrome extension that guides the user from downloading a dataset from the SPARC Portal to using the tool and generating the results.

Methods

The gene expression data visualization tool template

Implementation

The tool is created as a template on the o²S²PARC platform. The platform is accessible on all common web browsers. The tool makes use of pandas 1.4.3, bioinfokit 2.0.8, numpy 1.22.1, matplotlib 3.5.2, seaborn 0.11.2, and goatools 1.2.3. The required runtime environment is bundled along with the tool and automatically installed.
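For readers who run the notebooks outside the platform, the pinned versions listed above can be compared with a local environment. The following is a minimal sketch using only the Python standard library; the `PINNED` dictionary and the `check_versions` helper are hypothetical names introduced for illustration and are not part of the tool.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as stated in the Implementation section.
PINNED = {
    "pandas": "1.4.3",
    "bioinfokit": "2.0.8",
    "numpy": "1.22.1",
    "matplotlib": "3.5.2",
    "seaborn": "0.11.2",
    "goatools": "1.2.3",
}


def check_versions(pinned=PINNED):
    """Return {package: (wanted, installed_or_None, matches)} (hypothetical helper)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed locally
        report[name] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, matches) in check_versions().items():
    status = "ok" if matches else "differs"
    print(f"{package}: wanted {wanted}, found {installed} ({status})")
```

On the o²S²PARC platform itself this check should not be needed, since the runtime environment is bundled with the tool.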

Operation

The tool includes two pipelines encoded in two separate Python JupyterLab notebooks. We used JupyterLab, as recommended by the o²S²PARC platform, because it provides interactive exploration, along with the possibility of providing guidance and instructions in line with scripted analyses. The first pipeline identifies the DEGs based on statistically determined p-values (by default, a threshold of 5% is applied to determine significance) and determines the expression profile of each gene:

• p-value > 0.05: “Not differentially expressed”
• p-value < 0.05 and LogFC (log-fold change) value > 0: “Upregulated”
• p-value < 0.05 and LogFC value < 0: “Downregulated”

The DEGs are represented in a volcano plot generated using the visuz.GeneExpression.volcano() function from the bioinfokit Python package.⁶
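The classification rules listed above can be sketched as a small Python function. This is an illustrative sketch rather than the notebook's own code: the function name, argument names, and example genes are hypothetical. The rules do not state how a p-value exactly equal to the threshold or a LogFC of exactly 0 is handled; here both cases are assumed to fall into “Not differentially expressed”.

```python
def classify_gene(p_value, log_fc, alpha=0.05):
    """Assign an expression profile from a p-value and a log-fold change.

    Assumption: boundary cases (p_value == alpha or log_fc == 0) are treated
    as "Not differentially expressed"; the article does not specify them.
    """
    if p_value < alpha and log_fc > 0:
        return "Upregulated"
    if p_value < alpha and log_fc < 0:
        return "Downregulated"
    return "Not differentially expressed"


# Hypothetical example values, not taken from any dataset.
examples = [("GENE_A", 0.001, 2.3), ("GENE_B", 0.020, -1.1), ("GENE_C", 0.400, 0.8)]
for gene, p_value, log_fc in examples:
    print(gene, classify_gene(p_value, log_fc))
```

Keeping the threshold as the `alpha` parameter mirrors the statement that 5% is only the default.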

The pipeline also performs the ontology analysis for the differentially expressed genes, to determine the cellular components, biological processes, and molecular functions associated with these genes.

The ontology analysis and result visualization are performed using the goatools.base, goatools.obo_parser, goatools.anno.genetogo_reader, and goatools.goea.go_enrichment_ns functions from the goatools Python package.⁷
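To illustrate the kind of statistic that underlies an over-representation call for a single ontology term, the sketch below computes a one-sided hypergeometric p-value with the Python standard library. It is a simplified stand-in and not the goatools implementation: the function name and arguments are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(study_hits, study_size, pop_hits, pop_size):
    """Upper-tail hypergeometric probability P(X >= study_hits).

    study_hits: DEGs annotated with the term
    study_size: all DEGs in the group (upregulated or downregulated)
    pop_hits:   background genes annotated with the term
    pop_size:   all background genes
    """
    total = comb(pop_size, study_size)
    upper = min(study_size, pop_hits)
    tail = sum(
        comb(pop_hits, i) * comb(pop_size - pop_hits, study_size - i)
        for i in range(study_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 3 of 5 DEGs carry a term found on 5 of 20 background genes.
print(enrichment_p_value(3, 5, 5, 20))
```

A small p-value indicates that the term is found among the DEGs more often than expected from the background annotation alone.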

The biological process, molecular function, and cellular component ontologies are represented in separate barplots, as in Figure 1. In total, six barplots are created: three for the upregulated genes and three for the downregulated genes.

Figure 1. Example Barplot of statistically significant ontologies associated with differentially expressed genes.

The gene-related ontology data were downloaded from the NCBI database. The file for the human species is provided as default. The user needs to provide a file as input if the transcriptomics data relates to other species.

The second pipeline takes two CSV files as input. Each CSV file contains the gene expression data of a different dataset, and the two datasets are then compared. Example data files are available in the project GitHub repository.

As in the first pipeline, the gene expression profiles and the DEGs are determined for the two datasets separately. Then, the genes common to both datasets and the genes specific to one dataset are identified.

Finally, the gene expression profiles in the two datasets are compiled in a single CSV file for further analysis, which includes the expression analysis result of the two combined datasets.
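The comparison step described above can be sketched as follows. The tool itself relies on pandas; this sketch uses only the standard library to stay dependency-free. The function names and gene identifiers are hypothetical, and the column names `data1_expression` and `data2_expression` are borrowed from the summary table produced by the tool.

```python
import csv
import io


def compare_profiles(profiles_1, profiles_2):
    """Compare two {gene: expression label} mappings (hypothetical helper)."""
    genes_1, genes_2 = set(profiles_1), set(profiles_2)
    common = sorted(genes_1 & genes_2)
    rows = [
        {
            "gene": gene,
            "data1_expression": profiles_1[gene],
            "data2_expression": profiles_2[gene],
        }
        for gene in common
    ]
    return {
        "common": common,
        "only_in_data1": sorted(genes_1 - genes_2),
        "only_in_data2": sorted(genes_2 - genes_1),
        "rows": rows,
    }


def rows_to_csv(rows):
    """Serialize the combined profiles to CSV text."""
    buffer = io.StringIO()
    fields = ["gene", "data1_expression", "data2_expression"]
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical labels, as produced by the classification step.
data1 = {"GENE_A": "Upregulated", "GENE_B": "Downregulated"}
data2 = {"GENE_B": "Upregulated", "GENE_C": "Not differentially expressed"}
result = compare_profiles(data1, data2)
print(rows_to_csv(result["rows"]))
```

In this sketch only the genes present in both datasets are written to the combined file; whether the tool also keeps dataset-specific genes in its output is not stated here.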

User guide extension

A web browser extension was developed using HTML and CSS to guide users. The extension is helpful for SPARC platform users: it guides the user step-by-step from downloading transcriptomics data from the SPARC Portal database, through a raw-data analysis workflow, to explaining the “Gene expression data visualization” tool.

The extension code can be downloaded from the project GitHub repository, and the extension can be installed using the developer mode of any browser. The steps from downloading the code to using the extension are provided in the GitHub repository.

Pipeline validation

The tool was initially created to visualize the transcriptomics data of the SPARC Portal. However, it can be used to visualize any expression data CSV file. The pipeline validation was performed using two datasets from the Gene Expression Omnibus (GEO) database⁸ corresponding to the early and advanced stages of multiple sclerosis (MS) in human patients (GSE126802 and GSE108000).

The early-stage dataset GSE126802⁹ provides microarray gene expression analysis raw data from the subcortical normal-appearing white matter from 18 MS donors and the white matter of 9 control donors. The advanced stage dataset GSE108000¹⁰ provides microarray gene expression data from 7 chronic active MS demyelinated lesions, 8 inactive MS lesions, and the white matter of 10 control donors.

The tool was used to visualize the data of the first dataset, to determine the genes and pathways implicated in the occurrence of the disease. We then compared the two datasets to determine the genes and pathways implicated in disease progression.

Results

The tool includes two pipelines, one to visualize the expression data from a single CSV file, and the second to compare two datasets.

The dataset expression data are visualized in a volcano plot format, as represented in Figure 2.

Figure 2. Volcano plot generated by the “Gene expression data visualization” tool.
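A volcano plot conventionally places the log-fold change on the x-axis and the negative base-10 logarithm of the p-value on the y-axis, so that highly significant genes appear at the top. A minimal sketch of that transformation follows; the function name is hypothetical, and the plotting itself, which the tool delegates to bioinfokit, is not reproduced.

```python
import math


def volcano_point(p_value, log_fc):
    """Return the (x, y) coordinates of one gene in a volcano plot.

    Assumes p_value > 0; a p-value of exactly 0 has no finite -log10.
    """
    return (log_fc, -math.log10(p_value))


# Height of the horizontal significance line for the default 5% threshold.
significance_line = -math.log10(0.05)

print(volcano_point(0.01, 1.5), significance_line)
```

Genes plotted above the significance line correspond to those with p-value < 0.05, split into upregulated and downregulated by the sign of the x-coordinate.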

The pipeline also determines the ontologies associated with the DEGs: (i) BP associated with upregulated genes; (ii) MF associated with upregulated genes; (iii) CC associated with upregulated genes; (iv) BP associated with downregulated genes; (v) MF associated with downregulated genes; and (vi) CC associated with downregulated genes. The statistically significant ontologies are represented in six barplots. The y-axis lists the names of the statistically significant ontologies. The x-axis represents the percentage of genes associated with each ontology relative to the total number of genes in the group (upregulated or downregulated).
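The x-axis value of a barplot can be sketched as below: for each ontology term, the number of genes in the group that carry the term is divided by the size of the group. The function name, term labels, and gene identifiers are hypothetical and serve only as an illustration.

```python
def ontology_percentages(term_to_genes, deg_group):
    """Return [(term, percent of the DEG group annotated with it)], largest first."""
    group = set(deg_group)
    if not group:
        raise ValueError("the DEG group is empty")
    percentages = [
        (term, 100.0 * len(group & set(genes)) / len(group))
        for term, genes in term_to_genes.items()
    ]
    return sorted(percentages, key=lambda item: item[1], reverse=True)


# Hypothetical annotation: TERM_A covers 2 of the 4 upregulated genes, TERM_B covers 1.
upregulated = ["g1", "g2", "g3", "g4"]
annotation = {"TERM_B": {"g3"}, "TERM_A": {"g1", "g2", "g9"}}
for term, percent in ontology_percentages(annotation, upregulated):
    print(f"{term}: {percent:.1f}%")
```

Genes annotated with a term but absent from the group (such as `g9` here) do not contribute to the percentage.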

The second pipeline determines the genes with similar expression profiles in the two datasets and most importantly those with different profiles, which is helpful for the comparison of two cells, tissues, or diseases.

It also generates a table summarizing the gene counts, as represented in Table 1.

Table 1. Table summarizing the numbers of gene groups.

data1_expression (rows) vs. data2_expression (columns)	Downregulated	Not differentially expressed	Upregulated
Downregulated	834	1048	82
Not differentially expressed	6850	35307	5292
Upregulated	47	792	415
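As a worked reading of Table 1, the sketch below sums the counts to obtain the total number of compared genes, the number with the same profile in both datasets (the diagonal), and the number that change direction between the datasets. It assumes that rows correspond to the first dataset and columns to the second; the variable and function names are hypothetical.

```python
LABELS = ("Downregulated", "Not differentially expressed", "Upregulated")

# Counts copied from Table 1; each tuple follows the column order in LABELS.
TABLE_1 = {
    "Downregulated": (834, 1048, 82),
    "Not differentially expressed": (6850, 35307, 5292),
    "Upregulated": (47, 792, 415),
}


def summarize(table, labels=LABELS):
    """Summarize a 3x3 table of expression-profile counts."""
    total = sum(sum(row) for row in table.values())
    # Same label in both datasets: the diagonal of the table.
    concordant = sum(table[label][i] for i, label in enumerate(labels))
    down, up = labels.index("Downregulated"), labels.index("Upregulated")
    # Downregulated in one dataset and upregulated in the other.
    opposite = table["Downregulated"][up] + table["Upregulated"][down]
    return {"total": total, "concordant": concordant, "opposite": opposite}


print(summarize(TABLE_1))
```

These three quantities do not depend on which dataset is placed on the rows, so the orientation assumption does not affect them.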

Pipeline validation

The data analysis was performed for the sole purpose of validating the pipeline. The data and analysis results are available as supplementary materials in the project’s GitHub repository. However, no interpretation was performed, since this article focuses on presenting the tool.

The tool is publicly available to all o²S²PARC platform users, and the user guide browser extension is available in the project repository.

Discussion and conclusion

Transcriptomics has been increasingly utilized by researchers and clinicians in prioritizing specific systems and networks,² finding biomarkers,¹¹ developing precision medicine strategies,¹¹ monitoring disease progression, and predicting treatment effects.¹²

The expression data visualization tool helps transform transcriptome data into visualizable differentially expressed genes (DEGs) and gene ontology (GO) analyses in a one-step standardized process and form. Nowadays, DEG and GO analyses are commonly used to detect potential key pathways, molecules, and cells related to target tissues, organs, and diseases.¹¹–¹⁴

The tool is built into and hosted on the o²S²PARC platform. No direct link from the SPARC Portal to the tool currently exists. The browser extension, which is available in the project GitHub repository, plays an intermediate role by guiding users from the SPARC Portal to the tool on the o²S²PARC platform.

Our work enhances the usability of the transcriptomics data on the SPARC portal by providing a specific data analysis and visualization tool that does not require any coding skills, to identify the gene expression differences between species, healthy or diseased population groups, individual subjects, and tissues. It also represents an example of how to use and contribute to the development of the o²S²PARC platform.

The current version requires processed gene expression data and only integrates a limited amount of transcriptomic analysis. However, future versions will integrate more features such as data preprocessing.

Data availability

No data is associated with this article.

Software availability

• Software available from the o²S²PARC platform: https://osparc.io
• Source code available from: https://github.com/SPARC-FAIR-Codeathon/Transcriptomic_oSPARC
• Archived source code at the time of publication: https://doi.org/10.5281/zenodo.7541899 ¹⁵
• License: MIT

Acknowledgements

The authors thank the Stimulating Peripheral Activity to Relieve Conditions (SPARC) program and the National Institutes of Health (NIH) for their immense support during the SPARC FAIR codeathon.

References

1. Chew P: Transcriptional Networks of Microglia in Alzheimer’s Disease and Insights into Pathogenesis. Genes. 2019 Oct 12; 10(10): 798.
2. Mroczek M, Desouky A, Sirry W: Imaging Transcriptomics in Neurodegenerative Diseases. J. Neuroimaging. 2021 Mar; 31(2): 244–250.
3. Stimulating Peripheral Activity to Relieve Conditions [Internet]. National Center for Advancing Translational Sciences. 2016 [cited 2022 Aug 10].
4. SPARC Portal [cited 2022 Aug 11].
5. The o²S²PARC modeling and simulation platform [cited 2022 Aug 10].
6. Bedre R: reneshbedre/bioinfokit: Bioinformatics data analysis and visualization toolkit. 2020 Mar 5.
7. Klopfenstein DV, Zhang L, Pedersen BS, et al.: GOATOOLS: A Python library for Gene Ontology analyses. Sci. Rep. 2018; 8: 10872.
8. Barrett T, Wilhite SE, Ledoux P, et al.: NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013 Jan; 41: D991–D995.
9. Melief J, Orre M, Bossers K, et al.: Transcriptome analysis of normal-appearing white matter reveals cortisol- and disease-associated gene expression profiles in multiple sclerosis. Acta Neuropathol. Commun. 2019 Apr 25; 7(1): 60.
10. Hendrickx DAE, van Scheppingen J, van der Poel M, et al.: Gene Expression Profiling of Multiple Sclerosis Pathology Identifies Early Patterns of Demyelination Surrounding Chronic Active Lesions. Front. Immunol. 2017; 8: 1810.
11. Zhang L, Chen D, Song D, et al.: Clinical and translational values of spatial transcriptomics. Sig. Transduct. Target Ther. 2022 Dec; 7(1): 111.
12. Cartwright H, editor: Artificial Neural Networks. Methods in Molecular Biology, vol. 2190. New York, NY: Springer US; 2021 [cited 2022 Aug 12].
13. Zhang L, Sun L, Zhang B, et al.: Identification of Differentially Expressed Genes (DEGs) Relevant to Prognosis of Ovarian Cancer by Use of Integrated Bioinformatics Analysis and Validation by Immunohistochemistry Assay. Med. Sci. Monit. 2019 Dec 24; 25: 9902–9912.
14. Bardsley EN, Davis H, Ajijola OA, et al.: RNA Sequencing Reveals Novel Transcripts from Sympathetic Stellate Ganglia During Cardiac Sympathetic Hyperactivity. Sci. Rep. 2018 Dec; 8(1): 8633.
15. Ben Aribi H, Ding M, Nickerson D, et al.: SPARC-FAIR-Codeathon/Transcriptomic_oSPARC: Version 2.0 (v2.0). Zenodo. 2023.

Open Peer Review

Version 2

Reviewer Report 15 Feb 2023

Joost B. M. Wagenaar, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA

Approved

https://doi.org/10.5256/f1000research.143516.r162566

The authors addressed the comments of the ... Continue reading

Reviewer Report 13 Feb 2023

Esra Neufeld, Foundation for Research on Information Technologies in Society (IT'IS), Zurich, Switzerland

Approved

https://doi.org/10.5256/f1000research.143516.r162565

Version 1

Reviewer Report 21 Dec 2022

Esra Neufeld, Foundation for Research on Information Technologies in Society (IT'IS), Zurich, Switzerland

Approved with Reservations

https://doi.org/10.5256/f1000research.139290.r155905

The present paper discusses a series of workflows established to analyze genetic expression data. They integrate different tools to produce intuitive visualization, help with interpretation, and provide basic statistics. The workflows are made available through an open, online platform (o2S2PARC) established as part of the NIH SPARC Program, which has also generated and published corresponding gene expression data. A particular aspect of interest concerns the platform and technologies used to support the creation, sharing, publication, and execution of such workflows.

The paper is kept short, but provides relevant illustrations and information. This reviewer is not sufficiently knowledgeable to provide qualified feedback on the innovativeness and adequacy of the selected data analysis methodology, but will focus on the implementation and innovative aspects thereof.

In general, seeing that the form of producing and sharing such an openly accessible workflow was far from standard – it involved a hackathon and the use of a novel and open platform for collaboratively establishing FAIR, sustainable, and reproducible computational modeling and data analysis – it would be desirable to provide more information about that process and its benefits/weaknesses/potential.

Minor feedback:

Abstract:
- “In this study, we developed an o²S²PARC template representing an interactive pipeline for the gene expression data visualization and ontologies data analysis and visualization.” -> “In this study, we developed an o²S²PARC template to instantiate an interactive pipeline for gene expression data visualization, ontological mapping, statistical evaluation.”
- “The ontologies associated with the DEGs are determined” -> “Ontologies associated with the DEGs are assigned”
Introduction:
- “and much more” -> “and many more”
- “We developed a gene expression data visualization tool created and published on the o²S²PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions platform, a simulation and analysis platform aiming to initially perform interactive peripheral nerve system neuromodulation/stimulations and to visualize its physiological impact on organs.” -> “We developed a gene expression data visualization tool and published it on the o²S²PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions, platform – a simulation and analysis platform designed to study peripheral nerve system neuromodulation/stimulations and its physiological impact on organs.”
- “However, the platform currently hosts […], but does not” -> “While the platform currently hosts […], it does not”
- Mention already in the introduction that SPARC has generated and published gene expression data (not only in the discussion).
- “open-accessible” -> “open-access” or “openly accessible”
- “across multiple scales” -> “across multiple dimensions”
- SPARC is mentioned repeatedly, before introducing it. Move that SPARC introduction sentence up.
- “to expedite the invention of therapeutic medicine and devices” -> “to expedite the development of innovative therapies and devices”
- “runs under” -> “has adopted”
- “consists of the principles of Findabile, Accessibile, Interoperabile, and Reusabile” -> “encompasses the principles of Findability, Accessibility, Interoperability, and Reusability”
Implementation
- “template in the” -> “template on the”
- “on all” -> “on all common”
- “It requires” -> “The tool makes use of”
- “All the requirements are integrated within the tool” -> “The required runtime environment is bundled along with the tool”
Operation
- Explain why JupyterLab is used (it provides interactive exploration, along with the possibility of providing guidance and instructions in line with scripted analyses)
- “The first pipeline identifies the DEGs based on the p-value, set to p-value < 0.05 as default,“ -> “The first pipeline identifies the DEGs based on statistically determined p-values (by default, a threshold of 5% is applied to determine significance)”
- “the LogFC” -> “LogFC”; twice; and introduce that abbreviation (log-fold change on a basis of 2, I assume)
- “in six similar separate” -> rephrase and provide more detail
- “genes-related” -> “gene-related”
- “as input if the transcriptomics data correspond to other species” -> “as input, if the transcriptomics data relates to other species”; or “pertains”
- “The second pipeline takes two CSV files as input.” What are these two CSV files? Two datasets for which the gene expression is to be compared?
- “The gene’s expression profile is determined, as in the first pipeline, for the two datasets.” -> “The gene’s expression profiles are determined, as in the first pipeline, for the two datasets.”
- “The common and uncommon genes count is performed.” Please, rephrase.
- “And the gene expression profiles in the two datasets are compiled in a single csv file for further analysis.” -> “Finally, the gene expression profiles in the two datasets are compiled in a single csv file for further analysis.” Is this file only a merged version of the original data, or does it include results of the analysis?
- “as a user guide” -> “to guide users”
- “for the new” -> “for new”
- “It guides the user step by step from downloading transcriptomics data from the SPARC portal database, providing a raw data analysis workflow, and explaining the “Gene expression data visualization” tool.” -> “It guides the user step-by-step from the download of transcriptomics data from the SPARC portal database, through a raw-data analysis workflow, to the explanation of the “Gene expression data visualization” tool.”
Results
- Are the results of the application to the multiple sclerosis data new/insightful? How are they to be interpreted?
- What are the “top ontologies”?
- “useful to compare” -> “useful for the comparison of”
- “resuming” -> “summarizing”
- “in the project repository” -> “in a project repository”
Discussion and conclusion
- “analysis methods involve” -> “analysis methods include”
- Is “favored” the right word?
- “and gene ontology (GO) analysis” -> “and gene ontology (GO) analyses”
- “standardized format” -> “standardized process and form”
- “changes among any target objects, such as species, tissues, and diseases” -> “differences between species, healthy or diseased population groups, individual subjects, and tissues”

Is the rationale for developing the new software tool clearly explained?
Yes

Is the description of the software tool technically sound?
Partly

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Yes

Competing Interests: I am an investigator currently funded by the NIH SPARC program and involved in the development of the o2S2PARC platform.

Reviewer Expertise: Computational Life Sciences, EM-tissue interactions, Computational Modeling

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author Response 06 Feb 2023

Jessica Ding

We express our sincere gratitude for the reviewer’s advice and corrections.
Our replies and amendments were made point-by-point according to the reviewer’s proposals.

1. In general, seeing that the form of producing and sharing such an openly accessible workflow was far from standard – it involved a hackathon and the use of a novel and open platform for collaboratively establishing FAIR, sustainable, and reproducible computational modeling and data analysis – it would be desirable to provide more information about that process and its benefits/weaknesses/potential.

Reply: We thank the reviewer for pointing this out. In the “Introduction” section, we mentioned that the work was performed during a hackathon (“This study was performed during the Stimulating Peripheral Activity to Relieve Conditions (SPARC) FAIR Codeathon in August 2022 organized by the National Institute of Health (NIH) SPARC program").
The hosting platforms were also introduced in the following 2 paragraphs.

In the “Discussion” section (last section), we discussed the benefits, weaknesses, and potential of the tool in the last four paragraphs of the paper.

2. Minor feedback:
Abstract:
“In this study, we developed an o²S²PARC template representing an interactive pipeline for the gene expression data visualization and ontologies data analysis and visualization.” -> “In this study, we developed an o²S²PARC template to instantiate an interactive pipeline for gene expression data visualization, ontological mapping, and statistical evaluation.”

Reply: We thank the reviewer for pointing this out. We have made the changes accordingly.

“The ontologies associated with the DEGs are determined” -> “Ontologies associated with the DEGs are assigned”.

Reply: We thank the reviewer for pointing this out. We have made the changes accordingly.

Introduction:
“and much more” -> “and many more”

Reply: We thank the reviewer for pointing this out. We have made the changes accordingly.

“We developed a gene expression data visualization tool created and published on the o2S2PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions platform, a simulation and analysis platform aiming to initially perform interactive peripheral nerve system neuromodulation/stimulations and to visualize its physiological impact on organs.” -> “We developed a gene expression data visualization tool and published it on the o2S2PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions, platform – a simulation and analysis platform designed to study peripheral nerve system neuromodulation/stimulations and its physiological impact on organs.”

Reply: We thank the reviewer for pointing this out. We made the following changes: “We developed a gene expression data visualization tool to visualize the transcriptomics data on the SPARC Portal platform and created and published it on the o²S²PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions, platform – a simulation and analysis platform designed to study peripheral nerve system neuromodulation/stimulations and its physiological impact on organs.”

“However, the platform currently hosts […], but does not” -> “While the platform currently hosts […], it does not”

Reply: We thank the reviewer for pointing this out. We made the following changes: “While the platform currently hosts tools for multiple biological and physiological analyses, it does not provide a tool for transcriptomics and gene expression data analysis or visualization.”

Mention already in the introduction that SPARC has generated and published gene expression data (not only in the discussion).

Reply: We thank the reviewer for pointing this out. We revised the “Introduction” section to place more emphasis on introducing SPARC.

“open-accessible” -> “open-access” or “openly accessible”

Reply: We thank the reviewer for pointing out this error. We have made the changes.

“across multiple scales” -> “across multiple dimensions”

Reply: We thank the reviewer for pointing out this error. We have made the change.

SPARC is mentioned repeatedly, before introducing it. Move that SPARC introduction sentence up.

Reply: We thank the reviewer for pointing this out. We have moved the SPARC introduction sentence up to paragraph 2 of the “Introduction” section.

“to expedite the invention of therapeutic medicine and devices” -> “to expedite the development of innovative therapies and devices”

Reply: Thank you and we have made the changes accordingly.

“runs under” -> “has adopted”

Reply: Thank you and we have made the changes accordingly.

“consists of the principles of Findabile, Accessibile, Interoperabile, and Reusabile” -> “encompasses the principles of Findability, Accessibility, Interoperability, and Reusability”

Reply: Thank you and we have made the changes accordingly.

Implementation
“template in the” -> “template on the”

Reply: Thank you and we have made the changes accordingly.

“on all” -> “on all common”

Reply: Thank you and we have made the changes accordingly.

“It requires” -> “The tool makes use of”

Reply: Thank you and we have made the changes accordingly.

“All the requirements are integrated within the tool” -> “The required runtime environment is bundled along with the tool”

Reply: Thank you and we have made the changes accordingly.

Operation
Explain why JupyterLab is used (it provides interactive exploration, along with the possibility of providing guidance and instructions in line with scripted analyses)

Reply: We thank the reviewer for pointing this out. The reason for using JupyterLab is now stated in the first paragraph of the “Operation” subsection of “Methods”. We made the following changes: “We used Jupyterlab, as recommended by the o²S²PARC platform, because it provides interactive exploration, along with the possibility of providing guidance and instructions in line with scripted analyses.”

“The first pipeline identifies the DEGs based on the p-value, set to p-value < 0.05 as default,” -> “The first pipeline identifies the DEGs based on statistically determined p-values (by default, a threshold of 5% is applied to determine significance)”

Reply: Thank you and we have made the changes accordingly.

“the LogFC” -> “LogFC”; twice; and introduce that abbreviation (log-fold change on a basis of 2, I assume)

Reply: Thank you and we have made the changes accordingly.
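
For illustration only, the thresholding discussed in the two points above can be sketched as follows. This is a minimal sketch and not the tool's own code: the column names `gene`, `logFC`, and `p_value`, the example rows, and the rule that the sign of LogFC decides between up- and downregulated are all assumptions made for the example. Only the default p-value threshold of 0.05 comes from the manuscript text.

```python
import csv
import io


def classify_degs(rows, p_threshold=0.05):
    """Label each gene 'up', 'down', or 'ns' (not significant).

    Assumption: a gene counts as differentially expressed when its p-value
    is below p_threshold; the sign of logFC then gives the direction.
    """
    labels = {}
    for row in rows:
        p_value = float(row["p_value"])
        log_fc = float(row["logFC"])
        if p_value < p_threshold and log_fc > 0:
            labels[row["gene"]] = "up"
        elif p_value < p_threshold and log_fc < 0:
            labels[row["gene"]] = "down"
        else:
            labels[row["gene"]] = "ns"
    return labels


# Hypothetical input table; gene names and values are invented.
EXAMPLE_CSV = """gene,logFC,p_value
GENE_A,2.1,0.001
GENE_B,-1.4,0.020
GENE_C,0.3,0.400
"""

labels = classify_degs(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
print(labels)
```

Passing a different `p_threshold` changes the significance cutoff without touching the rest of the logic.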

“in six similar separate” -> rephrase and provide more detail

Reply: We thank the reviewer for pointing this out. The information was clarified in the “Operation” subsection of “Methods”, as follows: “The pipeline also performs the ontology analysis for the differentially expressed genes, to determine the cellular components, biological processes, and molecular functions associated with these genes.” ... “The biological processes, molecular functions, and cellular component ontologies are represented in separate Barplots, as in Figure 1. In total six barplots are created, three upregulated genes and three downregulated genes.”
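
As a rough illustration of how the six barplots arise (three ontology categories for each of the two regulation directions), the grouping can be sketched as below. This is a sketch under stated assumptions, not the pipeline's implementation: the function name `barplot_panels`, the gene names, and the term labels are hypothetical, and the sketch only counts annotated genes per term. It performs no statistical enrichment test.

```python
from collections import Counter
from itertools import product

DIRECTIONS = ("up", "down")
NAMESPACES = ("biological_process", "molecular_function", "cellular_component")


def barplot_panels(deg_direction, annotations):
    """Return {(direction, namespace): Counter(term -> gene count)}.

    The result always has six keys, one per barplot panel.
    Genes whose direction is neither 'up' nor 'down' are skipped.
    """
    panels = {key: Counter() for key in product(DIRECTIONS, NAMESPACES)}
    for gene, direction in deg_direction.items():
        if direction not in DIRECTIONS:
            continue
        for namespace, term in annotations.get(gene, []):
            panels[(direction, namespace)][term] += 1
    return panels


# Hypothetical DEG labels and gene-to-term annotations.
deg_direction = {"GENE_A": "up", "GENE_B": "down", "GENE_D": "up"}
annotations = {
    "GENE_A": [("biological_process", "term 1"), ("cellular_component", "term 2")],
    "GENE_B": [("molecular_function", "term 3")],
    "GENE_D": [("biological_process", "term 1")],
}

panels = barplot_panels(deg_direction, annotations)
for key, counts in panels.items():
    print(key, dict(counts))
```

Each of the six counters holds the bar heights for one panel; a plotting library of choice could then draw them.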

“genes-related” -> “gene-related”

Reply: Thank you and we have made the changes accordingly.

“as input if the transcriptomics data correspond to other species” -> “as input, if the transcriptomics data relates to other species”; or “pertains”

Reply: Thank you and we have made the changes accordingly.

“The second pipeline takes two CSV files as input.” What are these two CSV files? Two datasets for which the gene expression is to be compared?

Reply: We thank the reviewer for pointing this out. The required CSV files are specified in the “Operation” subsection of “Methods” as follows: “The second pipeline takes two CSV files as input. The CSV files correspond to gene expression data of a different dataset, to be further compared. Example data files are available in the project GitHub repository.”

“The gene’s expression profile is determined, as in the first pipeline, for the two datasets.” -> “The gene’s expression profiles are determined, as in the first pipeline, for the two datasets.”

Reply: We thank the reviewer for pointing this out. We made the following changes: “As in the first pipeline, the gene’s expression profiles and the DEGs are determined for the two datasets, separately.”

“The common and uncommon genes count is performed.” Please, rephrase.

Reply: We thank the reviewer for this advice. We made the following changes: “Then, we identified the common genes between the two datasets and those specific genes to one dataset.”

“And the gene expression profiles in the two datasets are compiled in a single csv file for further analysis.” -> “Finally, the gene expression profiles in the two datasets are compiled in a single csv file for further analysis.” Is this file only a merged version of the original data, or does it include the results of the analysis?

Reply: We thank the reviewer for this advice. We have made the changes as follows: “Finally, the gene expression profiles in the two datasets are compiled in a single CSV file for further analysis, which includes the expression analysis result of the two combined datasets.”
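
The comparison steps described in the last few points (finding genes common to both datasets, finding genes specific to one dataset, and compiling both profiles into a single CSV file) can be sketched as follows. This is a minimal sketch, not the tool's code: the column names, the `_a`/`_b` suffixes, the helper names, and the example values are assumptions made for the illustration.

```python
import csv
import io


def read_table(text):
    """Parse CSV text with columns gene, logFC, p_value into {gene: row}."""
    return {row["gene"]: row for row in csv.DictReader(io.StringIO(text))}


def compare_datasets(table_a, table_b):
    """Return (common genes, genes only in A, genes only in B), each sorted."""
    genes_a, genes_b = set(table_a), set(table_b)
    return (
        sorted(genes_a & genes_b),
        sorted(genes_a - genes_b),
        sorted(genes_b - genes_a),
    )


def merge_tables(table_a, table_b):
    """Build one CSV with a row per gene; cells stay empty where a gene is absent."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["gene", "logFC_a", "p_value_a", "logFC_b", "p_value_b"])
    for gene in sorted(set(table_a) | set(table_b)):
        a = table_a.get(gene, {})
        b = table_b.get(gene, {})
        writer.writerow([
            gene,
            a.get("logFC", ""), a.get("p_value", ""),
            b.get("logFC", ""), b.get("p_value", ""),
        ])
    return out.getvalue()


# Hypothetical inputs standing in for the two CSV files.
DATASET_A = "gene,logFC,p_value\nGENE_A,2.1,0.001\nGENE_B,-1.4,0.020\n"
DATASET_B = "gene,logFC,p_value\nGENE_B,-0.9,0.030\nGENE_C,1.2,0.004\n"

table_a, table_b = read_table(DATASET_A), read_table(DATASET_B)
common, only_a, only_b = compare_datasets(table_a, table_b)
merged = merge_tables(table_a, table_b)
print(common, only_a, only_b)
print(merged)
```

Keeping one row per gene with dataset-specific columns is one possible layout for the merged file; the actual output format of the tool may differ.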

“as a user guide” -> “to guide users”

Reply: We thank the reviewer for this advice and we have made the changes accordingly.

“for the new” -> “for new”

Reply: We thank the reviewer for this advice. We have made the changes accordingly.

“It guides the user step by step from downloading transcriptomics data from the SPARC portal database, providing a raw data analysis workflow, and explaining the “Gene expression data visualization” tool.” -> “It guides the user step-by-step from the download of transcriptomics data from the SPARC portal database, through a raw-data analysis workflow, to the explanation of the “Gene expression data visualization” tool.”

Reply: We thank the reviewer for this advice. We have made the changes as follows: “It guides the user step-by-step from downloading transcriptomics data from the SPARC portal database, through a raw-data analysis workflow, to explaining the “Gene expression data visualization” tool.”

Results
Are the results of the application to the multiple sclerosis data new/insightful? How are they to be interpreted?

Reply: We thank the reviewer for pointing this out. The data analysis was performed for the sole purpose of validating the pipeline. The datasets are publicly available and may therefore have been studied by other researchers. However, to our knowledge, no other studies have compared these two datasets for the same purpose of studying the genes and ontologies implicated in multiple sclerosis disease progression. The results are available as supplementary materials on the project GitHub repository (https://github.com/SPARC-FAIR-Codeathon/Transcriptomic_oSPARC/tree/main/pipeline%20validation). No interpretation was performed, since our article focuses on presenting the tool, and we added the following clarification: “Pipeline validation:
The data analysis was performed for the unique purpose of validating the pipeline. The data and analysis results are available as supplementary materials on the project's GitHub repository. However, no interpretation was performed since our article focuses on presenting the tool.”

What are the “top ontologies”?

Reply: We apologize for this error. The phrase was replaced by “statistically significant ontologies”.

“useful to compare” -> “useful for the comparison of”

Reply: Thank you for pointing out this error. We have made the changes accordingly.

“resuming” -> “summarizing”

Reply: Thank you for pointing out this error. We have made the changes accordingly.

“in the project repository” -> “in a project repository”

Reply: Thank you for pointing out this error. We have made the changes accordingly.

Discussion and conclusion

“analysis methods involve” -> “analysis methods include”

Reply: Thank you for your suggestion. We have made the changes accordingly. This sentence has been moved to the second paragraph of the “Introduction”.

Is “favored” the right word?

Reply: Thank you for your suggestion. We have changed the word to “utilized” accordingly.

“and gene ontology (GO) analysis” -> “and gene ontology (GO) analyses”

Reply: Thank you for your suggestion. We have made the change accordingly.

“standardized format” -> “standardized process and form”

Reply: Thank you for your suggestion. We have made the change accordingly.

“changes among any target objects, such as species, tissues, and diseases” -> “differences between species, healthy or diseased population groups, individual subjects, and tissues”

Reply: Thank you for your suggestion. We have made the change accordingly.
Competing Interests: The authors declare no conflicts of interest to the reviewer's opinion.

COMMENTS ON THIS REPORT

Author Response 06 Feb 2023

Jessica Ding

We express our sincere gratitude for the reviewer’s advice and corrections.
Our replies and amendments were made point by point, following the reviewer’s suggestions.

1. In general, seeing that the form of producing and sharing such an openly accessible workflow was far from standard – it involved a hackathon and the use of a novel and open platform for collaboratively establishing FAIR, sustainable, and reproducible computational modeling and data analysis – it would be desirable to provide more information about that process and its benefits/weaknesses/potential.

Reply: We thank the reviewer for pointing this out. In the “Introduction” section, we mentioned that the work was performed during a hackathon (“This study was performed during the Stimulating Peripheral Activity to Relieve Conditions (SPARC) FAIR Codeathon in August 2022 organized by the National Institute of Health (NIH) SPARC program").
The hosting platforms were also introduced in the following 2 paragraphs.

In the “Discussion” section ( last section), we discussed the benefits, weaknesses, and potentials of the tool in the last four paragraphs of the paper.

2. Minor feedback:
Abstract:
“In this study, we developed an o²S²PARC template representing an interactive pipeline for the gene expression data visualization and ontologies data analysis and visualization.” -> “In this study, we developed an o²S²PARC template to instantiate an interactive pipeline for gene expression data visualization, ontological mapping, and statistical evaluation.”

Reply: We thank the reviewer for pointing this out. We have made the changes accordingly.

“The ontologies associated with the DEGs are determined” -> “Ontologies associated with the DEGs are assigned”.

Reply: We thank the reviewer for pointing this out. We have made the changes accordingly.

Introduction:
“and much more” -> “and many more”

Reply: We thank the reviewer for pointing this out. We have made the changes accordingly.

“We developed a gene expression data visualization tool created and published on the o2S2PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions platform, a simulation and analysis platform aiming to initially perform interactive peripheral nerve system neuromodulation/stimulations and to visualize its physiological impact on organs.” -> “We developed a gene expression data visualization tool and published it on the o2S2PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions, platform – a simulation and analysis platform designed to study peripheral nerve system neuromodulation/stimulations and its physiological impact on organs.”

Reply: We thank the reviewer for pointing this out. We made the following changes: “We developed a gene expression data visualization tool to visualize the transcriptomics data on the SPARC Portal platform and created and published it on the o²S²PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions, platform – a simulation and analysis platform designed to study peripheral nerve system neuromodulation/stimulations and its physiological impact on organs.”

“However, the platform currently hosts […], but does not” -> “While the platform currently hosts […], it does not”

Reply: We thank the reviewer for pointing this out. We made the following changes: “While the platform currently hosts tools for multiple biological and physiological analyses, it does not provide a tool for transcriptomics and gene expression data analysis or visualization.”

Mention already in the introduction that SPARC has generated and published gene expression data (not only in the discussion).

Reply: We thank the reviewer for pointing this out. We made the changes by putting emphasis on introducing the SPARC in the “Introduction” section.

“open-accessible” -> “open-access” or “openly accessible”

Reply: We thank the reviewer for pointing out this error. We have made the changes.

“across multiple scales” -> “across multiple dimensions”

Reply: We thank the reviewer for pointing out this error. We have made the changes so the error no longer exists.

SPARC is mentioned repeatedly, before introducing it. Move that SPARC introduction sentence up.

Reply: We thank the reviewer for pointing this out. We have moved the SPARC introduction sentence up to the “Introduction” section- paragraph 2.

“to expedite the invention of therapeutic medicine and devices” -> “to expedite the development of innovative therapies and devices”

Reply: Thank you and we have made the changes accordingly.

“runs under” -> “has adopted”

Reply: Thank you and we have made the changes accordingly.

“consists of the principles of Findabile, Accessibile, Interoperabile, and Reusabile” -> “encompasses the principles of Findability, Accessibility, Interoperability, and Reusability”

Reply: Thank you and we have made the changes accordingly.

Implementation
“template in the” -> “template on the”

Reply: Thank you and we have made the changes accordingly.

“on all” -> “on all common”

Reply: Thank you and we have made the changes accordingly.

“It requires” -> “The tool makes use of”

Reply: Thank you and we have made the changes accordingly.

“All the requirements are integrated within the tool” -> “The required runtime environment is bundled along with the tool”

Reply: Thank you and we have made the changes accordingly.

Operation
Explain why JupyterLab is used (it provides interactive exploration, along with the possibility of providing guidance and instructions in line with scripted analyses)

Reply: We thank the reviewer for pointing this out. The reason for using jupyterLab was defined in the “Methods”- “Operation” section (first paragraph). We made the following changes: “We used Jupyterlab, as recommended by the o²S²PARC platform, because it provides interactive exploration, along with the possibility of providing guidance and instructions in line with scripted analyses. ”

“The first pipeline identifies the DEGs based on the p-value, set to p-value < 0.05 as default,” -> “The first pipeline identifies the DEGs based on statistically determined p-values (by default, a threshold of 5% is applied to determine significance)”

Reply: Thank you and we have made the changes accordingly.

“the LogFC” -> “LogFC”; twice; and introduce that abbreviation (log-fold change on a basis of 2, I assume)

Reply: Thank you and we have made the changes accordingly.

“in six similar separate” -> rephrase and provide more detail

Reply: We thank the reviewer for pointing this out. The information was clarified in the “Methods”- “Operation” section, as follows: “The pipeline also performs the ontology analysis for the differentially expressed genes, to determine the cellular components, biological processes, and molecular functions associated with these genes.“...“The biological processes, molecular functions, and cellular component ontologies are represented in separate Barplots, as in Figure 1. In total six barplots are created, three upregulated genes and three downregulated genes.

“genes-related” -> “gene-related”

Reply: Thank you and we have made the changes accordingly.

“as input if the transcriptomics data correspond to other species” -> “as input, if the transcriptomics data relates to other species”; or “pertains”

Reply: Thank you and we have made the changes accordingly.

“The second pipeline takes two CSV files as input.” What are these two CSV files? Two datasets for which the gene expression is to be compared?

Reply: We thank the reviewer for pointing this out. The required CSV files are specified in the “Methods”- “Operation” as follows: “The second pipeline takes two CSV files as input. The CSV files correspond to gene expression data of a different dataset, to be further compared. Example data files are available in the project GitHub repository.”

“The gene’s expression profile is determined, as in the first pipeline, for the two datasets.” -> “The gene’s expression profiles are determined, as in the first pipeline, for the two datasets.”

Reply: We thank the reviewer for pointing this out. We made the following changes: “As in the first pipeline, the gene’s expression profiles and the DEGs are determined for the two datasets, separately.”

“The common and uncommon genes count is performed.” Please, rephrase.

Reply: We thank the reviewer for this advice. We made the following changes: “Then, we identified the common genes between the two datasets and those specific genes to one dataset.”

“And the gene expression profiles in the two datasets are compiled in a single csv file for further analysis.” -> “Finally, the gene expression profiles in the two datasets are compiled in a single csv file for further analysis.” Is this file only a merged version of the original data, or does it include the results of the analysis?

Reply: We thank the reviewer for this advice. We have made the changes as follows: “Finally, the gene expression profiles in the two datasets are compiled in a single CSV file for further analysis, which includes the expression analysis result of the two combined datasets.”

“as a user guide” -> “to guide users”

Reply: We thank the reviewer for this advice and we have made the changes accordingly.

“for the new” -> “for new”

Reply: We thank the reviewer for this advice. We have made the changes accordingly.

“It guides the user step by step from downloading transcriptomics data from the SPARC portal database, providing a raw data analysis workflow, and explaining the “Gene expression data visualization” tool.” -> “It guides the user step-by-step from the download of transcriptomics data from the SPARC portal database, through a raw-data analysis workflow, to the explanation of the “Gene expression data visualization” tool.”

Reply: We thank the reviewer for this advice. We have made the changes as follows: “It guides the user step-by-step from downloading transcriptomics data from the SPARC portal database, through a raw-data analysis workflow, to explaining the “Gene expression data visualization” tool.”

Results
Are the results of the application to the multiple sclerosis data new/insightful? How are they to be interpreted?

Reply: We thank the reviewer for pointing this out. The data analysis was performed solely to validate the pipeline. The datasets are publicly available and thus have possibly been studied by other researchers. However, to our knowledge, no other studies have compared these two datasets for the purpose of studying the genes and ontologies implicated in multiple sclerosis disease progression. The results are available as supplementary materials in the project GitHub repository (https://github.com/SPARC-FAIR-Codeathon/Transcriptomic_oSPARC/tree/main/pipeline%20validation). No interpretation was performed, since our article focuses on presenting the tool, and we added the following clarification: “Pipeline validation:
The data analysis was performed for the unique purpose of validating the pipeline. The data and analysis results are available as supplementary materials on the project's GitHub repository. However, no interpretation was performed since our article focuses on presenting the tool.”

What are the “top ontologies”?

Reply: We apologize for this error; the phrase was replaced by “statistically significant ontologies”.

“useful to compare” -> “useful for the comparison of”

Reply: Thank you for pointing out this error. We have made the changes accordingly.

“resuming” -> “summarizing”

Reply: Thank you for pointing out this error. We have made the changes accordingly.

“in the project repository” -> “in a project repository”

Reply: Thank you for pointing out this error. We have made the changes accordingly.

Discussion and conclusion

“analysis methods involve” -> “analysis methods include”

Reply: Thank you for your suggestion. We have made the changes accordingly. This sentence has been moved to the second paragraph of the “Introduction”.

Is “favored” the right word?

Reply: Thank you for your suggestion. We have changed the word to “utilized” accordingly.

“and gene ontology (GO) analysis” -> “and gene ontology (GO) analyses”

Reply: Thank you for your suggestion. We have made the change accordingly.

“standardized format” -> “standardized process and form”

Reply: Thank you for your suggestion. We have made the change accordingly.

“changes among any target objects, such as species, tissues, and diseases” -> “differences between species, healthy or diseased population groups, individual subjects, and tissues”

Reply: Thank you for your suggestion. We have made the change accordingly.
Competing Interests: The authors declare no conflicts of interest to the reviewer's opinion.

Reviewer Report 13 Dec 2022

Joost B. M. Wagenaar, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.139290.r155907

In this article, the authors describe a tool that runs on a platform called o²S²PARC. This tool allows users to visualize gene expression data using a standardized interactive approach. The authors mention that this tool could make it easier for investigators to analyze data in a one-step approach. The authors mention that the tool currently requires processed data files, but that this tool can be expanded to use raw data.

I am having a bit of trouble understanding the need for this tool or some of the platforms and workflows it describes. It would be helpful for the authors to describe the SPARC platform and the o²S²PARC platform in a bit more detail and explain how they are related to the Python Jupyter notebooks that were used to run the analysis.

Figure 1 shows an example of a barplot, but it doesn't mention how this plot was generated or how users should interpret the data. This might not be a real concern, as the focus of the manuscript is on the tool itself.

It is a bit unclear what the browser extension that is described in the article does or where it is available. The authors do provide access to a GitHub repository which has more extensive documentation on how to use the tool.

Overall, I think there could be a bit more information in the article to detail more specifically how the tool interacts with the other platforms that were mentioned and what problem it solves. The authors conclude that there is a need for better tools for visualizing DEGs and GO analysis, and it is clear that this tool is a step in that direction.

Is the rationale for developing the new software tool clearly explained?

Yes
Is the description of the software tool technically sound?

Partly
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Yes

Competing Interests: I am an investigator that is currently funded by the NIH SPARC program

Reviewer Expertise: Data management, timeseries analysis, fair-sharing

Author Response 06 Feb 2023

Jessica Ding

We are grateful for the advice and questions raised by reviewer 1. Here are our revisions and replies for each point s/he had made.

1. I am having a bit of trouble understanding the need for this tool or some of the platforms and workflows it describes. It would be helpful for the authors to describe the SPARC platform and the o²S²PARC platform in a bit more detail and explain how they are related to the Python Jupyter notebooks that were used to run the analysis.

Reply: We thank the reviewer for pointing this out. We have added details on the SPARC platform and the o²S²PARC platform in the “Introduction” section (second and third paragraphs). The use of the Python Jupyter notebooks is explained in the “Methods” section under the “Operation” subheading (first paragraph).

Changes are made as follows:

“This study was performed during the Stimulating Peripheral Activity to Relieve Conditions (SPARC) FAIR Codeathon in August 2022 organized by the National Institute of Health (NIH) SPARC program15. The SPARC program was initiated to advance the understanding of nerve-organ interactions and to expedite the invention of therapeutic medicine and devices that modulate electrical activity in nerves to promote organ function. It runs under the FAIR data sharing policy (consists of the principle of Findable, Accessible, Interoperable, and Reusable), according to the SPARC Data Structure (SDS). Currently, there are multiple transcriptomic datasets available on the SPARC Portal9, containing a wide range of species from humans, pigs, mice to rats; anatomical structures include neurons for multiple organs and physiological systems; analysis methods involve RNA sequencing, real-time PCR; small molecule FISH (RNAscope) probes, and multiple others.

The expression data visualization tool was initially created to visualize the transcriptomics data on the SPARC Portal platform, and was created and published via the o²S²PARC (Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions) platform, a simulation and analysis platform aiming to initially perform interactive peripheral nerve system neuromodulation/stimulations and to visualize its physiological impact on organs3. The o²S²PARC platform provides simulations in animal/human anatomical models with emulational organ and tissue-specific properties in the permission of conducting experiments from molecules to a body level3.”

“We used the Jupyterlab, as recommended by the o²S²PARC platform, because it provides interactive exploration, along with the possibility of providing guidance and instructions in line with scripted analyses.”

2. Figure 1 shows an example of a barplot, but it doesn't mention how this plot was generated or how users should interpret the data. This might not be a real concern, as the focus of the manuscript is on the tool itself.

Reply: We thank the reviewer for pointing this out. The ontology analysis and the visualization of its results as barplots are performed using the Goatools Python package. How the plots are generated and how they should be interpreted are described in the “Methods” - “Operation” section, paragraph 2 (“The DEGs are represented in a volcano plot generated using the visuz.GeneExpression.volcano() function from the bioinfokit Python package.”) and in the “Results” section, paragraph 3 (“The y-axis corresponds to the statistically significant ontology names. Also, the x-axis represents the percentage of the genes associated with the ontology per total genes (upregulated or downregulated).”), respectively.
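The x-axis quantity quoted above (genes associated with an ontology as a percentage of the total up- or downregulated genes) can be computed as in the sketch below. This only illustrates the arithmetic and is not the Goatools-based code used by the tool; the function name, the term-to-gene mapping, and the gene symbols are hypothetical.

```python
def ontology_percentages(term_to_genes, total_genes):
    """Return {term: percent of the gene list annotated with that term}."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    return {
        term: 100.0 * len(set(genes)) / total_genes
        for term, genes in term_to_genes.items()
    }


# Hypothetical example: 4 upregulated genes in total, two significant GO terms
upregulated_terms = {
    "immune response": ["TNF", "IL7R", "CD6"],
    "cytokine activity": ["TNF"],
}
percentages = ontology_percentages(upregulated_terms, total_genes=4)

# Print terms from largest to smallest share, as they might be ordered in a barplot
for term, pct in sorted(percentages.items(), key=lambda item: item[1], reverse=True):
    print(f"{term}: {pct:.1f}%")
```

Because one gene can carry several annotations, the percentages across terms do not need to sum to 100.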

3. It is a bit unclear what the browser extension that is described in the article does or where it is available. The authors do provide access to a GitHub repository which has more extensive documentation on how to use the tool.

Reply: We thank the reviewer for pointing this out. A brief introduction about the extension was added in the “Methods” - “User guide extension” section. Indeed, the browser extension serves as a user guide, from downloading the dataset from the SPARC portal to using the tool on the o²S²PARC platform and analyzing the data.

The changes are made as follows: “The extension code could be downloaded from the project GitHub repository and the extension could be installed using the developer mode on any browser. The steps from downloading the code to using the extension are provided in the GitHub repository: https://github.com/SPARC-FAIR-Codeathon/Transcriptomic_oSPARC/blob/main/Install%20the%20extension/README.md.”

4. Overall, I think there could be a bit more information in the article to detail more specifically how the tool interacts with the other platforms that were mentioned and what problem it solves.

Reply: We thank the reviewer for pointing this out. We explained the functions and roles it plays in the "Discussion" section as follows: "The tool is build-in/hosted on the o²S²PARC platform. No direct bridge leading the tool from the SPARC portal platform currently exists. The browser extension plays an intermediate role in guiding the users from the SPARC portal toward the tool on the o²S²PARC platform, which is also available on GitHub.

Our work enhances the usability of the transcriptomics data on the SPARC portal by providing a specific data analysis and visualization tool that does not require any coding skills, to identify the gene expression changes among any target objects, such as species, tissues, and diseases. It also represents an example of how to use and contribute to the development of the o²S²PARC platform." It is also reusable for future transcriptomic analysis on other platforms and portals.
Competing Interests: The authors declare no conflict of interest to this review.


4. Overall, I think there could be a bit more information in the article to detail more specifically how the tool interacts with the other platforms that were mentioned and what problem it solves.

Reply: We thank the reviewer for pointing this out. We explained the functions and roles it plays in the "Discussion" section by the following: "The tool is build-in/hosted on the o²S²PARC platform. No direct bridge leading the tool from the SPARC portal platform currently exists. The browser extension plays an intermediate role in guiding the users from the SPARC portal toward the tool on the o²S²PARC platform, which is also available on GitHub.

Our work enhances the usability of the transcriptomics data on the SPARC portal by providing a specific data analysis and visualization tool that does not require any coding skills, to identify the gene expression changes among any target objects, such as species, tissues, and diseases. It also represents an example of how to use and contribute to the development of the o²S²PARC platform." It is also reusable for future transcriptomic analysis on other platforms and portals.

Competing Interests: The authors declare no conflict of interest to this review.

Version 2

VERSION 2 PUBLISHED 07 Nov 2022

Open Peer Review

Reviewer Status

Reviewer Reports

Version                          Invited Reviewer 1   Invited Reviewer 2
Version 2 (revision), 06 Feb 23  read                 read
Version 1, 07 Nov 22             read                 read

Joost B. M. Wagenaar, University of Pennsylvania, Philadelphia, USA
Esra Neufeld, Foundation for Research on Information Technologies in Society (IT'IS), Zurich, Switzerland

Reviewer Report

15 Feb 2023 | for Version 2

Joost B. M. Wagenaar, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA

Approved

The authors addressed the comments of the initial review and I have no further comments.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Data management, timeseries analysis, fair-sharing

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

13 Feb 2023 | for Version 2

Esra Neufeld, Foundation for Research on Information Technologies in Society (IT'IS), Zurich, Switzerland

Approved

Thank you very much to the authors for their revision, which addresses nearly all points raised by this reviewer, as well as for the work they have performed and shared. At this point, the article can be indexed as is.

This reviewer would, however, still welcome a paragraph that provides more information on the process of the hackathon and the experience on working on a platform meant to facilitate collaborative computational modeling and the publication of studies and services.

Competing Interests

No competing interests were disclosed.

Reviewer Expertise

Computational Life Sciences, EM-tissue interactions, Computational Modeling

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report

21 Dec 2022 | for Version 1

Esra Neufeld, Foundation for Research on Information Technologies in Society (IT'IS), Zurich, Switzerland

Approved With Reservations

Abstract:
- “In this study, we developed an o²S²PARC template representing an interactive pipeline for the gene expression data visualization and ontologies data analysis and visualization.” -> “In this study, we developed an o²S²PARC template to instantiate an interactive pipeline for gene expression data visualization, ontological mapping, statistical evaluation.”
- “The ontologies associated with the DEGs are determined” -> “Ontologies associated with the DEGs are assigned”
Introduction:
- “and much more” -> “and many more”
- “We developed a gene expression data visualization tool created and published on the o²S²PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions platform, a simulation and analysis platform aiming to initially perform interactive peripheral nerve system neuromodulation/stimulations and to visualize its physiological impact on organs.” -> “We developed a gene expression data visualization tool and published it on the o²S²PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions, platform – a simulation and analysis platform designed to study peripheral nerve system neuromodulation/stimulations and its physiological impact on organs.”
- “However, the platform currently hosts […], but does not” -> “While the platform currently hosts […], it does not”
- Mention already in the introduction that SPARC has generated and published gene expression data (not only in the discussion).
- “open-accessible” -> “open-access” or “openly accessible”
- “across multiple scales” -> “across multiple dimensions”
- SPARC is mentioned repeatedly, before introducing it. Move that SPARC introduction sentence up.
- “to expedite the invention of therapeutic medicine and devices” -> “to expedite the development of innovative therapies and devices”
- “runs under” -> “has adopted”
- “consists of the principles of Findabile, Accessibile, Interoperabile, and Reusabile” -> “encompasses the principles of Findability, Accessibility, Interoperability, and Reusability”
Implementation
- “template in the” -> “template on the”
- “on all” -> “on all common”
- “It requires” -> “The tool makes use of”
- “All the requirements are integrated within the tool” -> “The required runtime environment is bundled along with the tool”
Operation
- Explain why JupyterLab is used (it provides interactive exploration, along with the possibility of providing guidance and instructions in line with scripted analyses)
- “The first pipeline identifies the DEGs based on the p-value, set to p-value < 0.05 as default,“ -> “The first pipeline identifies the DEGs based on statistically determined p-values (by default, a threshold of 5% is applied to determine significance)”
- “the LogFC” -> “LogFC”; twice; and introduce that abbreviation (log-fold change on a basis of 2, I assume)
- “in six similar separate” -> rephrase and provide more detail
- “genes-related” -> “gene-related”
- “as input if the transcriptomics data correspond to other species” -> “as input, if the transcriptomics data relates to other species”; or “pertains”
- “The second pipeline takes two CSV files as input.” What are these two CSV files? Two datasets for which the gene expression is to be compared?
- “The gene’s expression profile is determined, as in the first pipeline, for the two datasets.” -> “The gene’s expression profiles are determined, as in the first pipeline, for the two datasets.”
- “The common and uncommon genes count is performed.” Please, rephrase.
- “And the gene expression profiles in the two datasets are compiled in a single csv file for further analysis.” -> “Finally, the gene expression profiles in the two datasets are compiled in a single csv file for further analysis.” Is this file only a merged version of the original data, or does it include results of the analysis?
- “as a user guide” -> “to guide users”
- “for the new” -> “for new”
- “It guides the user step by step from downloading transcriptomics data from the SPARC portal database, providing a raw data analysis workflow, and explaining the “Gene expression data visualization” tool.” -> “It guides the user step-by-step from the download of transcriptomics data from the SPARC portal database, through a raw-data analysis workflow, to the explanation of the “Gene expression data visualization” tool.”
Results
- Are the results of the application to the multiple sclerosis data new/insightful? How are they to be interpreted?
- What are the “top ontologies”?
- “useful to compare” -> “useful for the comparison of”
- “resuming” -> “summarizing”
- “in the project repository” -> “in a project repository”
Discussion and conclusion
- “analysis methods involve” -> “analysis methods include”
- Is “favored” the right word?
- “and gene ontology (GO) analysis” -> “and gene ontology (GO) analyses”
- “standardized format” -> “standardized process and form”
- “changes among any target objects, such as species, tissues, and diseases” -> “differences between species, healthy or diseased population groups, individual subjects, and tissues”

Is the rationale for developing the new software tool clearly explained?
Yes

Is the description of the software tool technically sound?
Partly

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Yes

Competing Interests

I am an investigator currently funded by the NIH SPARC program and involved in the development of the o2S2PARC platform.

Reviewer Expertise

Computational Life Sciences, EM-tissue interactions, Computational Modeling

Responses (1)

Author Response

06 Feb 2023

Jessica Ding

We express our sincere gratitude for the reviewer’s advice and corrections.
Our replies and amendments follow the reviewer’s suggestions point by point.

1. In general, seeing that the form of producing and sharing such an openly accessible workflow was far from standard – it involved a hackathon and the use of a novel and open platform for collaboratively establishing FAIR, sustainable, and reproducible computational modeling and data analysis – it would be desirable to provide more information about that process and its benefits/weaknesses/potential.

Reply: We thank the reviewer for pointing this out. In the “Introduction” section, we mentioned that the work was performed during a hackathon (“This study was performed during the Stimulating Peripheral Activity to Relieve Conditions (SPARC) FAIR Codeathon in August 2022 organized by the National Institute of Health (NIH) SPARC program").
The hosting platforms were also introduced in the following 2 paragraphs.

In the “Discussion” section (the last section), we discussed the benefits, weaknesses, and potential of the tool in the last four paragraphs of the paper.

2. Minor feedback:
Abstract:
“In this study, we developed an o²S²PARC template representing an interactive pipeline for the gene expression data visualization and ontologies data analysis and visualization.” -> “In this study, we developed an o²S²PARC template to instantiate an interactive pipeline for gene expression data visualization, ontological mapping, and statistical evaluation.”

Reply: We thank the reviewer for pointing this out. We have made the changes accordingly.

“The ontologies associated with the DEGs are determined” -> “Ontologies associated with the DEGs are assigned”.

Reply: We thank the reviewer for pointing this out. We have made the changes accordingly.

Introduction:
“and much more” -> “and many more”

Reply: We thank the reviewer for pointing this out. We have made the changes accordingly.

“We developed a gene expression data visualization tool created and published on the o2S2PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions platform, a simulation and analysis platform aiming to initially perform interactive peripheral nerve system neuromodulation/stimulations and to visualize its physiological impact on organs.” -> “We developed a gene expression data visualization tool and published it on the o2S2PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions, platform – a simulation and analysis platform designed to study peripheral nerve system neuromodulation/stimulations and its physiological impact on organs.”

Reply: We thank the reviewer for pointing this out. We made the following changes: “We developed a gene expression data visualization tool to visualize the transcriptomics data on the SPARC Portal platform and created and published it on the o²S²PARC, Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions, platform – a simulation and analysis platform designed to study peripheral nerve system neuromodulation/stimulations and its physiological impact on organs.”

“However, the platform currently hosts […], but does not” -> “While the platform currently hosts […], it does not”

Reply: We thank the reviewer for pointing this out. We made the following changes: “While the platform currently hosts tools for multiple biological and physiological analyses, it does not provide a tool for transcriptomics and gene expression data analysis or visualization.”

Mention already in the introduction that SPARC has generated and published gene expression data (not only in the discussion).

Reply: We thank the reviewer for pointing this out. We addressed this by placing more emphasis on the introduction of SPARC in the “Introduction” section.

“open-accessible” -> “open-access” or “openly accessible”

Reply: We thank the reviewer for pointing out this error. We have made the changes.

“across multiple scales” -> “across multiple dimensions”

Reply: We thank the reviewer for pointing out this error. We have made the change accordingly.

SPARC is mentioned repeatedly, before introducing it. Move that SPARC introduction sentence up.

Reply: We thank the reviewer for pointing this out. We have moved the SPARC introduction sentence up to the “Introduction” section (paragraph 2).

“to expedite the invention of therapeutic medicine and devices” -> “to expedite the development of innovative therapies and devices”

Reply: Thank you and we have made the changes accordingly.

“runs under” -> “has adopted”

Reply: Thank you and we have made the changes accordingly.

“consists of the principles of Findabile, Accessibile, Interoperabile, and Reusabile” -> “encompasses the principles of Findability, Accessibility, Interoperability, and Reusability”

Reply: Thank you and we have made the changes accordingly.

Implementation
“template in the” -> “template on the”

Reply: Thank you and we have made the changes accordingly.

“on all” -> “on all common”

Reply: Thank you and we have made the changes accordingly.

“It requires” -> “The tool makes use of”

Reply: Thank you and we have made the changes accordingly.

“All the requirements are integrated within the tool” -> “The required runtime environment is bundled along with the tool”

Reply: Thank you and we have made the changes accordingly.

Operation
Explain why JupyterLab is used (it provides interactive exploration, along with the possibility of providing guidance and instructions in line with scripted analyses)

Reply: We thank the reviewer for pointing this out. The reason for using JupyterLab is now explained in the “Methods” - “Operation” section (first paragraph). We made the following changes: “We used Jupyterlab, as recommended by the o²S²PARC platform, because it provides interactive exploration, along with the possibility of providing guidance and instructions in line with scripted analyses.”

“The first pipeline identifies the DEGs based on the p-value, set to p-value < 0.05 as default,” -> “The first pipeline identifies the DEGs based on statistically determined p-values (by default, a threshold of 5% is applied to determine significance)”

Reply: Thank you and we have made the changes accordingly.

“the LogFC” -> “LogFC”; twice; and introduce that abbreviation (log-fold change on a basis of 2, I assume)

Reply: Thank you and we have made the changes accordingly.
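
As an illustration of the thresholding discussed in the two points above, the following minimal Python sketch labels genes from a list of LogFC values and p-values. It is not the tool's code: the function name, the record layout, and the rule of reading the direction from the sign of LogFC are assumptions made for this example. Only the default 5% significance threshold comes from the text.

```python
# Hypothetical sketch, not the tool's implementation.
# Assumption: each record is (gene, log2 fold change, p-value).

def classify_genes(records, p_threshold=0.05):
    """Label each gene as 'up', 'down', or 'ns' (not significant)."""
    labels = {}
    for gene, log_fc, p_value in records:
        if p_value >= p_threshold:
            labels[gene] = "ns"    # fails the significance threshold
        elif log_fc > 0:
            labels[gene] = "up"    # significant, higher expression
        elif log_fc < 0:
            labels[gene] = "down"  # significant, lower expression
        else:
            labels[gene] = "ns"    # significant p-value but no change
    return labels


example = [
    ("GENE_A", 2.1, 0.001),
    ("GENE_B", -1.4, 0.020),
    ("GENE_C", 0.3, 0.400),
]
labels = classify_genes(example)
```

With the made-up records above, GENE_A is labelled as upregulated, GENE_B as downregulated, and GENE_C as not significant.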

“in six similar separate” -> rephrase and provide more detail

Reply: We thank the reviewer for pointing this out. The information was clarified in the “Methods” - “Operation” section, as follows: “The pipeline also performs the ontology analysis for the differentially expressed genes, to determine the cellular components, biological processes, and molecular functions associated with these genes.” ... “The biological processes, molecular functions, and cellular component ontologies are represented in separate Barplots, as in Figure 1. In total six barplots are created, three upregulated genes and three downregulated genes.”
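
The arrangement into six bar plots (three ontology categories for upregulated genes and three for downregulated genes) can be sketched as follows. This is a hedged illustration rather than the Goatools-based code used by the tool: the function name and the input layout are hypothetical, no enrichment statistics are computed, and the percentage follows the definition given for the x-axis in the Results section (genes associated with an ontology per total upregulated or downregulated genes).

```python
# Hypothetical sketch; the tool itself relies on the Goatools package,
# and no statistical enrichment test is performed here.
from collections import Counter

NAMESPACES = ("biological_process", "molecular_function", "cellular_component")


def ontology_percentages(annotations, degs):
    """Return {(direction, namespace): {term: percent of genes}}.

    annotations: {gene: [(namespace, term), ...]}  (assumed layout)
    degs:        {"up": set of genes, "down": set of genes}
    """
    panels = {}
    for direction in ("up", "down"):
        genes = degs.get(direction, set())
        for namespace in NAMESPACES:
            counts = Counter()
            for gene in genes:
                terms = {t for ns, t in annotations.get(gene, []) if ns == namespace}
                counts.update(terms)  # each gene counts once per term
            panels[(direction, namespace)] = {
                term: 100.0 * n / len(genes) for term, n in counts.items()
            }
    return panels


annotations = {
    "G1": [("biological_process", "immune response")],
    "G2": [("biological_process", "immune response"),
           ("molecular_function", "protein binding")],
    "G3": [("cellular_component", "nucleus")],
}
degs = {"up": {"G1", "G2"}, "down": {"G3"}}
panels = ontology_percentages(annotations, degs)
```

Each of the six keys in `panels` corresponds to one bar plot; the values would be drawn as horizontal bars, with ontology names on the y-axis.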

“genes-related” -> “gene-related”

Reply: Thank you and we have made the changes accordingly.

“as input if the transcriptomics data correspond to other species” -> “as input, if the transcriptomics data relates to other species”; or “pertains”

Reply: Thank you and we have made the changes accordingly.

“The second pipeline takes two CSV files as input.” What are these two CSV files? Two datasets for which the gene expression is to be compared?

Reply: We thank the reviewer for pointing this out. The required CSV files are specified in the “Methods”- “Operation” as follows: “The second pipeline takes two CSV files as input. The CSV files correspond to gene expression data of a different dataset, to be further compared. Example data files are available in the project GitHub repository.”

“The gene’s expression profile is determined, as in the first pipeline, for the two datasets.” -> “The gene’s expression profiles are determined, as in the first pipeline, for the two datasets.”

Reply: We thank the reviewer for pointing this out. We made the following changes: “As in the first pipeline, the gene’s expression profiles and the DEGs are determined for the two datasets, separately.”

“The common and uncommon genes count is performed.” Please, rephrase.

Reply: We thank the reviewer for this advice. We made the following changes: “Then, we identified the common genes between the two datasets and those specific genes to one dataset.”

“And the gene expression profiles in the two datasets are compiled in a single csv file for further analysis.” -> “Finally, the gene expression profiles in the two datasets are compiled in a single csv file for further analysis.” Is this file only a merged version of the original data, or does it include the results of the analysis?

Reply: We thank the reviewer for this advice. We have made the changes as follows: “Finally, the gene expression profiles in the two datasets are compiled in a single CSV file for further analysis, which includes the expression analysis result of the two combined datasets.”
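
The comparison step described in the last three replies (genes common to both datasets, genes specific to one dataset, and a single combined CSV file) could look roughly like the sketch below. It is an assumption-laden illustration, not the tool's code: the function names, the column names `dataset_1` and `dataset_2`, and the use of a simple expression label per gene are all hypothetical.

```python
# Hypothetical sketch, not the tool's implementation.
import csv
import io


def compare_datasets(profile_a, profile_b):
    """Split genes into common and dataset-specific lists and merge profiles.

    profile_a, profile_b: {gene: expression label}, e.g. "up" / "down" (assumed).
    """
    genes_a, genes_b = set(profile_a), set(profile_b)
    common = sorted(genes_a & genes_b)
    only_a = sorted(genes_a - genes_b)
    only_b = sorted(genes_b - genes_a)
    merged = [
        {"gene": g,
         "dataset_1": profile_a.get(g, ""),
         "dataset_2": profile_b.get(g, "")}
        for g in sorted(genes_a | genes_b)
    ]
    return common, only_a, only_b, merged


def to_csv(merged):
    """Serialize the merged table as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["gene", "dataset_1", "dataset_2"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(merged)
    return buffer.getvalue()


a = {"G1": "up", "G2": "down"}
b = {"G2": "down", "G3": "up"}
common, only_a, only_b, merged = compare_datasets(a, b)
csv_text = to_csv(merged)
```

Genes missing from one dataset are written with an empty cell, so the combined file keeps one row per gene across both datasets.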

“as a user guide” -> “to guide users”

Reply: We thank the reviewer for this advice and we have made the changes accordingly.

“for the new” -> “for new”

Reply: We thank the reviewer for this advice. We have made the changes accordingly.

“It guides the user step by step from downloading transcriptomics data from the SPARC portal database, providing a raw data analysis workflow, and explaining the “Gene expression data visualization” tool.” -> “It guides the user step-by-step from the download of transcriptomics data from the SPARC portal database, through a raw-data analysis workflow, to the explanation of the “Gene expression data visualization” tool.”

Reply: We thank the reviewer for this advice. We have made the changes as follows: “It guides the user step-by-step from downloading transcriptomics data from the SPARC portal database, through a raw-data analysis workflow, to explaining the “Gene expression data visualization” tool.”

Results
Are the results of the application to the multiple sclerosis data new/insightful? How are they to be interpreted?

Reply: We thank the reviewer for pointing this out. The data analysis was performed for the sole purpose of validating the pipeline. The datasets are publicly available and have thus possibly been studied by other researchers. However, to our knowledge, no other studies have compared these two datasets for the same purpose of studying the genes and ontologies implicated in multiple sclerosis disease progression. The results are available as supplementary materials on the project GitHub repository (https://github.com/SPARC-FAIR-Codeathon/Transcriptomic_oSPARC/tree/main/pipeline%20validation). No interpretation was performed, since our article focuses on presenting the tool, and we made the clarification as follows: “Pipeline validation:
The data analysis was performed for the unique purpose of validating the pipeline. The data and analysis results are available as supplementary materials on the project's GitHub repository. However, no interpretation was performed since our article focuses on presenting the tool.”

What are the “top ontologies”?

Reply: We apologize for this error; the term was replaced by “statistically significant ontologies”.

“useful to compare” -> “useful for the comparison of”

Reply: Thank you for pointing out this error. We have made the changes accordingly.

“resuming” -> “summarizing”

Reply: Thank you for pointing out this error. We have made the changes accordingly.

“in the project repository” -> “in a project repository”

Reply: Thank you for pointing out this error. We have made the changes accordingly.

Discussion and conclusion

“analysis methods involve” -> “analysis methods include”

Reply: Thank you for your suggestion. We have made the changes accordingly. This sentence has been moved to the second paragraph of the “Introduction”.

Is “favored” the right word?

Reply: Thank you for your suggestion. We have changed the word to “utilized” accordingly.

“and gene ontology (GO) analysis” -> “and gene ontology (GO) analyses”

Reply: Thank you for your suggestion. We have made the change accordingly.

“standardized format” -> “standardized process and form”

Reply: Thank you for your suggestion. We have made the change accordingly.

“changes among any target objects, such as species, tissues, and diseases” -> “differences between species, healthy or diseased population groups, individual subjects, and tissues”

Reply: Thank you for your suggestion. We have made the change accordingly.

Competing Interests

The authors declare no conflicts of interest to the reviewer's opinion.

Reviewer Report

13 Dec 2022 | for Version 1

Joost B. M. Wagenaar, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA

Approved With Reservations

Is the rationale for developing the new software tool clearly explained?
Yes

Is the description of the software tool technically sound?
Partly

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?
Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?
Partly

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?
Yes

Competing Interests

I am an investigator who is currently funded by the NIH SPARC program.

Reviewer Expertise

Data management, timeseries analysis, fair-sharing

Responses (1)

Author Response

06 Feb 2023

Jessica Ding

We are grateful for the advice and questions raised by reviewer 1. Here are our revisions and replies for each point s/he had made.

1. I am having a bit trouble understanding the need for this tool or some of the platforms and workflows it describes. It would be helpful for the authors to describe the SPARC platform, and the O2S2PARC platform in a bit more detail and explain how they are related to the python jupyter notebooks that were used to run the analysis.

Reply: We thank the reviewer for pointing this out. We have added details on the SPARC platform and the o²S²PARC platform in the “Introduction” section (second and third paragraphs). The use of the Python Jupyter notebooks is explained in the “Methods” section under the “Operation” subtitle (first paragraph).

Changes are made as follows:

“This study was performed during the Stimulating Peripheral Activity to Relieve Conditions (SPARC) FAIR Codeathon in August 2022, organized by the National Institutes of Health (NIH) SPARC program [15]. The SPARC program was initiated to advance the understanding of nerve-organ interactions and to expedite the development of therapeutic medicines and devices that modulate electrical activity in nerves to promote organ function. It operates under the FAIR data sharing policy (Findable, Accessible, Interoperable, and Reusable), in accordance with the SPARC Data Structure (SDS). Currently, multiple transcriptomic datasets are available on the SPARC Portal [9]. They cover a range of species (humans, pigs, mice, and rats); anatomical structures, including neurons for multiple organs and physiological systems; and analysis methods such as RNA sequencing, real-time PCR, small molecule FISH (RNAscope) probes, and others.

The expression data visualization tool was initially created to visualize the transcriptomics data on the SPARC Portal. It was built and published on the o²S²PARC (Open Online Simulations for Stimulating Peripheral Activity to Relieve Conditions) platform, a simulation and analysis platform initially aimed at interactive simulation of peripheral nervous system neuromodulation/stimulation and visualization of its physiological impact on organs [3]. The o²S²PARC platform provides simulations in animal and human anatomical models with emulated organ- and tissue-specific properties, allowing experiments from the molecular to the whole-body level [3].”

“We used JupyterLab, as recommended by the o²S²PARC platform, because it provides interactive exploration and allows guidance and instructions to be presented alongside the scripted analyses.”

2. Figure 1 shows an example of a barplot but it doesn't mention how this plot was generated or how users should interpret the data although this might not be a real concern as the focus of the manuscript is on the tool itself.

Reply: We thank the reviewer for pointing this out. Indeed, the ontology analysis and result visualization, in barplots, are performed using the Goatools Python package. The methodology to generate the barplot and how it should be interpreted are described in “Methods” - “Operation” section paragraph 2 (“The DEGs are represented in a volcano plot generated using the visuz.GeneExpression.volcano() function from the bioinfokit Python package.”) and in the “Results” section paragraph 3 (“The y-axis corresponds to the statistically significant ontology names. Also, the x-axis represents the percentage of the genes associated with the ontology per total genes (upregulated or downregulated).”) respectively.
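The x-axis quantity described in the reply above (the percentage of genes associated with an ontology out of the total genes in the upregulated or downregulated group) can be illustrated with a minimal sketch. This is not the authors' code and does not use Goatools; the function name `ontology_percentages`, the gene names, and the ontology annotations are hypothetical and serve only to show the arithmetic behind each bar.

```python
from collections import Counter


def ontology_percentages(gene_to_terms, significant_terms):
    """Return {term: percentage of genes in the group annotated with that term}.

    gene_to_terms: dict mapping each gene in one group (e.g. upregulated)
                   to the list of ontology names annotated to it.
    significant_terms: ontology names that passed the significance test
                       (assumed to be computed elsewhere).
    """
    total = len(gene_to_terms)
    if total == 0:
        return {}
    # Count each gene at most once per term, and only for significant terms.
    counts = Counter(
        term
        for terms in gene_to_terms.values()
        for term in set(terms)
        if term in significant_terms
    )
    return {term: 100.0 * counts[term] / total for term in significant_terms}


# Hypothetical upregulated gene group with made-up annotations.
upregulated = {
    "GENE_A": ["immune response", "cell adhesion"],
    "GENE_B": ["immune response"],
    "GENE_C": ["lipid metabolism"],
    "GENE_D": [],
}
percentages = ontology_percentages(
    upregulated, ["immune response", "cell adhesion"]
)
for term, pct in percentages.items():
    print(f"{term}: {pct:.1f}% of upregulated genes")
```

In this toy group of four genes, two carry the "immune response" annotation and one carries "cell adhesion", so the corresponding bars would sit at 50% and 25%. The same function would be called separately for the downregulated group.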

3. It is a bit unclear what the browser extension that is described in the article does or where it is available. The authors do provide access to a GitHub repository which has more extensive documentation on how to use the tool.

Reply: We thank the reviewer for pointing this out. A brief introduction about the extension was added in the “Methods” - “User guide extension” section. Indeed, the browser extension serves as a user guide, from downloading the dataset from the SPARC portal to using the tool on the o²S²PARC platform and analyzing the data.

The changes are made as follows: “The extension code can be downloaded from the project GitHub repository, and the extension can be installed using the developer mode of any browser. The steps from downloading the code to using the extension are provided in the GitHub repository: https://github.com/SPARC-FAIR-Codeathon/Transcriptomic_oSPARC/blob/main/Install%20the%20extension/README.md.”

4. Overall, I think there could be a bit more information in the article to detail more specifically how the tool interacts with the other platforms that were mentioned and what problem it solves.

Reply: We thank the reviewer for pointing this out. We explained the tool's functions and role in the "Discussion" section as follows: "The tool is built in and hosted on the o²S²PARC platform. No direct bridge from the SPARC portal to the tool currently exists. The browser extension plays an intermediate role, guiding users from the SPARC portal to the tool on the o²S²PARC platform, which is also available on GitHub.

Our work enhances the usability of the transcriptomics data on the SPARC portal by providing a specific data analysis and visualization tool that does not require any coding skills, to identify the gene expression changes among any target objects, such as species, tissues, and diseases. It also represents an example of how to use and contribute to the development of the o²S²PARC platform." It is also reusable for future transcriptomic analysis on other platforms and portals.


Competing Interests

The authors declare no conflict of interest to this review.

Alongside their report, reviewers assign a status to the article:

Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested

Approved with reservations - a number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit

Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions

[1] Chew P: Transcriptional Networks of Microglia in Alzheimer’s Disease and Insights into Pathogenesis. Genes. 2019 Oct 12; 10(10): 798.

[2] Mroczek M, Desouky A, Sirry W: Imaging Transcriptomics in Neurodegenerative Diseases. J. Neuroimaging. 2021 Mar; 31(2): 244–250.

[3] Stimulating Peripheral Activity to Relieve Conditions [Internet]. National Center for Advancing Translational Sciences. 2016 [cited 2022 Aug 10].

[4] SPARC Portal: [cited 2022 Aug 11].

[5] The o²S²PARC modeling and simulation platform: [cited 2022 Aug 10].

[6] Renesh Bedre: reneshbedre/bioinfokit: Bioinformatics data analysis and visualization toolkit. 2020, March 5.

[7] Klopfenstein DV, Zhang L, Pedersen BS, et al.: GOATOOLS: A Python library for Gene Ontology analyses. Sci. Rep. 2018; 8: 10872.

[8] Barrett T, Wilhite SE, Ledoux P, et al.: NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013 Jan; 41: D991–D995.

[9] Melief J, Orre M, Bossers K, et al.: Transcriptome analysis of normal-appearing white matter reveals cortisol- and disease-associated gene expression profiles in multiple sclerosis. Acta Neuropathol. Commun. 2019 Apr 25; 7(1): 60.

[10] Hendrickx DAE, van Scheppingen J, van der Poel M, et al.: Gene Expression Profiling of Multiple Sclerosis Pathology Identifies Early Patterns of Demyelination Surrounding Chronic Active Lesions. Front. Immunol. 2017; 8: 1810.

[11] Zhang L, Chen D, Song D, et al.: Clinical and translational values of spatial transcriptomics. Sig. Transduct. Target Ther. 2022 Dec; 7(1): 111.

[12] Cartwright H, editor: Artificial Neural Networks. New York, NY: Springer US; 2021 [cited 2022 Aug 12]; vol. 2190. Methods in Molecular Biology.

[13] Zhang L, Sun L, Zhang B, et al.: Identification of Differentially Expressed Genes (DEGs) Relevant to Prognosis of Ovarian Cancer by Use of Integrated Bioinformatics Analysis and Validation by Immunohistochemistry Assay. Med. Sci. Monit. 2019 Dec 24; 25: 9902–9912.

[14] Bardsley EN, Davis H, Ajijola OA, et al.: RNA Sequencing Reveals Novel Transcripts from Sympathetic Stellate Ganglia During Cardiac Sympathetic Hyperactivity. Sci. Rep. 2018 Dec; 8(1): 8633.

[15] Ben Aribi H, Ding M, Nickerson D, et al.: SPARC-FAIR-Codeathon/Transcriptomic_oSPARC: Version 2.0 (v2.0). Zenodo. 2023.
