DataMap: A Browser-based App for Visualizing High-Dimensional Data

Xijin Ge

doi:10.12688/f1000research.165281.1

Software Tool Article

[version 1; peer review: awaiting peer review]

PUBLISHED 10 Nov 2025

Mathematics and Statistics, South Dakota State University, Brookings, South Dakota, 57007, USA

Xijin Ge
Roles: Conceptualization, Software, Writing – Original Draft Preparation

This article is included in the Bioinformatics gateway.

This article is included in the RPackage gateway.

Abstract

Background

Visualization and analysis of high-dimensional data are essential in biomedical research. There is a need for secure, scalable, and reproducible tools to facilitate data exploration and interpretation.

Results

We introduce DataMap, a browser-based application for visualizing high-dimensional data with heatmaps and dimensionality reduction plots. DataMap runs directly in the web browser, which preserves data privacy and requires no installation or server setup. Its intuitive user interface supports data transformation, annotation, and the generation of reproducible R code.

Conclusions

Freely accessible as a GitHub page (https://gexijin.github.io/datamap), DataMap is a secure, user-friendly, and reproducible solution for visualizing high-dimensional data.

Keywords

Data visualization, Heatmap, PCA, t-SNE, Reproducibility

Corresponding author: Xijin Ge

Competing interests: No competing interests were disclosed.

Grant information: Supported by NIH grants (P20GM135008, R01HG010805, R01HG013534, and R43GM153076).

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2025 Ge X. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Ge X. DataMap: A Browser-based App for Visualizing High-Dimensional Data [version 1; peer review: awaiting peer review]. F1000Research 2025, 14:1234 (https://doi.org/10.12688/f1000research.165281.1) First published: 10 Nov 2025, 14:1234 (https://doi.org/10.12688/f1000research.165281.1) Latest published: 10 Nov 2025, 14:1234 (https://doi.org/10.12688/f1000research.165281.1)

1. Introduction

High-dimensional datasets, such as expression matrices from RNA-seq or proteomics experiments, have become commonplace in biomedical research. Heatmaps are an effective visualization method, efficiently representing thousands of data points in a matrix through variations in color. Several heatmap-centric visualization tools, including Clustergrammer¹, Phantasus², and Morpheus³, have been developed to facilitate the analysis of these expansive datasets. Phantasus and Morpheus operate entirely within the user’s web browser, while Clustergrammer requires server-side processing.

Our goal was to develop a secure, browser-based application capable of generating high-quality, reproducible visualizations. To this end, we developed DataMap, an R/Shiny application deployed via Shinylive. It uses WebR, a build of R compiled to WebAssembly, to run directly in the web browser. Because the app is hosted as static files on GitHub, sensitive data remain on the user's device and no server-side computation is required.

DataMap supports multiple visualization methods, including hierarchical clustering with heatmaps, principal component analysis (PCA), and t-distributed stochastic neighbor embedding (t-SNE)⁴. These methods enable researchers to uncover biologically meaningful patterns, clusters, and relationships within complex datasets. To improve usability, the application analyzes the input file and the data distribution and then recommends appropriate file-parsing and data transformation settings. Although optimized for omics datasets, DataMap is broadly applicable to high-dimensional data matrices from other research domains.

2. Implementation

DataMap is implemented as an R/Shiny application compiled into WebAssembly via Shinylive, so it executes entirely on the client side within the web browser. The application is hosted as static files on GitHub Pages and is updated automatically by GitHub Actions from the source code repository, providing continuous integration and continuous deployment (CI/CD).

The app consists of several functional modules:

1. File Upload Module: Supports multiple file formats, including Excel, CSV, TSV, TXT, and other plain text files, with automatic delimiter detection for accurate parsing.
2. Data Transformation Module: Offers essential preprocessing features, such as log transformation, normalization, missing value handling, outlier capping, and feature filtering.
3. Visualization Modules: Produces high-quality, publication-ready visualizations, including hierarchical clustering heatmaps generated using the pheatmap package⁵, as well as PCA and t-SNE plots. The heatmap visualization supports dendrogram cutting, enabling clearer identification of distinct clusters.
4. Code Generation Module: Automatically generates reproducible R code reflecting all analytical steps performed by the user, facilitating transparency and reproducibility.
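The automatic delimiter detection in the File Upload Module could work along the following lines. This is a minimal Python sketch under stated assumptions, not DataMap's R implementation. The candidate list and the rule are hypothetical choices made for illustration: a delimiter must occur the same, nonzero number of times on every sampled line, and the qualifying delimiter that yields the most columns wins.

```python
# Hypothetical candidate delimiters, checked in this order (earlier wins ties).
CANDIDATES = ("\t", ",", ";", "|")


def detect_delimiter(text, candidates=CANDIDATES, max_lines=20):
    """Guess the field delimiter of a plain-text table.

    A candidate qualifies only if it occurs the same, nonzero number of times
    on every sampled non-empty line; among qualifying candidates, the one that
    yields the most columns is returned. Returns None if none qualifies.
    Quoted fields that contain delimiters are not handled in this sketch.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()][:max_lines]
    best, best_columns = None, 1
    for delim in candidates:
        # Set of per-line occurrence counts; a single value means "consistent".
        counts = {ln.count(delim) for ln in lines}
        if len(counts) == 1:
            columns = counts.pop() + 1
            if columns > best_columns:
                best, best_columns = delim, columns
    return best
```

For example, `detect_delimiter("gene,s1,s2\nA,1,2\nB,3,4\n")` returns `","`. Requiring the same count on every line guards against characters that appear only incidentally. A production parser would also need to handle quoted fields.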

This modular, serverless design keeps data secure because all processing happens on the user's device, and the automated deployment simplifies maintenance and ongoing enhancements.

3. Operation

DataMap can be accessed through its static GitHub page at https://gexijin.github.io/datamap using a modern web browser such as Chrome, Edge, Safari, or Firefox. Alternatively, users can install and run DataMap locally as an R package with the following commands in R:

# install.packages("remotes")  # needed once if remotes is not yet installed
remotes::install_github("gexijin/datamap")
datamap::run_app()

The web version requires no installation, and the local R package also allows DataMap to run offline.

4. Results

4.1 Features and functionality

Secure local processing: DataMap processes all data directly within the user's web browser, safeguarding data privacy and removing any dependency on external servers. Because computation runs on the user's own machine, capacity is not limited by shared server resources.

Smart data import: DataMap automatically detects file formats, delimiters, and annotations, which streamlines data upload. It also examines the data to determine whether row and column names are present. Row annotations can be uploaded separately or included in the data matrix. Column annotations, such as experimental design factors in omics datasets, must be uploaded as a separate file whose sample identifiers match the column names of the data matrix.
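The checks for row names, column names, and matching column annotations can be illustrated with a short Python sketch. The heuristic, the set of missing-value tokens, and all function names are assumptions for illustration, not DataMap's actual rules. A first column containing text in the data rows is treated as row names. A first row containing text above otherwise numeric rows is treated as column names.

```python
# Tokens treated as missing values; this set is an assumption for illustration.
MISSING = {"", "NA", "NaN", "N/A", "null"}


def numeric_like(cell):
    """True if a cell parses as a number or is a recognized missing-value token."""
    cell = cell.strip()
    if cell in MISSING:
        return True
    try:
        float(cell)
        return True
    except ValueError:
        return False


def detect_names(rows):
    """Guess (has_column_names, has_row_names) for a table given as lists of strings."""
    if len(rows) < 2:
        return False, False
    # Row names: text in the first column of the data rows (row 0 may be a header).
    has_row_names = any(len(r) > 0 and not numeric_like(r[0]) for r in rows[1:])
    start = 1 if has_row_names else 0
    # Column names: text in the first row where all later rows are numeric.
    header_has_text = any(not numeric_like(c) for c in rows[0][start:])
    body_is_numeric = all(numeric_like(c) for r in rows[1:] for c in r[start:])
    return header_has_text and body_is_numeric, has_row_names


def check_column_annotation(data_columns, annotation_ids):
    """Return (data columns without annotation, annotation IDs without a data column)."""
    data, annot = set(data_columns), set(annotation_ids)
    return sorted(data - annot), sorted(annot - data)
```

For a table whose first row is `gene, s1, s2` and whose first column holds gene symbols, `detect_names` returns `(True, True)`. One limitation of this heuristic is that numeric identifiers in the first column, such as Entrez gene IDs, would be mistaken for data.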

Comprehensive data transformations: The data transformation workflow uses statistical heuristics to recommend appropriate settings for effective visualization. Missing values can be left unchanged or imputed with row-wise or column-wise means or medians. When high skewness (>1) is detected and no negative values are present, the app recommends a log transformation, addressing a common challenge in visualizing biological datasets. Matrix orientation is inferred by comparing row and column variability using the median absolute deviation (MAD), and centering or scaling is then recommended accordingly. Heatmap colors are usually mapped between the minimum and maximum values of the data matrix, so a few outliers can compress the color range available to all other values. To optimize color mapping, DataMap therefore caps outliers beyond three standard deviations from the mean by default, minimizing their influence. Users can also filter out less-variable rows. These built-in strategies help users, including non-statisticians, produce robust visualizations with minimal effort.
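Several of the heuristics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DataMap's R code. The sketch makes these choices:

- skewness uses the moment-based estimator;
- the log transform uses log2 with a pseudocount of 1 so that zeros stay finite;
- the standard deviation is the sample SD, as computed by R's `sd()`;
- capping is applied to a flat list of all matrix values.

The MAD-based orientation step is left out because the text does not give its exact rule.

```python
import math
import statistics


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (0.0 for constant data)."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def recommend_log(values, threshold=1.0):
    """Recommend a log transform when skewness exceeds the threshold and no value is negative."""
    return min(values) >= 0 and skewness(values) > threshold


def log_transform(values, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount of 1 is an assumed default."""
    return [math.log2(v + pseudocount) for v in values]


def impute_rows(matrix, how="median"):
    """Fill None entries with the row-wise median (or mean) of the observed values."""
    stat = statistics.median if how == "median" else statistics.mean
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = stat(observed) if observed else None
        filled.append([fill if v is None else v for v in row])
    return filled


def cap_outliers(values, k=3.0):
    """Clamp values to mean +/- k sample standard deviations."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    lo, hi = mu - k * sd, mu + k * sd
    return [min(max(v, lo), hi) for v in values]


def top_variable_rows(matrix, n):
    """Keep the n rows with the largest standard deviation, preserving row order."""
    ranked = sorted(range(len(matrix)),
                    key=lambda i: statistics.stdev(matrix[i]), reverse=True)
    keep = set(ranked[:n])
    return [row for i, row in enumerate(matrix) if i in keep]
```

With nineteen zeros and a single value of 100, `recommend_log` returns `True`. `cap_outliers` then pulls the 100 down to the mean plus three standard deviations. Capping before color mapping keeps the full color range available for the bulk of the data.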

Publication-quality visualizations: Using R graphics libraries, DataMap produces publication-quality graphics that can be downloaded in PDF or PNG format.

Reproducible analysis: DataMap automatically records all user-selected settings and analytical steps. It then generates R code that reproduces the visualizations on a local R installation, promoting transparency, consistency, and collaborative analysis.
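One way to implement this kind of code generation is to keep an ordered log of user actions and render each entry through an R template. The following is a minimal Python sketch of that pattern. The step names, templates, and file name are hypothetical and are not taken from DataMap. The emitted R uses only base R and `pheatmap::pheatmap()` with its `scale` and `cutree_rows` arguments.

```python
# Hypothetical R templates, one per recordable step.
TEMPLATES = {
    "read_csv": 'x <- as.matrix(read.csv("{path}", row.names = 1, check.names = FALSE))',
    "log2": "x <- log2(x + {pseudocount})",
    "cap_sd": ("m <- mean(x, na.rm = TRUE); s <- sd(as.vector(x), na.rm = TRUE)\n"
               "x[which(x > m + {k} * s)] <- m + {k} * s\n"
               "x[which(x < m - {k} * s)] <- m - {k} * s"),
    "heatmap": 'pheatmap::pheatmap(x, scale = "{scale}", cutree_rows = {clusters})',
}


class StepRecorder:
    """Records user actions in order and renders them as a standalone R script."""

    def __init__(self):
        self.steps = []

    def record(self, name, **params):
        # Reject steps that have no template so the script can always be rendered.
        if name not in TEMPLATES:
            raise KeyError(f"no R template for step {name!r}")
        self.steps.append((name, params))

    def to_r(self):
        lines = ["# Generated script: replays the recorded steps in order"]
        lines += [TEMPLATES[name].format(**params) for name, params in self.steps]
        return "\n".join(lines) + "\n"
```

Recording `read_csv`, `log2`, and `heatmap` in that order yields a script that reads the matrix, applies `log2(x + 1)`, and draws a row-scaled heatmap with the dendrogram cut into the requested number of clusters. Storing parameters rather than finished code lets the script be regenerated whenever a setting changes.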

4.2 Comparison with existing tools

DataMap complements existing visualization tools such as Clustergrammer, Phantasus, and Morpheus. Like Phantasus and Morpheus, DataMap processes data on the client side for enhanced data security. It extends their functionality with a broader set of preprocessing options, automatic generation of reproducible R scripts, and publication-quality graphics. However, DataMap is less interactive than web applications built natively with JavaScript or other web technologies.

4.3 Use cases

We used DataMap to visualize genes upregulated by ionizing radiation in mouse B cells with and without a functional p53 gene⁶. Figure 1A shows the top genes specifically induced in B cells with functional p53. Column annotation bars indicate the experimental conditions (genotype and radiation exposure), and row color bars mark genes involved in apoptosis. The heatmap clearly reveals genes that respond strongly to radiation only in wild-type B cells, highlighting the functional importance of p53. Figure 1B shows a t-SNE projection of 2,700 single-cell RNA-seq profiles, color-coded by clusters corresponding to cell types. These examples illustrate DataMap's capability to uncover insights in complex, high-dimensional omics datasets.

Figure 1. Example visualizations.

(A) Top 20 genes upregulated by ionizing radiation in mouse B cells with a functional p53 gene⁶. (B) t-SNE projection of 2,700 single-cell RNA-seq profiles of peripheral blood mononuclear cells (PBMCs), available from 10x Genomics. Both datasets are included as built-in examples in the application.

5. Discussion

Browser-based execution is slower than native execution, which matters most for large datasets. For example, generating a hierarchical clustering heatmap of a 2700×50 matrix takes approximately 80 seconds in the browser, compared with 5 seconds in native R on the same laptop (Intel 11th Gen Core i7-1185G7, 3.00 GHz). This is a roughly 16-fold slowdown. For very large datasets, we therefore recommend installing DataMap locally as an R package. Another limitation stems from DataMap's reliance on WebR, which supports only a subset of R packages and receives package updates with a delay.

In summary, DataMap combines secure client-side processing with robust data preprocessing and reproducible workflows. It complements existing web-based tools, equipping biomedical researchers with a powerful tool for exploratory analysis. Future development will focus on expanding visualization capabilities and incorporating additional analytical modules.

Software availability

Software available from: https://gexijin.github.io/datamap

Source code available from: https://github.com/gexijin/datamap

Archived source code at time of publication: https://doi.org/10.5281/zenodo.15361414

License: GNU General Public License v3.0

Data availability

This work produces a software package. No data is generated or collected.

References

1. Fernandez NF, et al.: Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Sci Data. 2017; 4: 170151.
2. Kleverov M, et al.: Phantasus, a web application for visual and interactive gene expression analysis. eLife. 2024; 13.
3. Starruss J, de Back W, Brusch L, et al.: Morpheus: a user-friendly modeling environment for multiscale and multicellular systems biology. Bioinformatics. 2014; 30: 1331–1332.
4. van der Maaten L, Hinton G: Visualizing data using t-SNE. J. Mach. Learn. Res. 2008; 9: 7.
5. Kolde R: pheatmap: Pretty Heatmaps. 2018.
6. Tonelli C, et al.: Genome-wide analysis of p53 transcriptional programs in B cells upon exposure to genotoxic stress in vivo. Oncotarget. 2015; 6: 24611–24626.
