ggEDA: Visualisations for exploratory data analysis using tiled one-dimensional graphics and parallel coordinate plots

Sam El-Kamand; Julian M.W. Quinn; Mark J. Cowley

doi:10.12688/f1000research.168305.1


Software Tool Article


[version 1; peer review: awaiting peer review]

Sam El-Kamand¹,², Julian M.W. Quinn¹, Mark J. Cowley¹,²

PUBLISHED 13 Nov 2025

Author details

¹ Computational Biology, Children's Cancer Institute Australia, Sydney, New South Wales, 2052, Australia
² School of Clinical Medicine, University of New South Wales, Sydney, New South Wales, 2052, Australia

Sam El-Kamand
Roles: Conceptualization, Methodology, Software, Writing – Original Draft Preparation

Julian M.W. Quinn
Roles: Conceptualization, Supervision, Writing – Review & Editing

Mark J. Cowley
Roles: Conceptualization, Funding Acquisition, Supervision, Writing – Review & Editing


This article is included in the RPackage gateway.

Abstract

Exploratory data analysis (EDA) involves summarising trends within a dataset to help uncover data quality issues and generate hypotheses. However, identifying relationships between multiple features often requires extensive coding, manual inspection and statistical modelling. Here, we introduce the ggEDA R package, which streamlines multidimensional data exploration by providing two turnkey and complementary visualisation strategies. ggEDA generates interactive parallel coordinate plots (PCPs) well suited for examining large datasets with mostly quantitative features, and introduces tiled one-dimensional plots that more effectively show missingness and reveal categorical relationships in smaller datasets. ggEDA reduces the amount of code and time required to detect multi-feature relationships that may otherwise require statistical modelling or thorough manual review to identify. To make ggEDA visualisations accessible to a wider audience we also developed interactiveEDA, a web app that enables non-programmers to explore and interpret data patterns interactively. ggEDA and interactiveEDA are available at https://github.com/CCICB/ggEDA and https://github.com/CCICB/interactiveEDA respectively.

Keywords

R, visualisation, exploratory data analysis, multidimensional, parallel coordinate plots

Corresponding author: Sam El-Kamand

Competing interests: No competing interests were disclosed.

Grant information: We acknowledge support from Cancer Australia (grant 1165556) and My Room Children’s Cancer Charity, as well as the Australian Medical Research Future Fund (MRFF Emerging Priorities and Consumer-Driven Research Initiative).

This work was also supported by Luminesce Alliance – Innovation for Children’s Health. Luminesce Alliance is a not-for-profit cooperative joint venture between the Sydney Children’s Hospitals Network, the Children’s Medical Research Institute, the Children’s Cancer Institute, the University of Sydney, and UNSW Sydney. It has been established with the support of the NSW Government to coordinate and integrate pediatric research.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2025 El-Kamand S et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: El-Kamand S, Quinn JMW and Cowley MJ. ggEDA: Visualisations for exploratory data analysis using tiled one-dimensional graphics and parallel coordinate plots [version 1; peer review: awaiting peer review]. F1000Research 2025, 14:1248 (https://doi.org/10.12688/f1000research.168305.1) First published: 13 Nov 2025, 14:1248 (https://doi.org/10.12688/f1000research.168305.1) Latest published: 13 Nov 2025, 14:1248 (https://doi.org/10.12688/f1000research.168305.1)

Introduction

Exploratory data analysis (EDA) reveals relationships between data features, informing hypothesis generation and downstream analyses. It can also identify data-quality issues such as missingness, bias, and unexpected distribution structure. The R ecosystem already includes popular EDA packages such as skimr, which textually summarises completeness and descriptive statistics for individual features (1-dimensional), and GGally, which graphically describes pairwise feature correlations (2-dimensional) or multi-feature relationships through PCPs (n-dimensional). ggEDA enhances this ecosystem by providing interactive versions of standard n-dimensional visualisations like PCPs and introducing tiled one-dimensional visualisations that more effectively show missingness and relationships between categorical features in smaller datasets. Together, these visualisations provide key advantages over other EDA packages, most notably an ability to reveal a greater variety of multidimensional patterns (Figure 1).

Figure 1. Comparison of R packages that create visualisations commonly used for exploratory data analysis, including ComplexHeatmap,¹ DataExplorer,² skimr,³ GGally⁴ and ggpcp.⁵

Due to documented reproducibility issues, ggpcp features could not be verified first-hand.

Methods

Implementation

ggEDA is implemented as a standard R package and published on CRAN and the R-universe. The interactiveEDA web app was written using the shiny framework⁶ and takes ggEDA as a dependency, which separates the user-interface codebase from the underlying business logic and makes that logic easier to test. interactiveEDA is compiled into a purely client-side WebAssembly app using shinylive⁷ and hosted as a static web page on GitHub Pages. Code to produce visualisations runs in the client’s browser rather than on a third-party server outside the direct control of end-users. Distributing compute across clients also provides scaling benefits compared to traditional server-side shiny apps, which slow as the number of concurrent users grows. These security and scalability benefits come at the cost of slower application startup.

Operation

The ggEDA R package can be installed from CRAN (install.packages("ggEDA")). It is compatible with macOS, Windows, and all Unix-like operating systems where R (≥3.5.0) can be installed. Package dependencies are described on the ggEDA CRAN listing.

Data can be explored using the interactiveEDA web app in any modern browser that supports WebAssembly. We performed the most extensive testing in Chrome (version 137.0.7151.120); however, the app is also compatible with Firefox, Safari and Microsoft Edge.

Use cases

To demonstrate how ggEDA and interactiveEDA support exploratory data analysis, we present a series of use cases that highlight their capabilities in visualising multidimensional datasets.

Creating parallel coordinate plots

PCPs are a well-established EDA visualisation that reveals trends in predominantly quantitative datasets and detects outliers in one or more dimensions. Quantitative features are represented as a series of parallel axes, with each sample drawn as a line that crosses every axis at the point corresponding to its value for that feature. Correlative relationships are revealed when feature axes are ordered based on line crossing minimisation algorithms or mutual information with a categorical feature. ggEDA can produce interactive PCPs from any dataset with quantitative features using the ggparallel(data) command (Figure 2).

Figure 2. ggEDA parallel coordinate plots of the dry beans imaging dataset.⁹

A) Visualising 16 morphological features of 13,611 grains from common dry bean species reveals clear correlations amongst size-related attributes (Area, Perimeter and Axis Length). Bombay beans were the largest, most convex variety; B) Highlighting a single subclass simplifies both comparison against the full cohort and identification of within-class outliers. For example, Dermason beans (red) are smaller in size than other varieties. One Dermason bean grain had unusually low roundness, highly atypical for this variety.
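As a minimal sketch of the encoding described above, the base R code below rescales each quantitative feature of the built-in iris dataset to a common range and draws one line per sample across parallel axes. The rescale01 helper and the min-max scaling are illustrative assumptions and not necessarily how ggEDA scales its axes. The final lines call ggparallel(data) on the bundled ggEDA::minibeans subsample only if ggEDA is installed; any further arguments should be checked against the package documentation.

```r
# Hypothetical helper: rescale a numeric vector to [0, 1].
# Constant vectors are mapped to 0.5 to avoid division by zero.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0.5, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# Keep only the quantitative features and put them on a common scale
num_cols <- names(iris)[vapply(iris, is.numeric, logical(1))]
scaled <- as.data.frame(lapply(iris[num_cols], rescale01))

# One polyline per sample: x = axis index, y = rescaled value on that axis
matplot(
  x = seq_along(num_cols),
  y = t(as.matrix(scaled)),
  type = "l", lty = 1,
  col = as.integer(iris$Species),
  xaxt = "n", xlab = "", ylab = "Rescaled value"
)
axis(1, at = seq_along(num_cols), labels = num_cols)

# With ggEDA installed, the equivalent plot is a single call (as given in the text)
if (requireNamespace("ggEDA", quietly = TRUE)) {
  pcp <- ggEDA::ggparallel(data = ggEDA::minibeans)
}
```

Rescaling every feature to a shared range is what allows features measured in different units to sit side by side on parallel axes.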

PCPs scale well with large datasets but have several limitations. Visualising the relationships between multiple categorical variables is challenging. Missing data is also difficult to meaningfully represent. For this reason, ggEDA introduces a complementary visualisation composed of vertically aligned tile and bar plots.

Creating tiled one-dimensional graphics

For small datasets (n < 1000), ggEDA can represent features as distinct, vertically aligned bar or tile plots, with plot types auto-selected based on whether variables are categorical or numeric (Figure 3).

Figure 3. ggEDA visualisations of common datasets revealing: A) Petals of the setosa species of iris are drastically smaller than other iris species; B) The majority of individuals who perished during the Titanic disaster were adult males; C) Gentoo penguins from Biscoe Island have shallower bill depths than Chinstrap or Adelie penguins, despite their increased body mass.
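The auto-selection rule can be illustrated with a short base R sketch. The choose_plot_type helper is hypothetical, and the mapping (numeric features to bar plots, all other features to tile plots) is an assumption based on the description above; it does not reproduce ggEDA's internal logic.

```r
# Hypothetical helper: pick a one-dimensional plot type for each column.
# Assumption: numeric columns -> "bar", everything else -> "tile".
choose_plot_type <- function(data) {
  vapply(
    data,
    function(column) if (is.numeric(column)) "bar" else "tile",
    character(1)
  )
}

plot_types <- choose_plot_type(iris)
print(plot_types)

# The text recommends tiled plots for small datasets (n < 1000)
is_small <- nrow(iris) < 1000
```

Deciding the plot type per column is what lets a single call handle mixed data frames without the user specifying a geometry for each feature.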

Identifying complex multidimensional patterns

To demonstrate the advantages of ggEDA, we created the artificial Lazy Birdwatcher dataset. It describes magpie observations by two birdwatchers, one of whom routinely skips birdwatching on weekends. This introduces a missing data pattern dependent on both the birdwatcher and day of the week. The multidimensional pattern becomes immediately apparent from ggEDA stacked tile plots despite being difficult to detect using one-dimensional EDA tools like skimr, or two-dimensional tools like ggpairs from the GGally package (Figure 4). Despite being n-dimensional, all PCP plot implementations in R also fail to uncover this trend due to either exclusion of missing data or inability to represent clearly the relationships between categorical features.

Figure 4. Visualisation of the Lazy Birdwatcher dataset using the ggEDA package reveals a pattern of missingness (indicated by exclamation marks) dependent on multiple variables, Birdwatcher and Day (A).

This pattern is difficult to detect using one-dimensional EDA tools like skimr (B) or two-dimensional tools like ggpairs from the GGally package (C).
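To make the structure of this missingness pattern concrete, the base R sketch below builds a small synthetic table with the same kind of dependency. It is a hypothetical re-creation, not the ggEDA::lazy_birdwatcher data: the column names, birdwatcher labels, number of weeks and Poisson counts are all assumptions chosen for illustration.

```r
# Hypothetical re-creation of the pattern; NOT the ggEDA::lazy_birdwatcher data
set.seed(1)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
obs <- expand.grid(
  Day = factor(days, levels = days),
  Week = 1:4,
  Birdwatcher = c("Watcher A", "Watcher B"),
  stringsAsFactors = FALSE
)
obs$Magpies <- rpois(nrow(obs), lambda = 5)

# Watcher A skips weekends, so those counts are never recorded
skipped <- obs$Birdwatcher == "Watcher A" & obs$Day %in% c("Sat", "Sun")
obs$Magpies[skipped] <- NA

# One-dimensional view: a single overall missingness rate (8 of 56 rows)
overall_missing <- mean(is.na(obs$Magpies))

# Per-variable views hint that Watcher A and weekends have more missing
# values, but not that the two conditions coincide exactly
missing_by_watcher <- tapply(is.na(obs$Magpies), obs$Birdwatcher, mean)
missing_by_day <- tapply(is.na(obs$Magpies), obs$Day, mean)

# Joint view: missingness occurs only for Watcher A on Sat and Sun
missing_by_both <- tapply(
  is.na(obs$Magpies),
  list(obs$Birdwatcher, obs$Day),
  mean
)
print(missing_by_both)
```

The joint table is the tabular counterpart of what the stacked tile plot shows visually: the missing values are confined to one birdwatcher on two specific days.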

Exploring datasets using the interactiveEDA web-app

Despite the advancements provided by ggEDA and other tools in the R ecosystem, a key limitation remains: accessibility for non-programmers, particularly when visualising n-dimensional data. All existing R implementations lack graphical user interfaces (Figure 1). While shiny web apps offer a potential solution, they often require uploading datasets to external servers, raising privacy concerns. To address these limitations, we developed interactiveEDA, a WebAssembly-compiled client-side web app for secure, interactive data exploration (Figure 5). Operating entirely in the browser, interactiveEDA ensures data remains on the user’s machine, increasing ease of use without compromising data privacy. interactiveEDA is available at https://github.com/CCICB/interactiveEDA.

Figure 5. Screenshot of interactiveEDA, a web-app providing a graphical user interface for code-free generation of ggEDA visualisations.

Summary

ggEDA provides two complementary visualisation strategies for exploratory data analysis: interactive parallel coordinate plots for high-dimensional quantitative data and tiled one-dimensional graphics for exploring missingness and categorical relationships in smaller datasets. These tools help uncover complex patterns and data quality issues with minimal coding. For users without programming experience, the same visualisations are available through the interactiveEDA web app.

Software availability

ggEDA:

• Install using install.packages("ggEDA").
• Available from CRAN: https://cran.r-project.org/web/packages/ggEDA/
• Source code: https://github.com/CCICB/ggEDA
• Archived release: https://doi.org/10.5281/zenodo.17290896
• License: MIT

interactiveEDA:

• Available at: https://ccicb.github.io/interactiveEDA/
• Source code: https://github.com/CCICB/interactiveEDA
• Archived release: https://doi.org/10.5281/zenodo.17290912
• License: MIT

Data availability

Figshare. DryBeans. https://doi.org/10.6084/m9.figshare.29614133.v3⁸

This project contains the following underlying data:

• dry_beans.csv: Sourced from the UCI Machine Learning Repository.¹⁰ Originally published by Koklu and Özkan in 2020.⁹ A random subsample (n = 1000) is packaged with ggEDA (ggEDA::minibeans). Used in Figure 2.
• Mini beans.csv

Data is available under the terms of the CC BY 4.0 license.

Figshare. ggEDA. https://doi.org/10.6084/m9.figshare.30350887.v2¹¹

This project contains the following underlying data:

• iris.csv: Included with base R. Originally published by Anderson in 1935.¹² Used in Figure 3.
• titanic.raw.csv: Loaded from the datarium R package.¹³ Originally published by the British Board of Trade in 1990.¹⁴ Used in Figure 3.
• penguins.csv: Loaded from the palmerpenguins R package.¹⁵ Originally published by Gorman et al. in 2014.¹⁶ Used in Figure 3.
• lazy_birdwatcher.csv: Artificial dataset bundled with the ggEDA R package (ggEDA::lazy_birdwatcher). Used in Figure 4.

Data is available under the terms of the CC0 license.

Acknowledgements

We thank the developers of the packages integral to ggEDA, especially David Gohel for ggiraph,¹⁷ which enables its interactivity, and Thomas Lin Pedersen for patchwork¹⁸ and ggplot2 maintenance. We also acknowledge Hadley Wickham and all contributors to ggplot2.¹⁹ The ggEDA graphical user interface (interactiveEDA) was made possible thanks to the creators and maintainers of shiny,⁶ shinylive⁷ and webR.²⁰

We thank the Australian BioCommons for advice and research computing support.

References

1. Gu Z: Complex heatmap visualization. iMeta. 2022; 1(3): e43.
2. Cui B: DataExplorer: Automate Data Exploration and Treatment. R package version 0.8.3. 2024.
3. Waring E, Quinn M, McNamara A, et al.: skimr: Compact and Flexible Summaries of Data. R package version 2.1.5. 2022.
4. Schloerke B, Cook D, Larmarange J, et al.: GGally: Extension to ’ggplot2’. R package version 2.2.1. 2024.
5. VanderPlas S, Ge Y, Unwin A, et al.: Penguins Go Parallel: A Grammar of Graphics Framework for Generalized Parallel Coordinate Plots. J. Comput. Graph. Stat. 2023; 32(4): 1572–1587.
6. Chang W, Cheng J, Allaire J, et al.: shiny: Web Application Framework for R. R package version 1.9.1. 2024.
7. Schloerke B, Chang W, Stagg G, et al.: shinylive: Run ’shiny’ Applications in the Browser. R package version 0.3.0. 2024.
8. El-Kamand S: Dry Beans. Figshare. 2025.
9. Koklu M, Özkan IA: Multiclass classification of dry beans using computer vision and machine learning techniques. Comput. Electron. Agric. 2020; 174: 105507.
10. Dry Bean: UCI Machine Learning Repository.
11. El-Kamand S: ggEDA Datasets. Figshare. 2025.
12. Anderson E: The Irises of the Gaspé Peninsula. Bulletin of the American Iris Society. 1935; 59: 2–5.
13. Kassambara A: datarium: Data Bank for Statistical Analysis and Visualization. R package version 0.1.0. 2019.
14. British Board of Trade: Report on the Loss of the ’Titanic’ (S.S.) British Board of Trade Inquiry Report (reprint). Gloucester, UK: Allan Sutton Publishing; 1990.
15. Horst AM, Hill AP, Gorman KB: palmerpenguins: Palmer Archipelago (Antarctica) penguin data. R package version 0.1.0. 2020.
16. Gorman KB, Williams TD, Fraser WR: Ecological Sexual Dimorphism and Environmental Variability within a Community of Antarctic Penguins (Genus Pygoscelis). PLoS One. 2014; 9(3): e90081–e90014.
17. Gohel D, Skintzos P: ggiraph: Make ’ggplot2’ Graphics Interactive. R package version 0.8.10. 2024.
18. Pedersen TL: patchwork: The Composer of Plots. R package version 1.2.0.9000. 2024.
19. Wickham H: ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag; 2016.
20. Stagg GW, Lionel H, et al.: webR: The statistical language R compiled to WebAssembly via Emscripten. 2023.
