Software Tool Article

StackbarExtended: a user-friendly stacked bar-plot representation incorporating phylogenetic information and microbiota differential abundance analysis

[version 1; peer review: 1 approved, 1 approved with reservations]
PUBLISHED 09 Aug 2024

Abstract

Background

Microbial communities are mainly composed of bacteria, archaea, viruses and fungi, and are present in the gut, mouth, nose, skin, lungs, vagina, and bladder, among other places. In recent years, research has highlighted the critical role that these highly complex communities play in health and disease. Advances in sequencing technology have resulted in the generation of high-dimensional data, which are challenging to analyze and visualize effectively. In this context, traditional stacked bar-plot visualizations, while widely used, fall short of conveying the fundamental phylogenetic relationships between community members and are thus difficult to interpret.

Methods

StackbarExtended is implemented in native R (version ≥ 4.0 required) and is platform independent. Its source code is available on GitHub and archived on Zenodo.

Results

StackbarExtended allows for the plotting of relative abundance at user-defined taxonomic levels while displaying phylogenetic information using color gradients. Additionally, StackbarExtended integrates differential abundance statistics directly into the visualization process and performs clustering of low-abundance taxa.

Conclusions

StackbarExtended offers researchers a user-friendly tool for rapid visualization, presentation, and analysis of the microbiota composition.

Keywords

Microbiota, differential abundance analysis, visualization

Introduction

Microbiota communities consisting of diverse microbial members, including bacteria, viruses, fungi, and other microorganisms (Berg et al., 2020) in the digestive tract, skin, respiratory and urogenital systems, and other locations (Kennedy and Chang, 2020), have emerged as a crucial factor in maintaining health and preventing disease (Hou et al., 2022). With advancements in sequencing technologies (Satam et al., 2023) there is an increasing amount of data available on the composition and function of microbiota communities (Hasan and Yang, 2019), but analyzing and visualizing these data remain challenging due to their complexity and high dimensionality (Panek et al., 2018).

Microbiota communities are organized into multiple taxonomic levels, including Phylum, Family, Genus, and Species, with each level providing unique insights into the community’s structure and functional potential (Jandhyala et al., 2015). Visualization techniques that simultaneously represent relative abundances across multiple taxonomic levels would make such data more accessible and easier to interpret.

Stacked bar-plot representations of the relative abundance of microorganisms remain one of the most commonly used visualizations to show the global composition of microbiota communities, as well as potential shifts in a given community (Liu et al., 2021; McMurdie and Holmes, 2013). However, current tools used to generate stacked bar visualizations in microbiota analysis do not capture phylogenetic relationships between microbial taxa (Peeters et al., 2021). This is because traditional stacked bar visualizations utilize random colors to represent different microbes within the same taxonomic level. Furthermore, differential abundance analysis, a key step in interpreting statistically significant compositional shifts and their biological implications, is typically conducted and shown separately.

To address these issues, we developed a novel R package that allows users to easily generate stacked bar-plots to visualize microbiota composition at a user-defined taxonomic level while integrating information on taxa phylogeny through the use of color gradients. In addition, statistically significant differences in relative abundance are indicated on the plot. These simple solutions allow more information to be communicated within a single graphical representation.

Methods

Operation

StackbarExtended is implemented in native R (version ≥ 4.0 required) and is platform independent. Its source code is available on GitHub and archived on Zenodo (Cuisiniere, 2024a).

Implementation

StackbarExtended is designed for microbiota data analysis and visualization. It requires three main inputs: a taxonomic table providing phylogenetic information, a count table detailing the abundance of the taxa, and metadata containing biological or experimental group data (Navgire et al., 2022). These tables can be obtained through established pipelines such as DADA2 (Callahan et al., 2016), Mothur (Schloss et al., 2009), and QIIME2 (Bolyen et al., 2019) in conjunction with reference databases such as Greengenes (DeSantis et al., 2006) or SILVA (Quast et al., 2013). The package accepts a phyloseq S4 class object as input, which groups the count, taxonomy, and metadata tables into a single entity, thus simplifying data handling and enhancing reproducibility. The phyloseq S4 object can be created with the phyloseq() function of the phyloseq R package (McMurdie and Holmes, 2013).
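As a minimal sketch of how the three inputs fit together, the following R code builds a small count table, taxonomy table, and metadata table and, if the phyloseq package is installed, combines them with phyloseq(). The count values are invented for illustration, the sample and taxon names mirror those used in the example dataset, and the object name ps_demo is hypothetical.

```r
# Count table: samples in rows, ASVs in columns (counts are illustrative only)
counts <- matrix(
  c(120, 30, 15,
     10, 80,  5),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("15186T0", "15189T7"), c("ASV1", "ASV2", "ASV3"))
)

# Taxonomy table: one row per ASV, one column per rank (character matrix)
taxonomy <- matrix(
  c("Bacteria", "Verrucomicrobia", "Akkermansiaceae",
    "Bacteria", "Bacteroidetes",   "Tannerellaceae",
    "Bacteria", "Bacteroidetes",   "Bacteroidaceae"),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ASV1", "ASV2", "ASV3"), c("Kingdom", "Phylum", "Family"))
)

# Metadata: one row per sample, row names matching the count table
metadata <- data.frame(
  timepoint = c("Day 0", "Day 7"),
  SampleID  = c("15186T0", "15189T7"),
  row.names = c("15186T0", "15189T7")
)

# The three tables must agree on sample and taxon names
stopifnot(identical(rownames(counts), rownames(metadata)))
stopifnot(identical(colnames(counts), rownames(taxonomy)))

# Combine into a single phyloseq object when the package is available
if (requireNamespace("phyloseq", quietly = TRUE)) {
  ps_demo <- phyloseq::phyloseq(
    phyloseq::otu_table(counts, taxa_are_rows = FALSE),
    phyloseq::tax_table(taxonomy),
    phyloseq::sample_data(metadata)
  )
  print(ps_demo)
}
```

Checking that sample and taxon names match across the three tables before building the object tends to catch the most common import problems early.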

One of StackbarExtended's core functionalities is the ability to handle multiple taxonomic levels simultaneously. The levels are user-defined and default to Family (X) and Phylum (X+1); aggregation to each level is performed with the tax_glom() function from the phyloseq package. Most importantly, StackbarExtended uses a color-coding mechanism in which each Phylum (X+1) is assigned a user-defined color palette and the taxa at the lower level (X) within that Phylum are drawn in different shades of the same color. The resulting gradient maps the taxonomic hierarchy onto the plot and makes it straightforward to identify visually.
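The shading idea can be illustrated with a short base R sketch. This is not the package's internal code: the helper shade_palette() and the base colors are hypothetical, and the family-to-phylum assignments are taken from the example data only for illustration.

```r
# Hypothetical helper: n shades running from a light tint to the base colour
shade_palette <- function(base_color, n) {
  ramp <- grDevices::colorRampPalette(c("white", base_color))
  ramp(n + 1)[-1]  # drop the pure white end so every shade stays visible
}

# Families (level X) and the phylum (level X+1) each belongs to
taxa <- data.frame(
  Phylum = c("Bacteroidetes", "Bacteroidetes", "Bacteroidetes",
             "Verrucomicrobia", "Proteobacteria", "Proteobacteria"),
  Family = c("Tannerellaceae", "Bacteroidaceae", "Rikenellaceae",
             "Akkermansiaceae", "Desulfovibrionaceae", "Burkholderiaceae"),
  stringsAsFactors = FALSE
)

# One base colour per phylum (arbitrary choices for this sketch)
phylum_colors <- c(
  Bacteroidetes   = "#54278F",
  Verrucomicrobia = "#08519C",
  Proteobacteria  = "#006D2C"
)

# Give every family a shade of its phylum's base colour
taxa$color <- NA_character_
for (p in names(phylum_colors)) {
  idx <- which(taxa$Phylum == p)
  taxa$color[idx] <- shade_palette(phylum_colors[[p]], length(idx))
}

print(taxa)
```

Families from the same phylum end up with related shades, so the phylum of any bar segment can be read from its hue alone.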

Another challenge in analyzing microbiota data is the large number of taxa present in a dataset. In microbiota communities, a small number of taxa typically dominate, while numerous others are present at much lower abundance (Neu et al., 2021). In classic stacked bar-plot representations such as those included in the phyloseq (McMurdie and Holmes, 2013), MicrobiomeAnalyst (Chong et al., 2020) or microeco (Liu et al., 2021) pipelines, these low-abundance taxa are simply filtered out before plotting. StackbarExtended allows users to classify taxa based on their relative abundance and, by default, plots individually only taxa representing more than 1% of the total abundance. Low-abundance taxa (X) can be grouped into an “Others” category within their own Phylum, so that their higher-level taxonomy is retained, while taxa belonging to low-abundance Phyla (X+1) are grouped into a general “Others” category. By still including the low-abundance taxa, StackbarExtended provides a more complete representation of the taxa in a microbiota community.
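The within-phylum grouping step can be sketched in base R as follows. The counts, family names, and phylum names are invented, and the code only illustrates the idea of a 1% mean relative abundance threshold; it is not taken from the package source and does not cover the second step of pooling low-abundance phyla into a general “Others” category.

```r
# Illustrative counts: samples in rows, families in columns (values invented)
counts <- matrix(
  c(500, 300, 150, 42, 5, 3,
    400, 350, 200, 38, 8, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2"),
                  c("FamA", "FamB", "FamC", "FamD", "FamE", "FamF"))
)

# Phylum of each family
phylum_of <- c(FamA = "Phy1", FamB = "Phy1", FamC = "Phy2",
               FamD = "Phy2", FamE = "Phy1", FamF = "Phy1")

# Relative abundance in percent, per sample
rel <- counts / rowSums(counts) * 100

# Mean relative abundance of each family across samples
mean_rel <- colMeans(rel)

# Families above the threshold keep their name; the rest are relabelled
# "Others <Phylum>" so that their phylum membership is preserved
threshold <- 1
label <- ifelse(mean_rel > threshold,
                names(mean_rel),
                paste("Others", phylum_of[names(mean_rel)]))

# Sum the relative abundances of families sharing a label
grouped <- t(rowsum(t(rel), group = label))
print(round(grouped, 2))
```

Because the low-abundance families are pooled rather than dropped, each sample still sums to 100%.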

Finally, StackbarExtended includes differential abundance analysis based on DESeq2 (Love et al., 2014), which allows users to statistically assess differences in taxa abundance between experimental groups and to apply false discovery rate (FDR) correction. Significant features (i.e. taxa whose abundance differs significantly between two groups) are highlighted in the legend in bold font, and their significance level (FDR-corrected p-value) is automatically indicated with stars. When more than two treatment groups are compared, multiple pairwise comparisons can be computed. The results are returned as data frames that list the taxa identified as significant by DESeq2 at each taxonomic level analyzed, including taxa names and taxonomy, abundance levels, and statistical metrics such as log2 fold-change, p-values, and FDR-corrected p-values.
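To make the correction and star annotation concrete, here is a small base R sketch. The p-values and taxon names are invented, the Benjamini-Hochberg method is assumed as the FDR procedure, and the star thresholds (0.05, 0.01, 0.001) follow the legend of Figure 1; the package's own implementation may differ.

```r
# Invented raw p-values for four taxa
pvals <- c(TaxonA = 0.0002, TaxonB = 0.004, TaxonC = 0.03, TaxonD = 0.20)

# Benjamini-Hochberg correction (assumed FDR procedure)
padj <- p.adjust(pvals, method = "BH")

# Map adjusted p-values to significance stars
stars <- as.character(cut(
  padj,
  breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
  labels = c("***", "**", "*", ""),
  right = FALSE
))

# Legend labels: taxon name followed by its stars, if any
legend_labels <- trimws(paste(names(pvals), stars))
print(data.frame(pvalue = pvals, padj = padj, label = legend_labels))
```

Taxa with an empty star string would be shown in regular font, while starred taxa correspond to those shown in bold in the legend.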

Use cases

To demonstrate the use of this package, we have provided ready-to-use example data “ps” from our previously published study (Cuisiniere, 2024a; Cuisiniere et al., 2021) which assessed shifts in mouse gut microbiota composition after antibiotic treatment (Figure 1).


Figure 1. StackbarExtended graphical output.

Mice (n = 9) received oral antibiotic treatment with neomycin and metronidazole for one week. Fecal samples were collected before (Day 0) and after antibiotic treatment (Day 7). The 16S rRNA gene was sequenced from DNA extracted from fecal samples using the Illumina MiSeq platform. The 4 most abundant phyla and their families are represented. Families with a mean abundance lower than 1% across the samples are regrouped into “Others”. Taxa shown in bold in the legends are statistically significant after FDR correction (*P < 0.05, **P < 0.01, ***P < 0.001). Data used are from the “ps” dataset and are included in the StackbarExtended R package.

Animal experiments were approved by the Institutional Animal Care committee of the Centre de recherche du Centre hospitalier de l’Université de Montréal (CRCHUM) in agreement with the guidelines of the Canadian Council of Animal Care. The study was carried out in compliance with the ARRIVE guidelines (Cuisiniere, 2024b). No criteria were set for including and excluding animals during the experiment or data points during the analysis. No exclusions of animals, experimental units, or data points were applied for the analysis.

To obtain the data, four-week-old female C57BL/6 mice were purchased from Charles River Laboratories (Saint-Constant, QC, Canada). Constant efforts were made to minimize the suffering of the animals. Nine mice were kept under controlled specific pathogen free (SPF) conditions in the CRCHUM animal facility at a temperature of 22°C and 45-60% humidity, with a 12 h:12 h light-dark cycle. They were housed at three mice per cage with ad libitum access to chow and water. Cages were enriched with nesting material. Mice were allowed one week of acclimation following arrival at the CRCHUM animal facility, after which oral antibiotics consisting of metronidazole (1 mg.ml−1, Hospira, St-Laurent, QC, Canada) and neomycin (1 mg.ml−1, Sigma, St-Louis, MO, USA) were added to the drinking water for one week. Fecal samples were collected before (Day 0) and after antibiotic treatment (Day 7), snap-frozen and stored at -80°C. Mice were then euthanized using CO2 followed by cervical dislocation.

DNA was extracted using the Qiagen DNeasy PowerSoil® kit (Qiagen, Toronto, ON) and quantified using a spectrophotometer (DeNovix DS-11 FX, Wilmington, DE). The 16S ribosomal RNA (rRNA) library preparation and sequencing were performed using the Illumina MiSeq platform at Genome Québec, targeting the V3-V4 region of the 16S rRNA gene (primers: 341F, 805R).

Forward and reverse raw, demultiplexed 16S rRNA reads were denoised, chimera filtered, and clustered into sequence variants using the DADA2 package (version 1.16) (Callahan et al., 2016) in R (version 4.0.1). Reads were trimmed at the first instance of a quality score less than or equal to 2, and were removed if they contained ambiguous nucleotides (N) or if two or more errors were expected based on the quality of the trimmed read. After taxonomic assignment using Silva training set v132 (Quast et al., 2013), the ASV (Table 1), taxonomy (Table 2) and metadata (Table 3) tables were combined into a phyloseq object (McMurdie and Holmes, 2013).

Table 1. Subset of the ASV table used.

Rows represent each sample and columns represent individual ASVs. Each cell indicates the count of a particular ASV in a specific sample.

ASV1ASV2ASV3ASV4ASV5
15186T020352717160851346
15189T741192552341209
15187T0275786855215389
15190T7417330271
15188T02535123087445434

Table 2. Subset of the taxonomy table used.

Rows represent individual ASVs, and columns represent each taxonomic rank. Each cell contains the taxonomic name at that rank for the corresponding ASV.

ASV   Kingdom   Phylum           Class             Order               Family           Genus            Species
ASV1  Bacteria  Verrucomicrobia  Verrucomicrobiae  Verrucomicrobiales  Akkermansiaceae  Akkermansia      muciniphila
ASV2  Bacteria  Bacteroidetes    Bacteroidia       Bacteroidales       Tannerellaceae   Parabacteroides  NA
ASV3  Bacteria  Bacteroidetes    Bacteroidia       Bacteroidales       Bacteroidaceae   Bacteroides      vulgatus
ASV4  Bacteria  Bacteroidetes    Bacteroidia       Bacteroidales       Tannerellaceae   Parabacteroides  distasonis
ASV5  Bacteria  Bacteroidetes    Bacteroidia       Bacteroidales       Rikenellaceae    Alistipes        NA

Table 3. Subset of the metadata table used.

Rows contain metadata associated with each sample.

Sample   timepoint  Mouse.Id  SampleID  concentration.ng.ul  antibiotic
15186T0  Day 0      15186     15186T0   89.764               0
15189T7  Day 7      15189     15189T7   7.702                1
15187T0  Day 0      15187     15187T0   35.692               0
15190T7  Day 7      15190     15190T7   18.386               1
15188T0  Day 0      15188     15188T0   21.619               0

Users can then use the following code to create a graphical representation of the gut microbiota composition at the phylum and family levels, comparing mice before and after antibiotic treatment and performing differential abundance analysis with FDR correction.

# Load the package
library(StackbarExtended)

# The plot and the output tables are stored in a single list object
my_plot <- plot_microbiota(
  ps_object = ps,
  exp_group = 'timepoint',
  sample_name = 'SampleID',
  hues = c("Purples", "Blues", "Greens", "Oranges"),
  differential_analysis = TRUE,
  sig_lab = TRUE,
  fdr_threshold = 0.05
)

# Display the stacked bar-plot
print(my_plot$plot)

In addition to the graphical representation, two output tables (one for each level) are created containing statistical information concerning the differentially abundant taxa (Table 4). The tables can be accessed as follows:

# Display statistically significant taxa at the phylum level
print(my_plot$significant_table_main)

# Display statistically significant taxa at the family level
print(my_plot$significant_table_sub)

Table 4. Subset of the data frame output of the StackbarExtended R package.

The data frame output contains the results columns baseMean, log2FoldChange, lfcSE, stat, pvalue (unadjusted p-values) and padj (FDR-corrected p-values), and also includes metadata columns with the related taxonomic information. The lfcSE column gives the standard error of the log2FoldChange. For the Wald test, stat is the Wald statistic: the log2FoldChange divided by lfcSE, which is compared to a standard Normal distribution to generate a two-tailed p-value. For the likelihood ratio test (LRT), stat is the difference in deviance between the reduced model and the full model, which is compared to a chi-squared distribution to generate a p-value.

ASV   baseMean  log2FoldChange  lfcSE  stat  pvalue       padj         Kingdom   Phylum           Class                Order                  Family
ASV1  5761.0    2.2             0.44   5.1   3.65E-07     5.48E-07     Bacteria  Verrucomicrobia  Verrucomicrobiae     Verrucomicrobiales     Akkermansiaceae
ASV2  3577.7    1.1             0.50   2.1   0.037113271  0.038726891  Bacteria  Bacteroidetes    Bacteroidia          Bacteroidales          Tannerellaceae
ASV5  205.1     -2.2            0.75   -2.9  0.003503241  0.004003703  Bacteria  Bacteroidetes    Bacteroidia          Bacteroidales          Rikenellaceae
ASV8  228.5     -5.9            0.89   -6.6  3.62E-11     7.25E-11     Bacteria  Proteobacteria   Deltaproteobacteria  Desulfovibrionales     Desulfovibrionaceae
ASV9  880.8     5.9             0.52   11.3  1.75E-29     2.10E-28     Bacteria  Proteobacteria   Gammaproteobacteria  Betaproteobacteriales  Burkholderiaceae
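The relationship between log2FoldChange, lfcSE, stat, and the Wald p-value described in the legend of Table 4 can be checked with a few lines of base R. The inputs below are rounded values of the kind reported in the table (a log2 fold-change of 2.2 with a standard error of 0.44), so the result only approximates the values DESeq2 computes from unrounded estimates.

```r
# Rounded example values (assumed for illustration)
log2_fold_change <- 2.2
lfc_se <- 0.44

# Wald statistic: estimate divided by its standard error
wald_stat <- log2_fold_change / lfc_se

# Two-tailed p-value from the standard Normal distribution
p_two_tailed <- 2 * pnorm(-abs(wald_stat))

cat("Wald statistic:", round(wald_stat, 2), "\n")
cat("Two-tailed p-value:", signif(p_two_tailed, 3), "\n")
```

Small differences from the tabulated stat and pvalue are expected, because the table displays rounded fold-changes and standard errors.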

The primary objective of our example analysis was to identify which taxa at the Family and Phylum levels were impacted by antibiotic treatment in the mouse gut microbiota. This analysis is crucial for understanding how antibiotic interventions alter microbial communities. The graphical output (Figure 1) provided a clear assessment, revealing that among the four most abundant Phyla, three were significantly affected by the antibiotic treatment. Specifically, significant alterations were observed in Verrucomicrobia, Bacteroidetes, and Proteobacteria (FDR < 0.05), indicating a substantial shift in the microbial composition due to antibiotic exposure. Among the abundant Families (>1% of the total abundance) within these Phyla, 10 out of 11 exhibited significant alterations (FDR < 0.05). On average, the six remaining low-abundance Phyla, grouped into the “Others” category, accounted for 2.5% of the total relative abundance. Similarly, within the abundant Phyla, the 14 low-abundance Families accounted for 2.4% of the total relative abundance.

The two generated output tables provide detailed statistical information about the differentially abundant taxa at each level, including log2 fold-change, exact p-values, and FDR-corrected p-values. These comprehensive data allow for the identification of taxa significantly impacted by the treatment. Notably, four additional low-abundance Phyla - Cyanobacteria, Deferribacteres, Tenericutes, and Actinobacteria - were found to be significantly affected (FDR < 0.05). Furthermore, a total of 24 families were identified as statistically different (FDR < 0.05) (Cuisiniere, 2024a).

These findings are consistent with previous studies that have reported significant shifts in gut microbiota composition following antibiotic treatment (Fishbein et al., 2023). Hence, this use case demonstrated the advantage of using StackbarExtended to present a clear and interpretable graphical representation while retaining the capacity to perform detailed statistical analysis.

Discussion

StackbarExtended offers the opportunity to enhance the widely utilized stacked bar graphical representation by incorporating information about taxonomy and statistical significance in regard to differentially abundant taxa. Furthermore, it enables users to retain data on rare taxa while maintaining phylogeny information. These functionalities have been implemented into a user-friendly R package, StackbarExtended, which is freely accessible on GitHub. The package facilitates the processing of large microbiota datasets and produces publication-ready and information-rich graphical representations with a high level of personalization. StackbarExtended is particularly useful to biology and molecular biology students, fellows and researchers working in microbiota analysis, and its output visualizations are suitable for publications and time-limited presentations at conferences and seminars requiring quick interpretation of displayed data.

Software availability

Source code available from: https://github.com/ThibaultCuisiniere/StackbarExtended.

Archived software available from: https://doi.org/10.5281/zenodo.11166800 (Cuisiniere, 2024a).

License: This R package and underlying data are freely available under the GNU General Public License (GPL-3).

Ethics and consent

Animal experiments were approved on April 3rd 2019 by the Institutional Animal Care committee of the Centre de recherche du Centre hospitalier de l’Université de Montréal (CRCHUM) in agreement with the guidelines of the Canadian Council of Animal Care, approval number C19006MSs. The study was carried out in compliance with the ARRIVE guidelines (Cuisiniere, 2024b).

how to cite this article
Cuisiniere T and M Santos M. StackbarExtended: a user-friendly stacked bar-plot representation incorporating phylogenetic information and microbiota differential abundance analysis [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2024, 13:914 (https://doi.org/10.12688/f1000research.151662.1)
NOTE: If applicable, it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 29 Aug 2024
Monica Steffi Matchado, Ludwig-Maximilians-Universitat Munchen Medizinische Fakultat (Ringgold ID: 54187), Munich, Bavaria, Germany 
Approved with Reservations
Overview: 
This manuscript describes a new R package, StackbarExtended, which combines phylogenetic information and statistical significance into a single graphical output. This will visually improve microbiota composition in better understanding. The technique solves conventional stacked bar plot shortcomings, most ...
HOW TO CITE THIS REPORT
Matchado MS. Reviewer Report For: StackbarExtended: a user-friendly stacked bar-plot representation incorporating phylogenetic information and microbiota differential abundance analysis [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2024, 13:914 (https://doi.org/10.5256/f1000research.166324.r312937)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
Reviewer Report 14 Aug 2024
Zheng Sun, Harvard Medical School, Boston, Massachusetts, USA 
Approved
Interesting tool! It provides a clear and effective visualization of microbiome data, particularly by using similar colors to represent genus and its corresponding higher family, which enhances human interpretability. To further improve the tool, I suggest the following: (1) Adapt ...
HOW TO CITE THIS REPORT
Sun Z. Reviewer Report For: StackbarExtended: a user-friendly stacked bar-plot representation incorporating phylogenetic information and microbiota differential abundance analysis [version 1; peer review: 1 approved, 1 approved with reservations]. F1000Research 2024, 13:914 (https://doi.org/10.5256/f1000research.166324.r312942)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.
