Facilitating bootstrapped and rarefaction-based microbiome diversity analysis with q2-boots

Isaiah Raspet; Elizabeth Gehret; Chloe Herman; Jeff Meilander; Andrew Manley; Anthony Simard; Evan Bolyen; J Gregory Caporaso

doi:10.12688/f1000research.156295.1

Software Tool Article

[version 1; peer review: 1 approved, 2 approved with reservations]

Isaiah Raspet¹,², Elizabeth Gehret¹, Chloe Herman¹,², Jeff Meilander¹, Andrew Manley¹, Anthony Simard¹, Evan Bolyen¹, J Gregory Caporaso¹,²

PUBLISHED 15 Jan 2025

¹ Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, Arizona, 86011, USA
² School of Informatics, Computing and Cyber Systems, Northern Arizona University, Flagstaff, Arizona, 86011, USA

Isaiah Raspet
Roles: Software, Writing – Original Draft Preparation

Elizabeth Gehret
Roles: Supervision, Writing – Review & Editing

Chloe Herman
Roles: Supervision, Validation, Writing – Review & Editing

Jeff Meilander
Roles: Data Curation, Validation

Andrew Manley
Roles: Validation, Writing – Review & Editing

Anthony Simard
Roles: Software, Supervision, Writing – Review & Editing

Evan Bolyen
Roles: Conceptualization, Software, Supervision, Writing – Review & Editing

J Gregory Caporaso
Roles: Conceptualization, Software, Supervision, Validation, Writing – Original Draft Preparation, Writing – Review & Editing

Abstract

Background

We present q2-boots, a QIIME 2 plugin that facilitates bootstrapped and rarefaction-based microbiome diversity analysis. This plugin provides eight new actions that allow users to apply any of thirty different alpha diversity metrics and twenty-two beta diversity metrics to bootstrapped or rarefied feature tables, using a single QIIME 2 Pipeline command, or more granular QIIME 2 Action commands.

Results

Given a feature table, an even sampling depth, and the number of iterations to perform (n), the command qiime boots core-metrics will resample the feature table n times and compute alpha and beta diversity metrics on each resampled table. The results will be integrated in summary data artifacts that are identical in structure and type to results that would be generated by applying diversity metrics to a single table. This enables all the same downstream analytic tools to be applied to these tables and ensures that all collected data is considered when computing microbiome diversity metrics.

Conclusions

A challenge of this work was deciding how to integrate distance matrices that were computed on n resampled feature tables, as a simple average of pairwise distances (median or mean) does not account for the structure of distance matrices. q2-boots provides three options, and we show here that the results of these approaches are highly correlated. q2-boots is free and open source. Source code can be found at https://github.com/caporaso-lab/q2-boots; installation instructions and a tutorial can be found in the project’s documentation at https://q2-boots.readthedocs.io.

Keywords

microbiome, rarefaction, bootstrap, QIIME 2

Corresponding author: J Gregory Caporaso

Competing interests: EB and JGC are co-founders and hold equity in Cymis Benefit Corporation, a biological data science software company.

Grant information: This work was funded in part by the National Cancer Institute grant 1U24CA248454-01 to JGC.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2025 Raspet I et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

How to cite: Raspet I, Gehret E, Herman C et al. Facilitating bootstrapped and rarefaction-based microbiome diversity analysis with q2-boots [version 1; peer review: 1 approved, 2 approved with reservations]. F1000Research 2025, 14:87 (https://doi.org/10.12688/f1000research.156295.1) First published: 15 Jan 2025, 14:87 (https://doi.org/10.12688/f1000research.156295.1) Latest published: 15 Jan 2025, 14:87 (https://doi.org/10.12688/f1000research.156295.1)

Background

Two recent papers¹,² have brought discussion about rarefaction back into focus in microbiome diversity analysis. Inspired by this work, we developed q2-boots, a new QIIME 2³ plugin that facilitates bootstrapped and rarefaction-based diversity analysis for microbiome researchers and data scientists. While rarefaction analysis has always been possible in QIIME 2, the diversity analysis workflow that most users consider the default (i.e., running qiime diversity core-metrics or qiime diversity core-metrics-phylogenetic) rarefies the input feature table only once, to a user-specified sequencing depth. Other workflows, such as qiime diversity alpha-rarefaction and qiime diversity beta-rarefaction, enable rarefaction analysis (i.e., multiple iterations of rarefying the input feature table and comparison of the diversity metrics computed on each of the rarefied tables). However, these workflows produce terminal QIIME 2 Visualizations rather than QIIME 2 Artifacts that can be used in downstream steps such as statistical analysis, so analysis of rarefaction data in QIIME 2 is currently uncommon in practice.

Implementation

q2-boots was designed to serve as a drop-in replacement for the widely used q2-diversity plugin. It provides eight new Actions (i.e., qiime2.Action objects) that broadly fall into four categories: resampling (Figure 1), the alpha diversity suite of Actions (Figure 2), the beta diversity suite of Actions (Figure 3), and the core metrics Pipeline.

Figure 1. q2-boots workflow for resampling a feature table with or without replacement.

Figure 2. q2-boots workflow for performing alpha diversity analysis, integrating resampling.

Figure 3. q2-boots workflow for performing beta diversity analysis, integrating resampling.

The computationally expensive steps of the q2-boots plugin (i.e., resampling tables, and diversity calculations on the many resulting tables) can all be run in parallel if requested by the user, through QIIME 2's parsl-based⁴ parallel computing support. This enables full use of individual multiprocessor computers (such as laptops or cloud-based virtual machines such as those available from Amazon Web Services or Digital Ocean) or multi-node high performance computing machinery (e.g., university cluster computers). Additionally, because q2-boots is a QIIME 2 plugin, failed Pipeline runs can be resumed, avoiding the need to recompute results. This is useful, for example, if an out-of-memory error requires a re-run of analysis with a larger memory allocation after a large portion of work has already been completed.

As we discuss implementation details in the following subsections we use some terms that have specific meaning in QIIME 2, such as Action, Pipeline, and Artifact Class. The glossary of Developing with QIIME 2⁵ is the canonical reference for definitions of these terms.

Operation

As of this writing, q2-boots can be installed in the latest development version (2025.4) and release version (2024.10) of the amplicon and metagenome distributions of QIIME 2 by following instructions in the project’s documentation. q2-boots can be installed on macOS, Linux, and Windows Subsystem for Linux (WSL), or installed in QIIME 2 Docker containers. The Actions and Pipelines provided in q2-boots provide different operational workflows which are documented in the following sub-sections.

Resampling

A single Action, resample, supports resampling an input feature table (QIIME 2 artifact class: FeatureTable[Frequency]) to a user-specified per-sample sampling depth n times. Sampling is performed with replacement (i.e., bootstrapped) or without replacement (i.e., rarefied²). Samples with total frequencies less than the specified sampling depth are dropped from the resulting feature tables. The output of this Action is n feature tables that can be used in downstream QIIME 2 Actions, or which can be exported to biom-formatted files⁶ for analysis with other tools.
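The resampling step described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not the q2-boots implementation: the dictionary-based table layout and the function names resample_sample and resample_table are assumptions made for the example.

```python
import random


def resample_sample(counts, depth, with_replacement, rng):
    """Resample one sample's feature counts to `depth` observations.

    counts: dict mapping feature ID -> observed frequency (hypothetical layout).
    Returns a new dict of counts, or None if the sample is too shallow.
    """
    if sum(counts.values()) < depth:
        return None  # samples below the sampling depth are dropped
    # Expand the counts into one entry per observation.
    pool = [feature for feature, c in counts.items() for _ in range(c)]
    if with_replacement:
        drawn = rng.choices(pool, k=depth)  # bootstrapped
    else:
        drawn = rng.sample(pool, k=depth)   # rarefied
    resampled = {}
    for feature in drawn:
        resampled[feature] = resampled.get(feature, 0) + 1
    return resampled


def resample_table(table, depth, n, with_replacement, seed=0):
    """Return n resampled copies of `table` (dict: sample ID -> counts)."""
    rng = random.Random(seed)
    tables = []
    for _ in range(n):
        new_table = {}
        for sample_id, counts in table.items():
            result = resample_sample(counts, depth, with_replacement, rng)
            if result is not None:
                new_table[sample_id] = result
        tables.append(new_table)
    return tables


table = {
    "s1": {"f1": 50, "f2": 30, "f3": 20},
    "s2": {"f1": 5, "f2": 3},  # total frequency 8 < depth, so dropped
    "s3": {"f2": 40, "f4": 10},
}
rarefied = resample_table(table, depth=20, n=3, with_replacement=False)
bootstrapped = resample_table(table, depth=20, n=3, with_replacement=True)
```

In this sketch every retained sample has exactly 20 observations in each resampled table. When sampling without replacement, no feature count can exceed its count in the input table; when sampling with replacement, it can.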

In a comparison of bootstrapped and rarefaction-based diversity analysis on a collection of 1,018 samples spanning human excrement, compost, and soil samples from the Meilander et al. (2024) “gut-to-soil axis” human excrement composting (HEC) experiment,⁷ we find that the results are effectively identical when comparing averaged alpha and beta diversity metrics computed from the resulting tables (Supplementary Table 1). In two iterations each of bootstrapped and rarefaction-based diversity analysis, 100 resampled tables were created. Four alpha and four beta diversity metrics were computed on each resampled table, and the resulting alpha diversity vectors and beta diversity distance matrices were averaged. On a per-diversity metric basis, correlation between the averaged alpha diversity vectors or beta diversity distance matrices resulting from bootstrapped or rarefaction-based resampled feature tables, measured with Spearman rank correlation and Mantel test using Spearman rank correlation respectively, all achieved correlation coefficient (rho) values greater than 0.99 and p-values less than 0.001. These results, and all code used to perform the analysis, are presented in Supplementary Data.

Rarefaction-based diversity analysis is more common than bootstrapped diversity analysis in microbiome research, but has its roots in macro-ecology, where the population sizes being sampled from are small relative to those sampled in microbial ecology. The availability of both approaches through a common interface will facilitate assessment of whether one is more appropriate in microbiome diversity analysis generally, or under specific circumstances.

Alpha diversity suite

The alpha diversity suite introduces three Actions: alpha, alpha-collection, and alpha-average. alpha is an analog of q2-diversity’s alpha and alpha-phylogenetic actions. Those Actions were introduced in q2-diversity before optional input Artifacts were supported in QIIME 2, so two Actions were required to support phylogenetic diversity metrics (which take a feature table and a phylogenetic tree as input) and non-phylogenetic diversity metrics (which only take a feature table as input). The implementation of alpha in q2-boots improves upon this by making the phylogenetic tree an optional input and therefore supports both phylogenetic and non-phylogenetic diversity metrics. This same change relative to q2-diversity was made for the beta diversity suite of Actions and core metrics. (Because the QIIME 2 Amplicon Distribution strives to maintain backward compatibility, q2-diversity has not been updated to use optional inputs for these Actions.)

The implementation of alpha in q2-boots integrates resampling of the input feature table n times. In addition to providing the typical inputs and parameters for an alpha diversity computation (a feature table, an optional phylogenetic tree, and the name of a supported diversity metric to compute), the user provides an even sampling depth, whether to sample with or without replacement, and the number of resampled feature tables to compute when calling qiime boots alpha. These parameters are passed through to the resample Action described above. The input feature table is resampled n times to create n feature tables, the user-specified alpha diversity metric is computed on all samples in each feature table, the per-sample diversity metric values computed on each table are averaged (the averaging method is median, by default), and the Action outputs a single vector of the averaged per-sample alpha diversity metric values. The resulting vector can be used in any downstream Actions compatible with this type (QIIME 2 artifact class: SampleData[AlphaDiversity]) in the QIIME 2 ecosystem, or exported as tab-separated text for use elsewhere. This facilitates the use of bootstrapping or rarefaction in alpha diversity analysis.

The alpha-collection Action works similarly to the alpha Action, but rather than integrating the averaging step, it outputs the full collection of alpha diversity vectors that were computed on each of the resampled feature tables. It thus allows users to interact directly with these diversity vectors to support estimates of variance across resampled tables, or other computations that require the full data rather than simple averages.

The final tool in the alpha suite, alpha-average, takes a collection of alpha diversity vectors, such as those generated by alpha-collection, computes their average (the default averaging method is median) and outputs a single vector containing per-sample averages.
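The averaging step can be illustrated with a short standard-library Python sketch. It is not the q2-boots code: the list-of-dictionaries representation, the function names average_alpha and spread_alpha, and the "mean" option are assumptions for illustration (the text above only states that median is the default). The spread function shows the kind of per-sample variance estimate that access to the full collection makes possible.

```python
import statistics


def average_alpha(vectors, method="median"):
    """Average a collection of per-sample alpha diversity vectors.

    vectors: list of dicts mapping sample ID -> metric value, one dict per
    resampled table (hypothetical layout). Returns one dict of averages.
    """
    if method not in ("median", "mean"):
        raise ValueError(f"unsupported averaging method: {method}")
    average = statistics.median if method == "median" else statistics.fmean
    sample_ids = vectors[0].keys()
    return {sid: average([v[sid] for v in vectors]) for sid in sample_ids}


def spread_alpha(vectors):
    """Per-sample standard deviation across the collection."""
    sample_ids = vectors[0].keys()
    return {sid: statistics.stdev([v[sid] for v in vectors]) for sid in sample_ids}


collection = [
    {"s1": 10.0, "s2": 4.0},
    {"s1": 12.0, "s2": 4.0},
    {"s1": 17.0, "s2": 7.0},
]
median_vector = average_alpha(collection)        # {"s1": 12.0, "s2": 4.0}
mean_vector = average_alpha(collection, "mean")  # {"s1": 13.0, "s2": 5.0}
spread = spread_alpha(collection)
```

The median is less sensitive than the mean to a single unusual resampled table, which is visible for s1 in this toy collection.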

alpha is implemented as a Pipeline (i.e., a qiime2.Pipeline object) that integrates the resample, alpha-collection, and alpha-average Actions into a single command. The availability of the Pipeline facilitates the use of this workflow by all QIIME 2 users. The availability of the component Actions (alpha-collection and alpha-average) provides flexibility for more advanced users who may need access to the intermediary data.

Beta diversity suite

The beta diversity suite of Actions in q2-boots mirrors the Actions in the alpha diversity suite. beta is the analog to q2-diversity’s beta and beta-phylogenetic Actions, integrating bootstrapping or rarefaction and outputting a distance matrix (QIIME 2 artifact class: DistanceMatrix). Like alpha, beta is a Pipeline composed of calls to resample, beta-collection, and beta-average. beta-collection outputs the collection of distance matrices computed on each resampled table, and beta-average averages those to create a single distance matrix that can be used in any downstream analysis that operates on distance matrices in the QIIME 2 ecosystem or exported as tab-separated text for analysis with other tools.

Averaging distance matrices is more complex than averaging alpha diversity vectors because distance matrices are observations within a metric space with specific properties: 1. Hollowness (i.e., the diagonal of the distance matrix must be zero); 2. Non-negativity (i.e., all values must be greater than or equal to zero); 3. Symmetry (i.e., the values must be symmetric across the diagonal, such that the distance between samples A and B is always equal to the distance between samples B and A); and 4. Triangle inequality (i.e., the distance between samples A and C is always less than or equal to the distance between samples A and B plus the distance between samples B and C).
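As a concrete illustration, the four properties can be checked directly on a small matrix. The following standard-library Python sketch uses a hypothetical helper, check_distance_matrix, that is not part of q2-boots; the triangle inequality is checked in its standard form, d(A, C) <= d(A, B) + d(B, C), for every triple of samples.

```python
def check_distance_matrix(dm, tol=1e-12):
    """Report which metric-space properties a square matrix satisfies.

    dm: list of lists of numbers (hypothetical layout, not a QIIME 2 type).
    """
    ids = range(len(dm))
    return {
        "hollow": all(abs(dm[i][i]) <= tol for i in ids),
        "non_negative": all(dm[i][j] >= 0 for i in ids for j in ids),
        "symmetric": all(
            abs(dm[i][j] - dm[j][i]) <= tol for i in ids for j in ids
        ),
        # d(i, k) <= d(i, j) + d(j, k) for every triple of samples
        "triangle_inequality": all(
            dm[i][k] <= dm[i][j] + dm[j][k] + tol
            for i in ids for j in ids for k in ids
        ),
    }


metric_dm = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 1, 0],
]
# Here d(A, C) = 5 exceeds d(A, B) + d(B, C) = 2, so property 4 fails.
non_metric_dm = [
    [0, 1, 5],
    [1, 0, 1],
    [5, 1, 0],
]
metric_report = check_distance_matrix(metric_dm)
non_metric_report = check_distance_matrix(non_metric_dm)
```

The second matrix is hollow, non-negative, and symmetric, but it is not metric, which is the situation a per-cell average of distance matrices can produce.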

beta-average, and all other Actions that use it in q2-boots, provides three options for averaging distance matrices: medoid, which retains all of these properties (assuming the distance metric used produces distance matrices with these properties), and non-metric-median and non-metric-mean, which retain properties 1-3, but may not retain property 4.

medoid takes a set of distance matrices, identifies the distance matrix with the smallest sum of dissimilarities (by Euclidean distance) to all of the other distance matrices in the set, and returns that distance matrix as a representative of the set. Thus if all distance matrices in the set are metric, the selected representative will also be metric. non-metric-median and non-metric-mean are naive approaches that take a set of distance matrices and compute the per-cell median or mean, respectively, across the set of distance matrices. The result is a new distance matrix composed of the averaged pairwise distances.
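The difference between the two families of approaches can be seen in a toy example. The sketch below is standard-library Python written for illustration; it is neither the hdmedians implementation nor the q2-boots code, and the helper names (medoid_index, per_cell_average, square) are hypothetical. Each of the four input matrices is metric, yet their per-cell median is not.

```python
import math
import statistics


def matrix_distance(a, b):
    """Euclidean distance between two matrices, treating cells as coordinates."""
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    )


def medoid_index(matrices):
    """Index of the matrix with the smallest summed distance to all others."""
    sums = [sum(matrix_distance(m, other) for other in matrices) for m in matrices]
    return min(range(len(matrices)), key=sums.__getitem__)


def per_cell_average(matrices, method="median"):
    """Naive per-cell median or mean across a set of same-shaped matrices."""
    average = statistics.median if method == "median" else statistics.fmean
    n = len(matrices[0])
    return [
        [average([m[i][j] for m in matrices]) for j in range(n)]
        for i in range(n)
    ]


def square(ab, bc, ac):
    """Build a 3x3 distance matrix for samples A, B, C from three distances."""
    return [[0, ab, ac], [ab, 0, bc], [ac, bc, 0]]


# Four metric matrices, as might result from four resampled tables.
matrices = [square(1, 5, 6), square(5, 1, 6), square(1, 1, 2), square(1, 2, 3)]

representative = matrices[medoid_index(matrices)]  # one of the inputs
median_dm = per_cell_average(matrices)

# Per-cell medians: d(A,B) = 1.0, d(B,C) = 1.5, d(A,C) = 4.5
violates_triangle = median_dm[0][2] > median_dm[0][1] + median_dm[1][2]
```

Because the medoid is always one of the input matrices, it inherits their properties. The per-cell median mixes cells from different matrices, and in this example d(A, C) = 4.5 exceeds d(A, B) + d(B, C) = 2.5.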

q2-boots uses the medoid implementation from hdmedians (https://github.com/daleroberts/hdmedians, accessed 9 May 2024). In practice, we have found that the memory requirements of this method do not scale well to large values of n (e.g., n > 100). To assess whether non-metric-median or non-metric-mean is a suitable surrogate for medoid in lieu of a more memory-efficient medoid implementation, we computed distance matrices for the HEC experiment described above using four beta diversity metrics. We find that on a per-distance-metric basis, the distance matrices resulting from applying qiime boots beta-average with the medoid, non-metric-mean, and non-metric-median averaging methods across 100 bootstrap and rarefaction iterations are effectively identical (Supplementary Table 1). All Mantel tests achieved correlation coefficients greater than 0.99 and p-values less than 0.001. We additionally compared multiple iterations of bootstrapping and rarefaction followed by averaging of the resulting distance matrices with medoid, non-metric-median, and non-metric-mean, to assess whether different random seeds during resampling would impact the similarity of the resulting distance matrices. Using the same four distance metrics, we again found that all Mantel rho scores were greater than 0.99 and all p-values were less than 0.001. These results, and all code used to perform the analysis, are presented in Supplementary Data.
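For readers unfamiliar with the Mantel test, the comparison above can be sketched as follows. This is a minimal standard-library Python illustration of a two-sided, permutation-based Mantel test using Spearman rank correlation; it is not the code used for the analysis (which is provided in Supplementary Data), and the function names, the default of 999 permutations, and the toy matrices are assumptions.

```python
import math
import random


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def upper_triangle(dm, order):
    """Upper-triangle distances after reordering samples by `order`."""
    n = len(order)
    return [dm[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_spearman(dm1, dm2, permutations=999, seed=0):
    """Return (rho, p_value) for a two-sided Mantel test with Spearman's rho."""
    rng = random.Random(seed)
    identity = list(range(len(dm1)))
    x = ranks(upper_triangle(dm1, identity))
    rho = pearson(x, ranks(upper_triangle(dm2, identity)))
    at_least_as_extreme = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute sample labels of the second matrix
        permuted_rho = pearson(x, ranks(upper_triangle(dm2, perm)))
        if abs(permuted_rho) >= abs(rho):
            at_least_as_extreme += 1
    return rho, (at_least_as_extreme + 1) / (permutations + 1)


positions = [0, 1, 3, 6, 10, 15]
dm_a = [[abs(p - q) for q in positions] for p in positions]
dm_b = [[d ** 2 for d in row] for row in dm_a]  # monotone transform of dm_a
rho, p_value = mantel_spearman(dm_a, dm_b)
```

Because dm_b is a monotone transform of dm_a, the pairwise distances have identical ranks and rho is 1 up to floating-point error. Sample labels, rather than individual cells, are permuted because the cells of a distance matrix are not independent observations.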

Core metrics

q2-boots provides an omnibus Action, core-metrics, that captures the behavior of q2-diversity’s core-metrics and core-metrics-phylogenetic Actions. Like the alpha and beta Actions, core-metrics integrates bootstrapping and rarefaction. The output it creates mirrors that of the corresponding Actions in q2-diversity: alpha diversity vectors that are averaged across the collection of alpha diversity vectors computed for each resampled feature table; beta diversity distance matrices that are averaged across the collection of distance matrices computed for each resampled feature table; principal coordinates analysis (PCoA) matrices computed on each average distance matrix; and Emperor-based interactive ordination plots⁸ for each PCoA matrix. This Action enables QIIME 2 users to run bootstrapped or rarefaction-based diversity analysis with no additional work relative to running diversity analysis through the widely used core-metrics and core-metrics-phylogenetic Actions in q2-diversity.

Testing, distribution, and maintenance

q2-boots integrates unit tests that cover the breadth of its functionality. These tests are automatically run on every commit to the main branch of the code repository, and on all pull requests.

q2-boots is distributed as a QIIME 2 community plugin, meaning that at this time it is not included in the canonical QIIME 2 distributions, but rather is built and distributed as recommended in Developing with QIIME 2. As of this writing, q2-boots can be installed in the latest development version (2025.4) and release version (2024.10) of the amplicon and metagenome distributions of QIIME 2 by following instructions in the project’s documentation at https://q2-boots.readthedocs.io. Source code for q2-boots is available on GitHub at https://github.com/caporaso-lab/q2-boots.

q2-boots was developed for a master’s degree in computer science project and will be maintained by the Caporaso Lab at Northern Arizona University. Technical support is available on the QIIME 2 Forum (https://forum.qiime2.org).

Conclusions

q2-boots provides eight Actions that facilitate bootstrapped and rarefaction-based microbiome diversity analysis workflows. The topic of rarefaction in microbiome analysis remains controversial, though as Schloss points out,¹ these analyses do enable consideration of all data, including low-abundance features such as rare taxa. Whether using these Actions directly, as Schloss advocates,² or as a basis for comparison against more sophisticated normalization techniques, q2-boots makes it straightforward for researchers to integrate bootstrapped and/or rarefaction-based microbiome diversity analysis in their workflows.

Ethics and consent to participate

Human fecal samples were collected under Northern Arizona University IRB protocol 1773199-3, Bridging the Gap between Gut and Soil Microbiomes with written informed consent from the study participants. Ethical approval for this study was provided on 9 Jul 2021.

Authors’ contributions

IR and JGC were the primary developers of q2-boots. EG, CH, AS, EB, and JGC advised on the development of q2-boots. AS and EB performed code review prior to the initial release of q2-boots. JM and JGC generated the data used in Supplementary Table 1. AO and JGC performed testing of q2-boots on real-world data. EB and JGC conceived of the project. IR and JGC drafted the first version of the manuscript. All authors reviewed and provided feedback on the manuscript.

Supplementary files

Supplementary Data and Supplementary Table 1 are available at https://doi.org/10.5281/zenodo.13287126.⁹ They are not included in this pre-print due to size.

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Availability of data and material

All data analyzed in this study, including Supplementary Table 1 and the code used to generate it, are available in Supplementary Data at https://doi.org/10.5281/zenodo.13287126.⁹

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Software availability

The q2-boots source code is available on GitHub at https://github.com/caporaso-lab/q2-boots under the BSD 3-clause license.

The specific version used for the analyses presented here (version 0.0.1+46.g1c32499) is archived at https://doi.org/10.5281/zenodo.13287126.⁹ q2-boots is open source and free for all use, including commercial. q2-boots is written in Python 3.

The project documentation can be found at https://q2-boots.readthedocs.io.

References

1. Schloss PD: Waste not, want not: revisiting the analysis that called into question the practice of rarefaction. mSphere. 2023 Dec 6; e0035523.
2. Schloss PD: Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses. mSphere. 2024 Feb 28; 9(2): e0035423.
3. Bolyen E, Rideout JR, Dillon MR, et al.: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 2019 Jul 24.
4. Babuji Y, Woodard A, Li Z, et al.: Parsl: Pervasive Parallel Programming in Python. Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing. New York, NY, USA: Association for Computing Machinery; 2019; pp. 25–36. (HPDC’19).
5. Caporaso JG, Bolyen E: Developing with QIIME 2. 2024.
6. McDonald D, Clemente JC, Kuczynski J, et al.: The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome. Gigascience. 2012 Jul 12; 1(1): 7.
7. Meilander J, Herman C, Manley A, et al.: Upcycling human excrement: The gut microbiome to soil microbiome axis. arXiv [q-bio.GN]. 2024.
8. Vázquez-Baeza Y, Pirrung M, Gonzalez A, et al.: EMPeror: a tool for visualizing high-throughput microbial community data. Gigascience. 2013 Nov 26; 2(1): 16.
9. Raspet I, Gehret E, Herman C, et al.: Data supporting q2-boots manuscript analysis. [Dataset]. Zenodo. 2024.


Open Peer Review

Reviewer Report 05 May 2025

Luke R Thompson, Northern Gulf Institute, Mississippi State University, Oktibbeha, USA

Approved with Reservations

https://doi.org/10.5256/f1000research.171585.r374737

This article describes a new QIIME 2 plugin, q2-boots, which provides bootstrapping and rarefaction of feature tables, performs alpha and beta diversity analysis on those tables, and summarizes the output. The article provides an overview of the plugin and its commands, plus an example dataset and analysis. Much more detailed guidance and a full tutorial are provided in the online documentation. This software should be broadly useful to microbiome researchers and represents a valuable contribution to the microbiome field.

Abstract
- In the first sentence of the results section, it might be helpful to insert "n times—with replacement (bootstrapped) or without replacement (rarefied)—and compute" to define these two related terms in the abstract.
- In the conclusion, it's unclear why the fact that the three metrics for averaging beta-diversity matrices are highly correlated would be relevant. Could you briefly add that this allows selection of a more computationally efficient metric with minimal tradeoffs?

Keywords
- Suggest adding "environmental DNA", "metabarcoding", and "amplicon sequencing", as these are related fields/techniques that share many features with microbiome research and whose practitioners may be interested in this plugin.

Background
- In second sentence, suggest changing "microbiome researchers and data scientists" to "microbiome and environmental DNA researchers and other data scientists working with biological feature tables."
- I agree with the other reviewer that some additional background on why bootstrapping/rarefaction are important would be helpful. Just a few sentences summarizing the two papers by Schloss would provide more justification for using this plugin.
- Related, the advantages to researchers of using this plugin could be highlighted in the figures and perhaps in the Conclusions.

Figures
- The fonts in the workflow diagrams are very small. Would it be possible to make them bigger, perhaps by adding line breaks?
- The figures do not help show what the tool does or why it's useful. Would it be possible to include some figures showing how bootstrapping/rarefaction improve an analysis? These could be extracted from the tutorial. A short explanation of the value added by this plugin could go in the main text, perhaps in the Conclusions.

Is the rationale for developing the new software tool clearly explained? Yes
Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Microbiome and environmental DNA research, including metabarcoding analysis and data science

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Reviewer Report 18 Apr 2025

Vitor Heidrich, CIBIO, Universita degli Studi di Trento, Trento, Trentino-Alto Adige/South Tyrol, Italy

Approved

https://doi.org/10.5256/f1000research.171585.r374740

The authors developed a QIIME 2 plugin that allows users to perform diversity analyses following repeated rarefaction of feature tables and then to summarize the results generated by each rarefaction run. I agree with the authors that the plugin developed was much needed by the QIIME 2 community, as I also have followed the discussion brought to light by PD Schloss in his recent papers and was surprised to realize it was not possible to perform repeated rarefaction in QIIME 2. The rationale for the plugin is clearly and concisely stated, as well as the actions implemented in it. The authors were also very fair in stating that other normalization techniques (possibly more tailored to a particular kind of analysis) do exist. Although not essential for the understanding of this work, I only wonder whether mentioning some examples of those other non-rarefaction-based normalization approaches (or specifically those available in QIIME 2) could be useful for the reader (could be an "e.g." in this sentence of the Conclusions: "or as a basis for comparison against more sophisticated normalization techniques").

Is the rationale for developing the new software tool clearly explained?

Yes

Is the description of the software tool technically sound?

Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Yes

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Computational biology, metagenomics, human microbiome

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.


Reviewer Report 10 Apr 2025

Digvijay Verma, Babasaheb Bhimrao Ambedkar University, Lucknow, India

Approved with Reservations

https://doi.org/10.5256/f1000research.171585.r370290

The manuscript introduces q2-boots, a plugin for QIIME 2 aimed at simplifying the analysis of microbiome diversity through bootstrapping and rarefaction techniques. The authors present a comprehensive set of functionalities that allow for the calculation of thirty distinct alpha diversity metrics and twenty-two beta diversity metrics on resampled or rarefied feature tables. This tool's incorporation into the QIIME 2 framework is a significant advancement in microbiome research, effectively filling a notable gap in existing QIIME 2 methodologies. The manuscript clearly outlines the various functionalities offered by q2-boots and their integration within the broader QIIME 2 ecosystem. Furthermore, the authors provide a thorough comparison of bootstrapping and rarefaction-based diversity analyses, demonstrating a strong correlation between the results derived from both approaches. This analysis addresses a critical methodological issue and reassures users regarding the effectiveness of either method. In summary, this manuscript makes a significant contribution to the QIIME 2 ecosystem and the field of microbiome research as a whole. The user-friendly integration of bootstrapping and rarefaction-based diversity analysis has the potential to improve the rigor and reproducibility of microbiome studies. The writing is clear and showcases robust experimental validation, making this tool a valuable addition to the QIIME 2 suite.

With minor clarifications and further details on certain methodological aspects, the manuscript could serve as an even more beneficial resource for researchers. The following suggestions may be incorporated to further improve it.

- While the authors describe three methods for averaging distance matrices (medoid, non-metric-median, and non-metric-mean), there could be more discussion on the implications of each method for downstream analysis. Specifically, the manuscript mentions that all methods provide highly similar results in the specific case study but does not delve deeply into which approach might be preferable in various contexts (e.g., for different types of microbiome data or diversity metrics). Additional guidance on when to use each method would be valuable for users.
- Although the manuscript mentions the parallel computing capabilities, a more detailed discussion on the performance of q2-boots with large datasets, for example, when n > 100, would be beneficial. The results from the HEC experiment are promising, but the paper could emphasize how q2-boots scales with dataset size and the corresponding computational requirements (e.g., memory usage).
- The manuscript does not mention how q2-boots interacts with other QIIME 2 plugins outside of the core-diversity workflows. For example, it would be helpful to discuss any potential compatibility with statistical or visualization tools in QIIME 2, such as q2-longitudinal or q2-phylogeny. This would give users a better idea of how q2-boots fits within a larger analysis pipeline.
- A more detailed example of how q2-boots can be used in practice, including a step-by-step guide or case study, would help potential users understand how to incorporate it into their workflows. This could be particularly useful for new QIIME 2 users.
- A more explicit discussion on the limitations of bootstrapping and rarefaction in microbiome studies, and how q2-boots addresses or mitigates these limitations, would be helpful. The authors briefly mention the controversy around rarefaction, but the manuscript could benefit from elaborating on the pros and cons of rarefaction versus bootstrapping and under what conditions each method might be preferred.

Minor suggestions:

- Please double-check for consistency in the use of terms, particularly the distinction between "alpha diversity" and "beta diversity."
- Clarify the averaging method for diversity vectors (e.g., median by default) in relevant sections for better transparency.
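To make the comparison of averaging methods raised in the suggestions above more concrete, here is a minimal Python sketch of the three approaches named in this report. It is written under stated assumptions and is not the q2-boots implementation: the mean and median are taken element-wise across matrices, and the medoid is taken to be the single input matrix with the smallest summed Euclidean distance to all the others. The function name `average_distance_matrices` is hypothetical.

```python
import numpy as np


def average_distance_matrices(matrices, method):
    """Combine equally shaped distance matrices into one (illustrative only)."""
    stack = np.stack([np.asarray(m, dtype=float) for m in matrices])
    if method == "non-metric-mean":
        # Element-wise mean; the result need not satisfy metric properties.
        return stack.mean(axis=0)
    if method == "non-metric-median":
        # Element-wise median; likewise not guaranteed to be a metric.
        return np.median(stack, axis=0)
    if method == "medoid":
        # Pick the one input matrix closest, in total, to all the others.
        flat = stack.reshape(len(stack), -1)
        diffs = flat[:, None, :] - flat[None, :, :]
        pairwise = np.sqrt((diffs ** 2).sum(axis=-1))
        return stack[pairwise.sum(axis=1).argmin()]
    raise ValueError(f"unknown method: {method}")


# Three toy 2x2 distance matrices from three hypothetical resampling runs.
runs = [np.array([[0.0, d], [d, 0.0]]) for d in (0.2, 0.4, 0.9)]
for name in ("non-metric-mean", "non-metric-median", "medoid"):
    print(name, average_distance_matrices(runs, name)[0, 1])
```

Under these assumptions, the medoid always returns a distance matrix that was actually observed in one resampling run, whereas the element-wise mean and median build a new matrix entry by entry. This difference is one of the trade-offs that additional guidance could address.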

Is the rationale for developing the new software tool clearly explained?

Yes

Is the description of the software tool technically sound?

Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Yes

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Metagenomics, Microbial diversity, Extremozymes

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.


Reviewer Report 05 May 2025

Luke R Thompson, Northern Gulf Institute, Mississippi State University, Oktibbeha, USA

Approved with Reservations

This article describes a new QIIME 2 plugin, q2-boots, which bootstraps and rarefies feature tables, performs alpha and beta diversity analysis on those tables, and summarizes the output. The article provides an overview of the plugin and its commands, plus an example dataset and analysis. Much more detailed guidance and a full tutorial are provided in the online documentation. This software should be broadly useful to microbiome researchers and represents a valuable contribution to the microbiome field.

Abstract
- In the first sentence of the results section, it might be helpful to insert "n times—with replacement (bootstrapped) or without replacement (rarefied)—and compute" to define these two related terms in the abstract.
- In the conclusion, it's unclear why the fact that the three metrics for averaging beta-diversity matrices are highly correlated would be relevant. Could you briefly add that this allows selection of a more computationally efficient metric with minimal tradeoffs?
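
The distinction suggested for the abstract (resampling with replacement versus without replacement) can be illustrated with a short Python sketch. This is a NumPy-only illustration under stated assumptions, not the q2-boots code: the helper name `resample_counts`, the toy counts, and the sampling depth are hypothetical.

```python
import numpy as np


def resample_counts(counts, depth, replace, rng):
    """Draw `depth` observations from one sample's feature counts."""
    counts = np.asarray(counts, dtype=np.int64)
    if replace:
        # Bootstrapping: every draw picks a feature with probability
        # proportional to its original count, so a feature can end up
        # with more observations than it had in the input.
        return rng.multinomial(depth, counts / counts.sum())
    if depth > counts.sum():
        raise ValueError("cannot rarefy deeper than the sample's total count")
    # Rarefaction: draws are made without replacement, so no feature
    # can exceed its original count.
    return rng.multivariate_hypergeometric(counts, depth)


rng = np.random.default_rng(42)
sample = [50, 30, 15, 5]  # toy counts for four features
bootstrapped = resample_counts(sample, 40, True, rng)
rarefied = resample_counts(sample, 40, False, rng)
print(bootstrapped, rarefied)
```

Both results sum to the requested depth. Only the rarefied counts are guaranteed to stay at or below the original counts for every feature.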

Keywords
- Suggest adding "environmental DNA", "metabarcoding", and "amplicon sequencing", as these are related fields/techniques that share many features with microbiome research and whose practitioners may be interested in this plugin.

Background
- In second sentence, suggest changing "microbiome researchers and data scientists" to "microbiome and environmental DNA researchers and other data scientists working with biological feature tables."
- I agree with the other reviewer that some additional background on why bootstrapping/rarefaction are important would be helpful. Just a few sentences summarizing the two papers by Schloss would provide more justification for using this plugin.
- Related, the advantages to researchers of using this plugin could be highlighted in the figures and perhaps in the Conclusions.

Figures
- The fonts in the workflow diagrams are very small. Would it be possible to make them bigger, perhaps by adding line breaks?
- The figures do not help show what the tool does or why it's useful. Would it be possible to include some figures showing how bootstrapping/rarefaction improve an analysis? These could be extracted from the tutorial. A short explanation of the value added by this plugin could go in the main text, perhaps in the Conclusions.

Is the rationale for developing the new software tool clearly explained?

Yes

Is the description of the software tool technically sound?

Yes

Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others?

Yes

Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool?

Yes

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article?

Yes

Competing Interests: No competing interests were disclosed.

Reviewer Expertise: Microbiome and environmental DNA research, including metabarcoding analysis and data science

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.


[1] Schloss PD: Waste not, want not: revisiting the analysis that called into question the practice of rarefaction. mSphere. 2023 Dec 6; e0035523.

[2] Schloss PD: Rarefaction is currently the best approach to control for uneven sequencing effort in amplicon sequence analyses. mSphere. 2024 Feb 28; 9(2): e0035423.

[3] Bolyen E, Rideout JR, Dillon MR, et al.: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 2019 Jul 24.

[4] Babuji Y, Woodard A, Li Z, et al.: Parsl: Pervasive Parallel Programming in Python. Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing. New York, NY, USA: Association for Computing Machinery; 2019; pp. 25–36. (HPDC'19).

[5] Caporaso JG, Bolyen E: Developing with QIIME 2. 2024.

[6] McDonald D, Clemente JC, Kuczynski J, et al.: The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome. Gigascience. 2012 Jul 12; 1(1): 7.

[7] Meilander J, Herman C, Manley A, et al.: Upcycling human excrement: The gut microbiome to soil microbiome axis. arXiv [q-bio.GN]. 2024.

[8] Vázquez-Baeza Y, Pirrung M, Gonzalez A, et al.: EMPeror: a tool for visualizing high-throughput microbial community data. Gigascience. 2013 Nov 26; 2(1): 16.

[9] Raspet I, Gehret E, Herman C, et al.: Data supporting q2-boots manuscript analysis. [Dataset]. Zenodo. 2024.


Figure 1. q2-boots workflow for resampling a feature table with or without replacement.

Figure 2. q2-boots workflow for performing alpha diversity analysis, integrating resampling.

Figure 3. q2-boots workflow for performing beta diversity analysis, integrating resampling.
