ShinyDiversity - Understanding Alpha and Beta Diversity through Interactive Visualizations

In the past few decades, ecologists have developed many diversity indices to describe within- and between-sample diversity. Consequently, it can be difficult to determine which index to choose and how the distribution of microbial communities affects these indices. We've developed an interactive application, ShinyDiversity, that dynamically visualizes different alpha or beta diversity indices. By enabling users to select and simultaneously visualize different indices, our application aims to facilitate understanding of how the microbial data affects the selected indices.

This article is included in the RPackage gateway.

Introduction
Microbial survey studies (i.e. microbiome survey analyses) use alpha and beta diversity indices to estimate within-sample and between-sample diversity. Alpha diversity is the diversity in a single sample site (e.g. human gut), and beta diversity describes the difference in diversity between those sites [1] (e.g. different regions of the body). With a variety of alpha and beta diversity indices available, it can be difficult to determine which index to choose.
Previously developed user-friendly HTML web applications such as MicrobiomeAnalyst [2] and Dynamic Assessment of Microbial Ecology (DAME) [3] allow users to visualize alpha and beta diversity. However, these tools do not explore how the choice of alpha and beta diversity index impacts the results. To address this, we've developed an interactive, user-friendly application that uses real data to dynamically visualize different alpha or beta diversity indices (Figure 1). The user is able to see how the distribution, normalization, and choice of dataset alter the resulting diversity indices. Ultimately, this leads to an intuitive understanding of how these different diversity indices respond to the data. The majority of the tool's development was undertaken as part of the hackseq genomics hackathon in Vancouver, BC.

Implementation
ShinyDiversity is an interactive HTML web application that utilizes the shiny (version 1.5.5.872) R package [4]. The application allows users to interactively visualize both alpha and beta diversity of multiple datasets. All diversity plots are generated using the phyloseq (version 1.16.2) R package, which conveniently allows for phylogenetic analysis and visualization of microbial communities and provides 44 supported distance methods [5]. The underlying data used for calculations is an operational taxonomic unit (OTU) abundance table. An OTU abundance table is a matrix where the rows represent the various taxa and the columns are different samples. The table values are the counts of how often those taxa are observed.
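To make the layout of an OTU abundance table concrete, the following is a minimal base R sketch. The taxa names, sample names, and counts are invented for illustration and do not come from the datasets used in the application.

```r
# Toy OTU abundance table: rows are taxa, columns are samples.
# All names and counts are hypothetical.
otu <- matrix(
  c(10, 0, 3,
     1, 5, 0,
     0, 2, 2,
     4, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("OTU_1", "OTU_2", "OTU_3", "OTU_4"),
    c("sample_A", "sample_B", "sample_C")
  )
)

# Library size: total number of reads counted in each sample.
library_size <- colSums(otu)

# Observed taxa: number of taxa with a non-zero count in each sample.
observed_taxa <- colSums(otu > 0)

print(otu)
print(library_size)
print(observed_taxa)
```

Each column can be read as the count profile of one sample, which is the input that the diversity indices discussed later summarize.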

Operation
ShinyDiversity runs on any computer that can successfully install Bioconductor (Release 3.6) and R (≥ 3.4.0).

Data
Our application utilizes two built-in datasets from phyloseq (version 1.16.2): GlobalPatterns and esophagus. GlobalPatterns is a dataset composed of nine different sample types obtained from areas ranging from freshwater to the human gut [6]. The esophagus dataset is a small example dataset of three samples of a human esophageal community, with one sample from each of the three subjects [7]. In addition to these two datasets, we created a third dataset, GP3, which is a subset of the GlobalPatterns dataset. The following R code generates this dataset.

    library(phyloseq)
    data("GlobalPatterns")
    # Keep only the human skin, tongue, and feces samples
    GP3 <- subset_samples(GlobalPatterns, SampleType %in% c("Skin", "Tongue", "Feces"))

GP3 only includes human feces, skin, and tongue samples and was created for easier visualization of multiple sample groups.

Use cases

Alpha diversity
The alpha diversity page (Figure 2) currently gives users the option to visualize up to five different alpha diversity indices: Abundance Coverage-based Estimator (ACE), Shannon, Simpson, Inverse Simpson, and Fisher. The application dynamically produces side-by-side comparisons of the original data and any indices selected by the user, which allows the user to compare and contrast the selected indices.
It was also important for users to have a top-level and an individual sample view of their data in order to quickly identify interesting features (Figure 3). The alpha diversity page features a heat map displaying the taxa counts for each sample in the dataset. A bar plot beside the heat map shows the intensity pattern for a single sample, which provides a quick way to identify and focus on interesting samples. Lastly, the alpha diversity page also shows the singleton and doubleton count for each sample (Figure 4). Some of the alpha diversity indices are sensitive to singletons and doubletons, which are OTUs that appear in the data only once or twice, respectively. These rare OTUs may suggest undersampling and hence a higher true abundance in the population [8]. If an OTU is found more than two times, we can be more confident that it is not a false positive.
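As a rough illustration of why different alpha diversity indices can disagree, the sketch below computes three of them from a vector of taxa counts using their standard textbook definitions. The function name `alpha_indices` and the two toy samples are assumptions made for this example, and the code is not taken from the application, which relies on phyloseq. Simpson is written here as 1 minus the sum of squared proportions. ACE and Fisher are omitted because they need more involved estimators.

```r
# Standard definitions, with p_i the proportion of reads in taxon i:
#   Shannon         H   = -sum(p_i * log(p_i))
#   Simpson         1-D = 1 - sum(p_i^2)
#   Inverse Simpson 1/D = 1 / sum(p_i^2)
alpha_indices <- function(counts) {
  counts <- counts[counts > 0]      # absent taxa contribute nothing
  p <- counts / sum(counts)         # relative abundances
  c(
    shannon     = -sum(p * log(p)),
    simpson     = 1 - sum(p^2),
    inv_simpson = 1 / sum(p^2)
  )
}

# Two hypothetical samples with the same richness (four taxa)
# but very different evenness.
even   <- c(25, 25, 25, 25)
skewed <- c(97, 1, 1, 1)

result <- rbind(even = alpha_indices(even), skewed = alpha_indices(skewed))
print(round(result, 3))
```

Both toy samples contain four observed taxa, yet every index is lower for the skewed sample, which is the kind of contrast the side-by-side plots are meant to expose.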
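A singleton and doubleton summary of this kind can be sketched in a few lines of base R. The example assumes that counting is done per sample (an OTU with a count of exactly 1 or 2 in that sample), which is one plausible reading of the description above. The table values are invented.

```r
# Hypothetical OTU table: rows are taxa, columns are samples.
otu <- matrix(
  c(12,  1, 0,
     1,  2, 0,
     2,  0, 1,
     0,  2, 7,
     1, 30, 5),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("OTU_", 1:5),
                  c("sample_A", "sample_B", "sample_C"))
)

# Count, for each sample, the taxa observed exactly once or exactly twice.
rare_summary <- data.frame(
  sample     = colnames(otu),
  singletons = colSums(otu == 1),
  doubletons = colSums(otu == 2),
  row.names  = NULL
)

print(rare_summary)
```

A sample with many singletons relative to its richness is a candidate for undersampling, so richness estimators that use these counts should be read with that in mind.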

Beta diversity
Currently, the beta diversity page (Figure 5 and Figure 6) allows users to dynamically visualize and compare two groups of the most common beta diversity indices: 1) non-phylogenetic distance indices (Euclidean, Bray-Curtis, and Jaccard) and 2) phylogenetic distance indices (unweighted UniFrac and weighted UniFrac). All distance indices are visualized with principal coordinate analysis (PCoA) plots, which show the two principal coordinates that explain the greatest variation in distance between samples. The dataset used to visualize beta diversity is the normalized GP3 dataset. The dataset is normalized by rarefying, a resampling method [9], to the sample with the smallest library size (N = 100,187). Users have the option to rarefy the samples to any library size below 100,187. This option allows users to visualize how rarefying affects beta diversity.
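The pipeline described above (rarefy, compute a distance, ordinate with PCoA) can be sketched in base R as follows. This is an illustration under stated assumptions rather than the application's code: the helper names `rarefy_sample`, `rarefy_table`, and `bray_curtis` are hypothetical, the counts are invented, and rarefying is implemented as subsampling reads without replacement, which is one common variant and may differ from the routine phyloseq uses. PCoA is done with `cmdscale` from the stats package.

```r
set.seed(1)  # make the random subsampling reproducible

# Hypothetical OTU table: rows are taxa, columns are samples.
otu <- matrix(
  c(50, 45,  5,  0,
    30, 35,  0, 10,
     5,  2, 60, 40,
     0,  3, 40, 50,
    15, 35, 45, 30),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("OTU_", 1:5),
                  c("gut_1", "gut_2", "skin_1", "skin_2"))
)

# Rarefy one sample: draw `depth` reads without replacement.
rarefy_sample <- function(counts, depth) {
  stopifnot(depth <= sum(counts))
  reads  <- rep(seq_along(counts), times = counts)  # one entry per read
  picked <- reads[sample.int(length(reads), size = depth)]
  tabulate(picked, nbins = length(counts))
}

# Rarefy every sample to a common depth (default: smallest library size).
rarefy_table <- function(otu, depth = min(colSums(otu))) {
  out <- apply(otu, 2, rarefy_sample, depth = depth)
  dimnames(out) <- dimnames(otu)
  out
}

# Bray-Curtis dissimilarity between samples x and y:
#   sum(|x_i - y_i|) / sum(x_i + y_i)
bray_curtis <- function(otu) {
  n <- ncol(otu)
  d <- matrix(0, n, n, dimnames = list(colnames(otu), colnames(otu)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(otu[, i] - otu[, j])) / sum(otu[, i] + otu[, j])
    }
  }
  as.dist(d)
}

rarefied <- rarefy_table(otu)           # every sample now has 100 reads
dist_bc  <- bray_curtis(rarefied)       # pairwise sample dissimilarities
coords   <- cmdscale(dist_bc, k = 2)    # PCoA: first two coordinates

print(colSums(rarefied))
print(round(as.matrix(dist_bc), 3))
print(round(coords, 3))
```

Calling `rarefy_table(otu, depth = 50)` and recomputing the distances mimics what the library size slider does: lower depths discard more reads, so the positions of samples in the ordination become noisier.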

Conclusions and future work
Ecologists have spent decades developing these diversity indices, each having its own assumptions and use cases. Current software tools make it easy to calculate many of these indices without making the strengths and weaknesses of each clear. ShinyDiversity facilitates an exploration and understanding of alpha and beta diversity indices on microbiome data. This understanding is developed by enabling users to compare and contrast the visual differences in the plotted indices on their data.
Our software is the first step in making these indices more understandable. Future work includes allowing users to input an abundance table at both the genus and OTU taxa level, use other normalization techniques (i.e. scaling), and select the degree of sparsity and dispersion. Additionally, we plan to include more diversity indices and options for users to change the distribution of selected samples. This will enable users to observe how each diversity index is influenced by sample distribution, providing a deeper understanding of diversity indices.
The advent of microbial ecologic studies using 16S rRNA gene analysis has opened the field of microbial diversity to a wide range of new investigators. Most investigations include ecologic analyses of microbial diversity using a range of different diversity indices. The authors of ShinyDiversity have developed a new software package that can aid the investigator in these analyses. The interactive nature of the package is appropriate to its use. ShinyDiversity should be of value to many investigators in the field.

Users, however, should be aware that results need to be interpreted in the context of the broad assumptions built into the diversity indices. In particular, those using this tool are probably comparing ecologic environments, and identifying differences in diversity can be used to imply differences in microbial community makeup based on some differences in the environment. However, minor differences can occur as the result of defined microbial differences that may or may not grossly affect the diversity indices. Thus, interpretation of the result must go beyond the integration of large data sets into indices and needs to take into account the true compositional differences. I believe that the authors do understand this.

Leung et al. describe a software tool for visual analysis of microbiome alpha and beta diversity. The motivations are to be user-friendly and to allow users to explore how different alpha and beta diversity indices impact their results. The tool is developed using the shiny and phyloseq R packages. Their solution is to put the graphical outputs side by side to facilitate comparisons. The benefits of such an arrangement are very clear. However, it is important to realize that real sample sizes are often very large, making such an arrangement impractical (especially for alpha diversity). I also appreciate the table showing singletons and doubletons.

Main limitations: I cannot find a way to upload data! Without this, it is not yet a tool, not to mention a user-friendly one, as it cannot be used at the moment. I would strongly urge the authors to implement this very basic feature.

Figure 1. ShinyDiversity page. The homepage includes a brief description of the project, how to run the application locally, and the motivation behind the project.

Figure 2. The alpha diversity page. Here is an example with the GlobalPatterns dataset and three plots for alpha diversity indices: observed taxa (i.e. number of different taxa), ACE, and Inverse Simpson. The dropdown box allows the user to choose different datasets.

Figure 3. Top-level and individual sample view of data. The heat map gives a cursory view of the similarities and differences between samples. The bar graph shows an individual sample, where the x-axis shows the taxa and the y-axis shows the taxa counts. The slider allows the user to move through the different samples to change the bar plot. These plots are shown at the bottom of the alpha diversity page.

Figure 4. Singleton and doubleton summary table. The table shows the different samples and the number of singleton and doubleton taxa in each.

Figure 5. The beta diversity page with non-phylogenetic distance indices. The Euclidean, Bray-Curtis, and Jaccard distances are plotted. The slider will rarefy the GP3 dataset to the specified library size.

Figure 6. The beta diversity page with phylogenetic distance indices. The indices plotted are the unweighted UniFrac and weighted UniFrac.
Is the rationale for developing the new software tool clearly explained? Yes
Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Partly
Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes

Competing Interests: No competing interests were disclosed.

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Reviewer Report 03 May 2018
https://doi.org/10.5256/f1000research.15467.r33354
© 2018 Xia J. This is an open access peer review report distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Jianguo Xia, Institute of Parasitology, Department of Animal Science, McGill University, Sainte-Anne-de-Bellevue, QC, Canada

2) Fig 3 should be normalized for a more informative view.

Is the rationale for developing the new software tool clearly explained? Yes
Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes