easy16S
easy16S.Rmd
Usage
Load data
Users can load a phyloseq object directly when launching the app using the following syntax: `easy16S::run_app(physeq = phyloseq.extended::food)`.
Alternatively, there are three ways to load data when the app is launched:
- Use one of the demo datasets provided with the application.
- Upload flat files to build a phyloseq object:
  - a BIOM file (Standard format or FROGS format) [mandatory].
  - a metadata table with variables (in columns) and samples (in rows). Ensure that sample names (1st column) are spelled exactly as in the BIOM file. The delimiter and format of columns can be specified.
  - a phylogenetic tree in Newick format.
  - a FASTA file with representative sequences.
- Upload a phyloseq object as:
  - an RDS file.
  - an RData file containing a phyloseq object named `data`.
Additionally, an RDS object can be provided directly from a URL:
https://shiny.migale.inrae.fr/app/easy16S/?rds=https://mywebsite.com/path/to/my/data.rds
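As a minimal sketch of how the two accepted object files might be prepared in R: the file paths are hypothetical (temporary files here), and a plain list stands in for a real phyloseq object so that the snippet runs without extra packages.

```r
# Stand-in for a real phyloseq object (assumption: replace with your own object).
physeq <- list(note = "placeholder for a phyloseq object")

# Hypothetical output paths; temporary files keep the sketch free of side effects.
rds_path   <- tempfile(fileext = ".rds")
rdata_path <- tempfile(fileext = ".RData")

# Option 1: a single object serialized as an RDS file.
saveRDS(physeq, rds_path)

# Option 2: an RData file in which the object is named `data`.
data <- physeq
save(data, file = rdata_path)

# Check that both files restore the same object.
restored_rds <- readRDS(rds_path)
restored_env <- new.env()
load(rdata_path, envir = restored_env)
identical(restored_rds, restored_env$data)
```

The RData variant only works if the saved object is named `data`, which is why the object is copied under that name before calling `save()`.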
Preprocess data
Before any analysis, it is customary to preprocess the raw data to clean and refine it. The following operations are available and can be applied iteratively to build rich selections:
- Samples filter
  - Select samples based on their name.
  - Filter samples based on the sample variables available in the metadata table.
  - Prune samples whose sum (depth) does not satisfy a given threshold.
- Taxa transformation
  - Aggregate taxa at a specified taxonomic rank (e.g. Genus, Family, etc.).
  - Spread taxonomy to remove unknown and multi-affiliations by spreading the last known rank to further ranks (e.g. “Bacillus;multi-affiliation” would become “Bacillus; unknown Bacillus species”).
- Abundance transformation
  - Rarefaction (resample the abundance table to ensure that all samples have the same depth, set as the minimum one among samples).
  - Transform the abundances in the abundance table, using one of the following: `prop` (change abundances to proportions / relative abundances), `sqrt` (square root), `sqrt_prop` (square root of relative abundances), `clr` (centered log-ratio, after adding a pseudo-count of 1).
Once the desired operations are selected, users can seamlessly switch between the raw and preprocessed data to assess the impact of the applied transformations.
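To make the four abundance transformations concrete, here is a minimal base R sketch on a toy count matrix (taxa in rows, samples in columns). The matrix and variable names are illustrative assumptions, and the app's own implementation may differ in details.

```r
# Toy abundance table (made-up values): 3 taxa x 2 samples.
counts <- matrix(c(10, 0, 30,
                    5, 5, 50),
                 nrow = 3,
                 dimnames = list(c("otu1", "otu2", "otu3"), c("s1", "s2")))

# prop: relative abundances, each sample (column) sums to 1.
prop <- sweep(counts, 2, colSums(counts), "/")

# sqrt: square root of the raw counts.
sqrt_counts <- sqrt(counts)

# sqrt_prop: square root of the relative abundances.
sqrt_prop <- sqrt(prop)

# clr: centered log-ratio after adding a pseudo-count of 1;
# within each sample, the log values are centered on their mean.
clr <- apply(counts, 2, function(x) {
  log_x <- log(x + 1)
  log_x - mean(log_x)
})
```

By construction, each column of `prop` sums to 1 and each column of `clr` sums to 0.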
A few words about rarefaction
For many analyses (notably all those based on presence / absence data and more generally diversity analyses), it is recommended to normalize the samples by rarefying to account for variations in sequencing effort and ensure that the detection probability is comparable across samples. Rarefying involves subsampling each sample to the same depth, ensuring a more equitable comparison of microbial diversity across samples. It is however not advised for differential abundance analyses as it decreases statistical power.
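The subsampling step can be sketched in base R as follows. This is an illustration of the idea (drawing reads without replacement down to the minimum depth), not necessarily the routine used by the app; the toy counts and the helper name `rarefy_sample` are assumptions.

```r
set.seed(42)  # rarefaction is random; fix the seed for reproducibility

# Toy abundance table (made-up values): 4 taxa x 3 samples with unequal depths.
counts <- matrix(c(40, 30, 20, 10,
                   10,  5,  3,  2,
                   25, 25,  0, 10),
                 nrow = 4,
                 dimnames = list(paste0("otu", 1:4), c("s1", "s2", "s3")))

# Hypothetical helper: subsample one sample to `depth` reads without replacement.
rarefy_sample <- function(x, depth) {
  reads <- rep(seq_along(x), times = x)               # one entry per read
  kept  <- reads[sample.int(length(reads), size = depth)]
  tabulate(kept, nbins = length(x))                   # back to counts per taxon
}

depth    <- min(colSums(counts))                      # minimum depth among samples
rarefied <- apply(counts, 2, rarefy_sample, depth = depth)
rownames(rarefied) <- rownames(counts)

colSums(rarefied)  # every sample now has the same depth
```

The sample that already has the minimum depth is left unchanged, while deeper samples lose reads, which is the loss of information behind the reduced statistical power mentioned above.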
Explore and Analyse Data
Tables
Users can visualize and explore key tables constituting the phyloseq object under study:
- OTU/ASV Table: Abundance of each OTU/ASV in all samples.
- Taxonomy Table: Taxonomic affiliation of each OTU/ASV at different taxonomic ranks (e.g. Phylum to Species).
- Agglomerate OTU/ASV Table: Same as OTU/ASV Table but after merging all OTUs/ASVs sharing the same taxonomic affiliation up to a user-specified rank.
- Sample Data Table: Metadata associated with each sample, as provided by the user during the import process (metadata table).
For a deeper understanding of how phyloseq objects function, refer to the phyloseq documentation on data import.
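To illustrate what the agglomerated table contains, the following base R sketch sums the counts of all OTUs/ASVs that share the same affiliation at a given rank. The toy counts and genus assignments are made up, and `rowsum()` is used here as a simple stand-in for the agglomeration performed on a phyloseq object.

```r
# Toy abundance table (made-up values): 4 OTUs x 2 samples.
counts <- matrix(c(5, 3, 0, 2,
                   1, 4, 6, 0),
                 nrow = 4,
                 dimnames = list(paste0("otu", 1:4), c("s1", "s2")))

# Toy taxonomy at the Genus rank (assumption for illustration).
genus <- c(otu1 = "Bacillus", otu2 = "Bacillus",
           otu3 = "Lactobacillus", otu4 = "Listeria")

# Sum the rows (OTUs) that share the same genus.
agglomerated <- rowsum(counts, group = genus[rownames(counts)])
agglomerated
```

Agglomeration reduces the number of rows but preserves the depth of each sample.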
Metadata
This section provides access to the sample data table for use with the esquisse addin. It is useful to explore and assess associations between sample variables (but not metabarcoding data).
This addin allows you to interactively explore your data by visualizing it with the ggplot2 package. It allows you to draw bar plots, curves, scatter plots, histograms, boxplots and sf objects, then export the graph or retrieve the code to reproduce it.
Barplot
Used to create composition graphs (stacked barplots of relative abundances), based on the `phyloseq.extended::plot_composition()` function. This feature provides users with the option to:
- Specify the taxonomic rank used for aggregation and coloring.
- Filter and display results for a specific taxon.
- Group samples based on metadata.
Composition barplots show the relative abundance of all or part of the sample diversity.
See also the bar plots section of the phyloseq documentation.
Rarefaction
Used to create rarefaction curves, based on the `phyloseq.extended::ggrare()` function. These settings provide users with the option to:
- Color, annotate and group samples based on metadata.
- Display a minimum sample threshold.
Rarefaction curves are used to evaluate the relationship between richness and sampling effort (number of reads, or sequencing depth) in each sample. Each curve shows the expected number of OTUs/ASVs observed in a sample as a function of the sequencing depth. Rarefaction curves generally grow rapidly at first, as the most common OTUs/ASVs are found, then plateau as the diversity saturates and only the rarest ones remain to be observed.
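The expected richness at a given depth can be computed with the classical hypergeometric formula, E[S_n] = Σ_i (1 − C(N − N_i, n) / C(N, n)), where N_i is the count of taxon i and N the total depth. The sketch below applies it in base R to one toy sample; the helper name `expected_richness` and the abundances are assumptions, and this is not a description of the internals of `ggrare()`.

```r
# Hypothetical helper: expected number of taxa in a random subsample of n reads.
expected_richness <- function(x, n) {
  x     <- x[x > 0]
  total <- sum(x)
  # Probability that taxon i is absent from the subsample: C(N - N_i, n) / C(N, n),
  # computed on the log scale for numerical stability.
  p_absent <- exp(lchoose(total - x, n) - lchoose(total, n))
  sum(1 - p_absent)
}

# Toy sample (made-up values): two dominant taxa and several rare ones, 100 reads.
abundances <- c(50, 30, 10, 5, 3, 1, 1)
depths     <- c(1, 10, 25, 50, 100)

curve <- vapply(depths, function(n) expected_richness(abundances, n), numeric(1))
data.frame(depth = depths, expected_richness = curve)
```

The curve starts at one taxon for a single read and reaches the full richness of the sample (seven taxa here) when all reads are used.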
Heatmap
To create an ecologically-organized heatmap, use the `phyloseq::plot_heatmap()` function. These settings provide users with the option to:
- Select only the n most abundant taxa for display.
- Agglomerate taxa at a user-specified taxonomic rank.
- Group, annotate and order samples based on metadata.
- Display the affiliation of each OTU/ASV at a user-specified taxonomic rank.
Heatmaps can be used to investigate the structuring of sample communities; they are ordered using an “NMDS” ordination (samples ordered by increasing angle between the x-axis and their projection). They can also be used to observe core and condition-specific microbiota.
α-Diversity
α-diversity measures richness within a sample. Detailed information on this concept and the different metrics available in easy16S can be found in the alpha diversity section of the phyloseq documentation.
Table
Compute the main alpha diversity estimators using the `phyloseq::estimate_richness()` function. If a sample data table is available, it is included in the table for further analyses (e.g. ANOVA, regression, etc.).
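As an illustration of what such estimators measure, the sketch below computes three common indices (observed richness, Shannon, and Simpson expressed as 1 − Σ p²) in base R on toy counts. The helper name `alpha_indices` and the data are assumptions, and the values returned by the app may follow slightly different conventions.

```r
# Toy abundance table (made-up values): one even and one uneven community.
counts <- matrix(c(25, 25, 25, 25,
                   97,  1,  1,  1),
                 nrow = 4,
                 dimnames = list(paste0("otu", 1:4), c("even", "uneven")))

# Hypothetical helper: alpha diversity indices for one sample.
alpha_indices <- function(x) {
  p <- x[x > 0] / sum(x)                 # relative abundances of observed taxa
  c(Observed = length(p),                # number of taxa present
    Shannon  = -sum(p * log(p)),         # Shannon entropy (natural log)
    Simpson  = 1 - sum(p^2))             # Simpson index as 1 - sum(p^2)
}

# One row per sample, one column per index.
alpha_table <- as.data.frame(t(apply(counts, 2, alpha_indices)))
alpha_table
```

Both communities contain four taxa, but the even one has a higher Shannon and Simpson diversity, which is the kind of contrast these indices are meant to capture.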
Plot
Visualize the previously calculated metrics with the `phyloseq::plot_richness()` function. Users can customize the arrangement of samples along the x-axis (`X`), color and shape of samples based on metadata. Additionally, diversity data can be displayed as boxplots instead of points.
ANOVA
This section performs ANOVA on the diversity table enriched with the metadata to assess the impact of a covariate of interest on the alpha-diversity. For categorical variables, a post-hoc pairwise comparison table is also provided to identify levels of the variable with significantly different diversities.
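A minimal sketch of this kind of analysis in base R is given below, using a made-up diversity table with one categorical covariate. The column names and values are illustrative, and Tukey's HSD is used as one possible post-hoc procedure; the source does not state which pairwise method the app applies.

```r
# Toy diversity table enriched with metadata (made-up values).
diversity <- data.frame(
  Shannon = c(1.0, 1.1, 0.9,  2.0, 2.1, 1.9,  3.0, 3.1, 2.9),
  group   = factor(rep(c("A", "B", "C"), each = 3))
)

# One-way ANOVA: does the mean Shannon diversity differ between groups?
fit     <- aov(Shannon ~ group, data = diversity)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]

# Post-hoc pairwise comparisons (assumption: Tukey's HSD).
posthoc <- TukeyHSD(fit)$group
posthoc
```

Each row of `posthoc` compares two levels of the covariate and reports the estimated difference, its confidence interval, and an adjusted p-value.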
β-diversity
β-diversity measures the dissimilarity between samples, capturing richness variations. The selection of a distance metric is crucial, and detailed information is available in the phyloseq documentation or on the GUSTA ME website. Distance metrics can be compositional or qualitative, phylogenetic or not, and the choice depends on the features of interest.
Different distances capture different features of the samples. There is no “one size fits all.” However, choosing an appropriate measure is essential as it will strongly affect how your data is treated during analysis and what kind of interpretations are meaningful.
Table
Compute distances between each pair of samples using the `phyloseq::distance()` function and the chosen distance metric.
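As an example of one such metric, the Bray-Curtis dissimilarity between two samples is Σ|x_i − y_i| / Σ(x_i + y_i). The base R sketch below computes it for every pair of samples in a toy table; the helper name `bray_curtis` and the counts are assumptions, and it is only meant to show what a distance table contains.

```r
# Toy abundance table (made-up values): 3 taxa x 4 samples.
counts <- matrix(c(10, 0, 5,
                   10, 0, 5,
                    0, 8, 0,
                    5, 5, 5),
                 nrow = 3,
                 dimnames = list(paste0("otu", 1:3), paste0("s", 1:4)))

# Hypothetical helper: Bray-Curtis dissimilarity between all pairs of samples.
bray_curtis <- function(counts) {
  samples <- colnames(counts)
  d <- matrix(0, length(samples), length(samples),
              dimnames = list(samples, samples))
  for (i in samples) {
    for (j in samples) {
      d[i, j] <- sum(abs(counts[, i] - counts[, j])) /
                 sum(counts[, i] + counts[, j])
    }
  }
  as.dist(d)
}

d <- bray_curtis(counts)
d
```

Identical samples (`s1` and `s2`) are at distance 0, while samples with no taxon in common (`s1` and `s3`) are at the maximum distance of 1.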
Samples heatmap
Plot the matrix of pairwise distances using the `phyloseq.extended::plot_dist_as_heatmap()` function. Users can customize sample order based on metadata to highlight patterns (e.g. lower within-group than between-group distances).
Samples clustering
Use the distance matrix and a user-specified linkage method (e.g. Ward, complete, average, etc.) to compute and plot a hierarchical clustering tree of the samples with the `phyloseq.extended::plot_clust()` function. Users can color leaves of the tree (i.e. samples) according to a categorical metadata variable to identify the variables along which the samples separate.
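The underlying computation can be sketched with base R's `hclust()`. In this illustration, Euclidean distances on made-up two-dimensional profiles stand in for a β-diversity distance matrix, and `"ward.D2"` is chosen as the Ward variant; which variant the app uses is not stated in the source.

```r
# Toy sample profiles (made-up values): two groups of three samples.
profiles <- rbind(a1 = c(1.0, 0.1), a2 = c(1.1, 0.0), a3 = c(0.9, 0.2),
                  b1 = c(0.1, 1.0), b2 = c(0.0, 1.2), b3 = c(0.2, 0.9))

# Stand-in distance matrix (assumption: Euclidean instead of a beta-diversity metric).
d <- dist(profiles)

# Hierarchical clustering with Ward linkage.
tree <- hclust(d, method = "ward.D2")

# Cut the tree into two clusters and compare with the known groups.
clusters <- cutree(tree, k = 2)
table(cluster = clusters, group = substr(names(clusters), 1, 1))

# plot(tree) would draw the dendrogram.
```

When the clusters line up with a metadata category, as they do here, that category is a good candidate for explaining the structure of the samples.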
MultiDimensional Scaling
Use the distance matrix to ordinate the samples (i.e. project them while preserving at best their pairwise distances) in a low-dimensional space with the `phyloseq::ordinate()` function, and visualize this ordination with the `phyloseq::plot_ordination()` function. In addition to selecting the ordination method (MDS/PCoA, NMDS, etc.), users can customize color, shape and labels of samples based on metadata. Additionally, ellipses can be added to group samples sharing the same level of a variable (e.g. healthy versus diseased individuals). By default, the ordination represents the principal plane (axes 1 and 2) of the projection but further axes can be used for plotting.
These graphs serve as powerful tools for exploring and interpreting the factors that structure microbial communities.
For more examples and details, refer to the ordination plots section of the phyloseq documentation or to GUSTA ME.
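Classical MDS (PCoA) is available in base R as `cmdscale()`, which makes it easy to sketch what the ordination step does. The example below uses made-up two-dimensional coordinates and Euclidean distances as a stand-in for a β-diversity distance matrix, so the projection recovers the original distances exactly; real community distances are only approximated.

```r
# Toy sample coordinates (made-up values): corners of a 3 x 4 rectangle.
coords <- rbind(s1 = c(0, 0), s2 = c(3, 0), s3 = c(0, 4), s4 = c(3, 4))

# Stand-in distance matrix (assumption: Euclidean distances).
d <- dist(coords)

# Classical MDS / PCoA on two axes, keeping the eigenvalues.
mds <- cmdscale(d, k = 2, eig = TRUE)

# Share of variation carried by axes 1 and 2 (among positive eigenvalues).
explained <- mds$eig[1:2] / sum(mds$eig[mds$eig > 0])

mds$points   # coordinates of the samples in the principal plane
explained
```

The `explained` shares are what ordination plots usually report on their axis labels; low values suggest that further axes are worth inspecting.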
Multivariate ANOVA
Use Permutational Multivariate ANOVA to assess the impact of one or several covariates on community structure with `vegan::adonis2(by = 'terms', perm = 9999)`. The test compares the structure given by sample data with 9999 randomly generated structures. Permutational Multivariate ANOVA (also called non-parametric multivariate ANOVA or npmanova) accommodates complex designs, but it tests only location effects (e.g. are the typical communities similar in groups A and B?) and assumes equal dispersions (i.e. same biological variability in both groups).
Users can specify up to three covariates and their potential interactions to be included in the model.
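The permutation principle behind the test can be sketched in base R for the simplest case of a single categorical covariate. This is a didactic simplification, not the `vegan::adonis2()` implementation: the helper name `pseudo_f`, the simulated profiles, the Euclidean distances and the reduced number of permutations (999 instead of 9999) are all assumptions.

```r
set.seed(1)

# Simulated profiles (assumption): two groups of five samples with shifted means.
group    <- rep(c("A", "B"), each = 5)
profiles <- rbind(matrix(rnorm(10, mean = 0), ncol = 2),
                  matrix(rnorm(10, mean = 5), ncol = 2))
d <- dist(profiles)

# Hypothetical helper: pseudo-F statistic computed from distances only.
pseudo_f <- function(d, group) {
  d2     <- as.matrix(d)^2
  n      <- nrow(d2)
  groups <- unique(group)
  ss_total  <- sum(d2) / (2 * n)           # total sum of squares
  ss_within <- sum(vapply(groups, function(g) {
    idx <- group == g
    sum(d2[idx, idx]) / (2 * sum(idx))     # within-group sum of squares
  }, numeric(1)))
  ss_between <- ss_total - ss_within
  a <- length(groups)
  (ss_between / (a - 1)) / (ss_within / (n - a))
}

# Compare the observed statistic with statistics under shuffled group labels.
observed <- pseudo_f(d, group)
permuted <- replicate(999, pseudo_f(d, sample(group)))
p_value  <- (sum(permuted >= observed) + 1) / (999 + 1)
p_value
```

A small p-value means that few random relabelings separate the samples as well as the real grouping does. As noted above, it can also arise from unequal dispersions rather than a shift in location.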
PCA
Perform PCA using `stats::prcomp()` on the abundance matrix. While MultiDimensional Scaling (MDS) is often recommended for microbiome analysis, Principal Component Analysis (PCA) after appropriate data transformation can be an alternative. The transformed abundances can be centered and/or scaled during the analysis. Users can customize color, shape and labels of samples based on metadata, add ellipses to group samples from the same category, and select the axes of the projection like in MultiDimensional Scaling. Loadings (OTU/ASV) of the principal axes can also be incorporated to understand the individual contributions of taxa to each axis.
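A minimal sketch of such an analysis in base R, assuming a CLR transformation (with a pseudo-count of 1) as the "appropriate data transformation" and a made-up count matrix:

```r
# Toy abundance table (made-up values): 4 taxa x 5 samples.
counts <- matrix(c(120, 30,  5, 45,
                   100, 40, 10, 50,
                    10, 90, 60, 40,
                    15, 80, 70, 35,
                    60, 60, 30, 50),
                 nrow = 4,
                 dimnames = list(paste0("otu", 1:4), paste0("s", 1:5)))

# CLR transformation per sample, after adding a pseudo-count of 1.
clr <- apply(counts, 2, function(x) {
  log_x <- log(x + 1)
  log_x - mean(log_x)
})

# PCA expects samples in rows: transpose, center, and do not scale here.
pca <- prcomp(t(clr), center = TRUE, scale. = FALSE)

# Share of variance carried by each principal axis.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

pca$x[, 1:2]         # sample coordinates on axes 1 and 2
pca$rotation[, 1:2]  # loadings: contribution of each taxon to axes 1 and 2
var_explained
```

Setting `scale. = TRUE` would give every taxon the same weight regardless of its variance, which is the centering and scaling choice mentioned above.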
Differential abundance
This section is dedicated to the identification of over- or under-abundant OTU/ASVs based on an experimental variable (categorical or numeric). The main tool for this analysis is the DESeq2 package (with the `sfType = "poscounts"` option used by default to ignore null values when computing scale factors), utilized through the `phyloseq::phyloseq_to_deseq2()` function (refer to the accompanying vignette).
However, note that while DESeq2 was developed for transcriptomics data using negative binomial models, amplicon metagenomics data are typically very sparse, and how well these models handle such sparsity, even with `sfType = "poscounts"`, is not clear.
To proceed with differential abundance analysis, users need to:
- select an experimental design model.
- select a contrast between two levels of the covariate (for categorical variables).
An interactive volcano plot representing the differentially abundant OTUs/ASVs is then shown (clicking on any OTU/ASV displays a barplot representing its relative abundance across the samples) alongside an interactive table with detailed information on the differential abundance statistics (p-value, effect size, etc.) and the taxonomy of each OTU/ASV.
This analysis allows the user to identify and visualize the taxa that exhibit significant differences in abundance between two conditions, providing valuable insights into the impact of experimental variables on individual microbes.
Export data, plot, and results
Users can export their (potentially preprocessed) data with the “download” icons. The export options include:
- Exporting data in `.biom` format. Note that if a phylogenetic tree is present, it will not be included in the exported biom file. This format facilitates compatibility with other tools.
- Exporting the constructed phyloseq object in `.rds` format. This enables further analysis within R or for use in Easy16S.
For results tables, users can easily export them using the `CSV`, `Copy` (to clipboard) or `Excel` buttons.
To export a plot, click on the camera button located at the top right of each plot. Global export parameters, such as height, width, scale, and format, can be configured through the menu at the top right of the header. This lets users resize plots as needed before export.
These export features enhance the usability and accessibility of both data and results, allowing users to seamlessly integrate Easy16S with their preferred analysis tools and workflows.