Several microbiome studies that included both 16S sequencing and WGS metagenome sequencing of the same samples were used to test the accuracy of PICRUSt. These included 530 paired human microbiome samples [22], 39 paired mammal gut samples [24], 14 paired soil samples [34], 10 paired hypersaline microbial mats [23,24] and two even/staggered synthetic mock communities from the HMP [33]. We additionally used PICRUSt to make predictions for three 16S-only microbiome studies: 6,431 HMP samples (http://hmpdacc.org/HMQCP), 993 vaginal time-course samples [43] and 335 coral mucus samples (http://www.microbio.me/qiime/; Study ID 1854).
For 16S data, PICRUSt-compatible OTU tables were constructed using the closed-reference OTU picking protocol in QIIME 1.5.0-dev (pick_reference_otus_through_otu_table.py) against Greengenes+IMG using ‘uclust’ [48]. For paired metagenomes, WGS reads were annotated to KOs using HUMAnN v0.98 [30]. Expected KO counts for the HMP mock communities were obtained by multiplying the mixing proportion of each community member by the annotated KO counts of its reference genome in IMG and summing over members. PICRUSt was used to predict the metagenomes from the 16S-based OTU tables, and the predictions were compared with the annotated WGS metagenomes across all KOs using Spearman rank correlation. In addition, for the HMP and vaginal datasets, KO abundances were mapped to KEGG Module abundances following the conjunctive normal form implemented in the HUMAnN script “pathab.py”, so that modules and pathways could be compared. Bray-Curtis distances (for beta-diversity comparisons of OTU or PICRUSt KO abundances across samples) were calculated as implemented in the QIIME “beta_diversity.py” script. The PCA plot and the identification of KEGG modules with significantly different mean proportions in the HMP and vaginal datasets were produced using STAMP v2.0 [36].
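As a rough illustration of the comparison steps above, the following standard-library Python sketch computes expected KO counts for a mock community as a proportion-weighted sum of genome KO counts, Spearman's rank correlation between predicted and observed KO abundances (the Pearson correlation of average ranks, so tied abundances are handled), and the Bray-Curtis dissimilarity. The function names, the dictionary input format, the toy genomes and KO values, and the choice to treat a KO missing from one table as zero are all assumptions made for illustration; this is not PICRUSt, HUMAnN or QIIME code, and in practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
from collections import defaultdict


def expected_ko_counts(mix, genome_kos):
    """Mock-community expectation: sum over members of proportion * genome KO count."""
    expected = defaultdict(float)
    for member, proportion in mix.items():
        for ko, count in genome_kos[member].items():
            expected[ko] += proportion * count
    return dict(expected)


def align(pred, obs):
    """Put two KO -> abundance dicts on a shared KO axis (missing KOs count as 0; an assumption)."""
    kos = sorted(set(pred) | set(obs))
    return [pred.get(k, 0.0) for k in kos], [obs.get(k, 0.0) for k in kos]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (undefined if an input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / (sxx * syy) ** 0.5


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum |u_i - v_i| / sum (u_i + v_i)."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))


if __name__ == "__main__":
    # Hypothetical two-member mock community mixed 50:50.
    mix = {"genomeA": 0.5, "genomeB": 0.5}
    genome_kos = {"genomeA": {"K00001": 2, "K00002": 1},
                  "genomeB": {"K00001": 2, "K00003": 4}}
    expected = expected_ko_counts(mix, genome_kos)
    predicted = {"K00001": 1.8, "K00002": 0.7, "K00003": 2.5}  # hypothetical prediction
    x, y = align(predicted, expected)
    print(expected)
    print(spearman(x, y), bray_curtis(x, y))
```

Working on rank correlation rather than raw abundances makes the comparison insensitive to the overall scale difference between predicted and sequenced KO counts.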
The Nearest Sequenced Taxon Index (NSTI) was developed as an evaluation measure describing the novelty of organisms within an OTU table with respect to previously sequenced genomes. For every OTU in a sample, the sum of branch lengths in the Greengenes tree between that OTU and the nearest tip with a sequenced genome is weighted by the relative abundance of that OTU. These weighted OTU scores are then summed to give a single NSTI value per microbial community sample. PICRUSt calculates NSTI values for every sample in the given OTU table, and we compared NSTI scores with PICRUSt accuracy across all of the metagenome validation datasets.
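The NSTI definition above amounts to NSTI = Σ_i a_i d_i, where a_i is the relative abundance of OTU i in the sample and d_i is the branch-length distance from OTU i to its nearest tip with a sequenced genome. The Python sketch below computes this on a toy tree stored as child → (parent, branch length) pairs. The tree encoding, function names and example values are hypothetical, and this only illustrates the formula; it is not PICRUSt's implementation.

```python
def ancestor_distances(node, parent):
    """Map the node and each of its ancestors to the path length from the node."""
    dists = {node: 0.0}
    total = 0.0
    while node in parent:
        node, length = parent[node]
        total += length
        dists[node] = total
    return dists


def tip_distance(a, b, parent):
    """Branch-length path between two tips, routed through their lowest common ancestor."""
    da = ancestor_distances(a, parent)
    db = ancestor_distances(b, parent)
    # Every shared ancestor gives a connecting path; the lowest one gives the shortest.
    return min(da[n] + db[n] for n in db if n in da)


def nsti(otu_counts, genome_tips, parent):
    """Abundance-weighted sum of each OTU's distance to its nearest sequenced genome."""
    total = sum(otu_counts.values())
    if total == 0:
        raise ValueError("sample has no OTU counts")
    score = 0.0
    for otu, count in otu_counts.items():
        nearest = min(tip_distance(otu, g, parent) for g in genome_tips)
        score += (count / total) * nearest
    return score


if __name__ == "__main__":
    # Toy tree: root -> n1 -> {otuA, genomeX}; root -> otuB
    parent = {"otuA": ("n1", 0.02), "genomeX": ("n1", 0.03),
              "n1": ("root", 0.1), "otuB": ("root", 0.2)}
    # otuA -> genomeX = 0.02 + 0.03; otuB -> genomeX = 0.2 + 0.1 + 0.03
    print(nsti({"otuA": 3, "otuB": 1}, ["genomeX"], parent))  # about 0.12
```

Here 0.75 × 0.05 + 0.25 × 0.33 = 0.12: the rarer but more novel OTU contributes most of the index, which is why NSTI tracks how far a community lies from the available reference genomes.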
In the metagenome rarefaction analysis (Fig. 4), a given number of counts was randomly selected either from the microbial OTUs of each sample (i.e. the 16S rRNA OTU table) or from the sequenced genes in that sample, using the multiple_rarefactions.py script in QIIME 1.5.0-dev [29]. To estimate the number of raw reads at which PICRUSt outperforms metagenomic sequencing, annotated shotgun read counts were converted to total sequenced reads by dividing by the mean annotation rate reported in the original manuscript (17.3%), while 16S rRNA read counts were converted using the success rate of closed-reference OTU picking at a 97% 16S rRNA identity threshold (68.9%). Both the subsampled metagenome and the PICRUSt predictions from the subsampled OTU table were then compared with the non-subsampled metagenome using Spearman rank correlation to assess accuracy.
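The two calculations in this paragraph can be sketched in Python: subsampling a fixed number of counts from a sample's OTU or gene table, and converting classified read counts back to raw reads by dividing by the reported success rates. For example, 1,000 annotated WGS reads correspond to roughly 1,000 / 0.173 ≈ 5,780 raw reads, and 1,000 OTU-assigned 16S reads to roughly 1,000 / 0.689 ≈ 1,451. The function names and example counts are hypothetical, and the subsampling here draws without replacement, which is our assumption about how multiple_rarefactions.py behaves; this is not QIIME's code.

```python
import random

# Success rates reported in the text above.
WGS_ANNOTATION_RATE = 0.173   # fraction of shotgun reads annotated to KOs
OTU_PICKING_RATE = 0.689      # fraction of 16S reads assigned by closed-reference picking


def rarefy(counts, depth, seed=None):
    """Randomly keep `depth` individual counts (without replacement) from feature -> count."""
    pool = [feature for feature, c in counts.items() for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of counts in the sample")
    rng = random.Random(seed)
    sub = {}
    for feature in rng.sample(pool, depth):
        sub[feature] = sub.get(feature, 0) + 1
    return sub


def raw_reads(classified_reads, success_rate):
    """Convert reads that were annotated/assigned back to total sequenced reads."""
    return classified_reads / success_rate


if __name__ == "__main__":
    otu_table_sample = {"otu1": 50, "otu2": 30, "otu3": 20}  # hypothetical counts
    print(rarefy(otu_table_sample, 10, seed=42))
    print(round(raw_reads(1000, WGS_ANNOTATION_RATE)))  # 5780
    print(round(raw_reads(1000, OTU_PICKING_RATE)))     # 1451
```

Each subsample (and the PICRUSt prediction made from a subsampled OTU table) would then be scored against the full metagenome with Spearman rank correlation, as described above.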