q2-feature-classifier multinomial naive Bayes classifier. Varied k-mer length in {4, 6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 32} and confidence threshold in {0, 0.5, 0.7, 0.9, 0.92, 0.94, 0.96, 0.98, 1}.
BLAST+ [9] local sequence alignment followed by consensus taxonomy classification implemented in q2-feature-classifier. Varied max accepts from 1 to 100; percent identity from 0.80 to 0.99; and minimum consensus from 0.51 to 0.99. See description below.
VSEARCH [10] global sequence alignment followed by consensus taxonomy classification implemented in q2-feature-classifier. Varied max accepts from 1 to 100; percent identity from 0.80 to 0.99; and minimum consensus from 0.51 to 0.99. See description below.
Ribosomal Database Project (RDP) naïve Bayesian classifier [11] (QIIME1 wrapper), with confidence thresholds between 0.0 and 1.0 in steps of 0.1.
Legacy BLAST [15] (QIIME1 wrapper), varying e-value thresholds from 1e-9 to 1000.
SortMeRNA [13] (QIIME1 wrapper), varying minimum consensus fraction from 0.51 to 0.99; similarity from 0.8 to 0.9; max accepts from 1 to 10; and coverage from 0.8 to 0.9.
UCLUST [12] (QIIME1 wrapper), varying minimum consensus fraction from 0.51 to 0.99; similarity from 0.8 to 0.9; and max accepts from 1 to 10.
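As an illustration of how such a sweep can be enumerated, the following minimal sketch builds the full grid for the two classifiers whose parameter values are listed explicitly above: the q2-feature-classifier naive Bayes classifier and the RDP classifier. The dictionary keys (`kmer_length`, `confidence`) and the function names are hypothetical labels chosen for this example; they are not the parameter names of any particular command-line interface or plugin.

```python
from itertools import product

# Values taken from the naive Bayes sweep described in the text.
KMER_LENGTHS = [4, 6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 32]
NB_CONFIDENCES = [0, 0.5, 0.7, 0.9, 0.92, 0.94, 0.96, 0.98, 1]

# RDP confidence thresholds: 0.0 to 1.0 in steps of 0.1.
RDP_CONFIDENCES = [i / 10 for i in range(11)]


def naive_bayes_grid():
    """Return every (k-mer length, confidence) combination as a dict.

    The key names are illustrative labels, not real parameter names.
    """
    return [
        {"kmer_length": k, "confidence": c}
        for k, c in product(KMER_LENGTHS, NB_CONFIDENCES)
    ]


def rdp_grid():
    """Return one configuration per RDP confidence threshold."""
    return [{"confidence": c} for c in RDP_CONFIDENCES]


if __name__ == "__main__":
    print(len(naive_bayes_grid()))  # 12 k-mer lengths x 9 thresholds = 108
    print(len(rdp_grid()))          # 11 thresholds
```

The alignment-based methods are left out of the sketch because the text gives only the end points of their parameter ranges, not the intermediate values.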
Classification of bacterial/archaeal 16S rRNA gene sequences was performed using the Greengenes (13_8 release) [5] reference sequence database preclustered at 99% ID, with amplicons for the domain of interest extracted using primers 27F/1492R [27], 515F/806R [28], or 27F/534R [29] with q2-feature-classifier’s extract_reads method. Classification of fungal ITS sequences was performed using the UNITE database (version 7.1 QIIME developer release) [31] preclustered at 99% ID. For the cross-validation and novel taxon classification tests, we prefiltered both databases to remove sequences with incomplete or ambiguous taxonomies (those containing the substrings ‘unknown’, ‘unidentified’, or ‘_sp’, or terminating at any level with ‘__’).
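The prefiltering rule can be sketched as a simple predicate over taxonomy strings. This is a minimal illustration, not the code used in the study. It assumes semicolon-delimited lineages with rank prefixes such as `k__` and `p__`, and case-sensitive substring matching. The function name `has_complete_taxonomy` is hypothetical.

```python
# Substrings that mark a taxonomy as ambiguous, as listed in the text.
AMBIGUOUS_SUBSTRINGS = ("unknown", "unidentified", "_sp")


def has_complete_taxonomy(taxonomy: str) -> bool:
    """Return True if the lineage passes the prefilter.

    Assumes a semicolon-delimited lineage such as
    'k__Bacteria; p__Firmicutes; ...'. Matching is case-sensitive,
    which is an assumption of this sketch.
    """
    # Reject lineages containing any ambiguous substring.
    if any(s in taxonomy for s in AMBIGUOUS_SUBSTRINGS):
        return False
    # Reject lineages in which any level is an empty rank prefix ('__').
    levels = [level.strip() for level in taxonomy.split(";")]
    return not any(level.endswith("__") for level in levels)


def prefilter(taxonomies: dict) -> dict:
    """Keep only sequence IDs whose taxonomy passes the prefilter."""
    return {
        seq_id: tax
        for seq_id, tax in taxonomies.items()
        if has_complete_taxonomy(tax)
    }
```

For example, a lineage ending in `g__; s__` would be dropped because it terminates with an empty rank, while a lineage annotated to species at every level would be retained.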
The notebooks detailing taxonomy classification sweeps of mock communities are available at