We tested AbundantOTU on two mock datasets, for which the microbial composition and reference sequences are known, and three metagenomic datasets derived from real communities (see Table I for a summary of the datasets). The first mock dataset (designated Priest09) is the ‘divergent sequence’ dataset from [4], which contains amplified and pyrosequenced sequences from 23 divergent 16S rRNA fragments spanning V5 (the pyrosequencing dataset and reference sequences were downloaded from http://people.civil.gla.ac.uk/~quince/Data/PyroNoise.html). The second mock dataset (designated Mock07) contains short sequences generated by pyrosequencing PCR amplicon libraries of 43 known 16S rRNA gene fragments spanning V6 on the Roche GS20 system, in a study of sequencing errors [17]. This dataset was downloaded from http://genomebiology.com/2007/8/7/R143. The three real metagenomic datasets contain reads from two oral samples [18] and one skin sample [2], downloaded from the NCBI Short Read Archive (SRA) under accession numbers SRR002260 (oral/plaque), SRR002259 (oral/saliva), and SRR00606 (skin).