To calculate error rates, we retrieved the Titanium mock community dataset of Quince et al. [6], which was used to validate AmpliconNoise as well as other denoising algorithms [23]. The 62,873 reads were derived from PCR amplification of the V4-V5 region of the 16S gene, using 91 plasmid clones as the source DNA (the mock community). The set of original reads (“Stage 0”) was obtained by filtering only for the MID tag and primer sequences, allowing one mismatch to the MID tag and two mismatches to the primer. The initial error rate was calculated by finding the best match of each read to the 90 reference sequences (see Additional files
To evaluate scalability, we analyzed the large datasets from Krych et al. [25]. In this study of the human gut microbiome, the V3-V4 region of the 16S gene was amplified by PCR and sequenced on the Roche-454 GS FLX Titanium platform. The total number of reads for all three groups (baseline, synbiotic, and placebo) was 2.2 million.
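The Stage 0 filter and the best-match error rate described for the mock community can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names are hypothetical, the tag and primer are assumed to sit at the 5' end of each read and to contain no degenerate bases, "best match" is approximated by the minimum Levenshtein (edit) distance to any reference, and the error rate is taken as total errors divided by total read bases. The actual alignment method and error definition may differ.

```python
def prefix_mismatches(read, pattern):
    """Count mismatches between pattern and the same-length prefix of read.

    Returns None if the read is shorter than the pattern.
    """
    if len(read) < len(pattern):
        return None
    return sum(a != b for a, b in zip(read, pattern))


def passes_stage0(read, mid, primer, max_mid_mm=1, max_primer_mm=2):
    """Keep a read if its MID tag and primer match within the allowed mismatches.

    Assumption: the read starts with the MID tag, followed directly by the primer.
    """
    mid_mm = prefix_mismatches(read, mid)
    if mid_mm is None or mid_mm > max_mid_mm:
        return False
    primer_mm = prefix_mismatches(read[len(mid):], primer)
    return primer_mm is not None and primer_mm <= max_primer_mm


def edit_distance(a, b):
    """Levenshtein distance (substitutions, insertions, deletions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion from a
                           cur[j - 1] + 1,             # insertion into a
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def error_rate(reads, references):
    """Errors per base, scoring each read against its closest reference.

    Assumption: reads have already had the tag and primer trimmed.
    """
    total_errors = 0
    total_bases = 0
    for read in reads:
        total_errors += min(edit_distance(read, ref) for ref in references)
        total_bases += len(read)
    return total_errors / total_bases if total_bases else 0.0
```

For realistic read lengths and 62,873 reads, the quadratic-time distance above would be slow; a dedicated aligner would be the practical choice, and the sketch is meant only to make the two definitions concrete.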