Three test datasets were used in this work. (i) SIM2 is a selected subset of the simulated bimeras and control sequences used to train and evaluate ChimeraSlayer. (ii) MOCK comprises the Uneven datasets used to evaluate Perseus (Quince et al., 2011). These datasets are derived from pyrosequencing reads of ‘mock’ communities, i.e. experimentally mixed DNAs of known composition. The reads were processed by AmpliconNoise (Quince et al., 2011), which attempts to remove sequencing error and generates a set of predicted sequences for the amplicons. Sequences in this set were classified as biological or chimeric by comparing them to reference sequences for the species in each community, and chimera detection algorithms were assessed by their success in reproducing this classification. (iii) SIMM is a new set of simulated m-meras created for this work. SIM2 and SIMM were used to compare the performance of the reference database mode of UCHIME with ChimeraSlayer; MOCK was used to compare the de novo mode of UCHIME with Perseus.

The parameters of UCHIME were trained on SIM2; the score threshold h was set to a value giving an average error rate over the whole SIM2 dataset lower than the error rate of ChimeraSlayer on the same data. Training was performed by an exhaustive search over manually selected pairs (β,n). The optimal pair (β+,n+) was identified by maximizing the area under a receiver operating characteristic curve (Mason and Graham, 2002). Given β+ and n+, an optimal score threshold h+ is determined by (i) specifying a maximum desired error rate or minimum desired sensitivity and (ii) maximizing sensitivity or minimizing error rate, respectively. After training, the sensitivity of UCHIME averaged over all SIM2 sets was 70.6% with an error rate of 0.49%, compared with 54.6% sensitivity and 0.62% errors for ChimeraSlayer.
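The training procedure described above can be illustrated with a minimal Python sketch. It is not the UCHIME implementation: the function names (`roc_auc`, `grid_search`, `pick_threshold`) and the `score_fn` callback are hypothetical, and the error rate is assumed here to be the fraction of non-chimeric sequences flagged as chimeric, which may differ from the definition used in this work. The area under the ROC curve is computed with the equivalent rank statistic (the probability that a chimera outscores a non-chimera, with ties counted as one half).

```python
from typing import Callable, Iterable, Optional, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve, computed as a pairwise rank statistic."""
    pos = [s for s, is_chimera in zip(scores, labels) if is_chimera]
    neg = [s for s, is_chimera in zip(scores, labels) if not is_chimera]
    if not pos or not neg:
        raise ValueError("need at least one chimera and one non-chimera")
    # Each (chimera, non-chimera) pair contributes 1 if ranked correctly, 0.5 on a tie.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def grid_search(
    score_fn: Callable[[object, float, float], float],  # hypothetical scoring callback
    queries: Sequence[object],
    labels: Sequence[bool],
    grid: Iterable[Tuple[float, float]],
) -> Tuple[float, float, float]:
    """Exhaustive search over (beta, n) pairs; returns (best_auc, beta, n)."""
    best: Optional[Tuple[float, float, float]] = None
    for beta, n in grid:
        scores = [score_fn(q, beta, n) for q in queries]
        auc = roc_auc(scores, labels)
        if best is None or auc > best[0]:
            best = (auc, beta, n)
    if best is None:
        raise ValueError("empty parameter grid")
    return best


def pick_threshold(
    scores: Sequence[float], labels: Sequence[bool], max_error_rate: float
) -> Optional[Tuple[float, float, float]]:
    """Most sensitive threshold h whose error rate does not exceed the cap.

    A sequence is called chimeric when its score is >= h. Returns
    (h, sensitivity, error_rate), or None if no threshold meets the cap.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one chimera and one non-chimera")
    best: Optional[Tuple[float, float, float]] = None
    for h in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= h)
        fp = sum(1 for s, y in zip(scores, labels) if not y and s >= h)
        sensitivity, error_rate = tp / n_pos, fp / n_neg
        if error_rate <= max_error_rate and (best is None or sensitivity > best[1]):
            best = (h, sensitivity, error_rate)
    return best
```

The complementary mode (fixing a minimum sensitivity and minimizing the error rate) follows the same pattern with the roles of the two quantities swapped.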