Pyrosequencing errors, PCR single-base errors and chimeric sequences were removed from the 454-pyrosequencing amplicon library using AmpliconNoise (v1.26; Quince et al., 2011) followed by Perseus (Quince et al., 2011). Reads that did not match the multiplex identifier and/or primer sequences were discarded, as were reads shorter than 200 bp. The remaining reads were truncated at 450 bp to eliminate additional noise (Mardis, 2008), and the multiplex identifier and primer sequences were finally trimmed off.
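The read-level filtering described above can be illustrated with a minimal Python sketch. It is not the AmpliconNoise implementation: the function name `filter_read`, the exact-match rule at the 5' end, and the order in which the length filter, truncation and trimming are applied are assumptions made for illustration. Only the 200 bp and 450 bp thresholds are taken from the text.

```python
from typing import Optional

MIN_LENGTH = 200  # reads shorter than this are discarded (threshold from the text)
MAX_LENGTH = 450  # reads are truncated to this length (threshold from the text)


def filter_read(read: str, barcode: str, primer: str) -> Optional[str]:
    """Return the cleaned read, or None if the read is discarded.

    Assumption: the multiplex identifier (barcode) and primer must match
    exactly at the 5' end; real pipelines may tolerate mismatches.
    """
    tag = barcode + primer
    # Discard reads that do not start with the barcode and primer.
    if not read.startswith(tag):
        return None
    # Discard reads shorter than the minimum length.
    if len(read) < MIN_LENGTH:
        return None
    # Truncate to the maximum length, then trim barcode and primer.
    truncated = read[:MAX_LENGTH]
    return truncated[len(tag):]
```

With a hypothetical barcode `ACGT` and primer `GGCC`, a 508 bp read carrying both would be truncated to 450 bp and then trimmed by 8 bp, while a 108 bp read or a read lacking the tag would be discarded.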
Denoised reads were clustered into operational taxonomic units (OTUs) at 97% sequence identity (AmpliconNoise, v1.29; Quince et al., 2011) and classified with the RDP naive Bayesian rRNA Classifier (RDP Classifier, v2.6; Wang et al., 2007). Representative sequences were aligned against the SILVA alignment (release 102; Quast et al., 2013) using mothur (v1.33.2; Schloss et al., 2009). Finally, sequences that could neither be aligned nor assigned, or that were assigned to Archaea or Eukaryota (for example, chloroplasts), were removed. The 454-pyrosequencing reads of both experimental (Ba, GW, HL, and VA) and environmental (EnvBa, EnvGW, EnvHL, and EnvVA) samples have been deposited at the NCBI Sequence Read Archive under accession number SRP021096.
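The final taxonomy-based filter can be sketched as follows. This is a hedged illustration, not the code used in the study: the `Otu` record, the `keep_otu` and `filter_otus` helpers, and the lineage labels are hypothetical. The sketch reads "neither aligned nor assigned" literally (a sequence is dropped only when both steps failed) and treats a `Chloroplast` label in the lineage as grounds for removal. Both choices are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: these labels appear verbatim in the classifier's lineage output.
EXCLUDED_TAXA = {"Archaea", "Eukaryota", "Chloroplast"}


@dataclass
class Otu:
    otu_id: str
    aligned: bool                  # True if the sequence could be aligned
    lineage: Optional[List[str]]   # None or empty list means unassigned


def keep_otu(otu: Otu) -> bool:
    """Return True if the OTU passes the final filter."""
    assigned = bool(otu.lineage)
    # Drop sequences that could be neither aligned nor assigned.
    if not otu.aligned and not assigned:
        return False
    # Drop sequences assigned to an excluded taxon.
    if assigned and EXCLUDED_TAXA.intersection(otu.lineage):
        return False
    return True


def filter_otus(otus: List[Otu]) -> List[Otu]:
    """Keep only the OTUs that pass keep_otu, preserving input order."""
    return [otu for otu in otus if keep_otu(otu)]
```

If the intended rule is that failing either alignment or assignment is enough for removal, the first condition in `keep_otu` would use `or` instead of `and`.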