The 16S rRNA gene amplicon data from human vaginal samples in [19] (2.13M paired-end Illumina MiSeq reads in 157 samples) and from mouse feces in [17] (3.65M paired-end Illumina MiSeq reads in 362 samples) were analyzed with the DADA2 pipeline outlined above. First, the demultiplexed fastq files were filtered and trimmed in the same manner as the test datasets. Each sample was then dereplicated, a portion of the dataset was used to estimate the error parameters, and dada() was applied to the full pooled dataset using those inferred error parameters. Finally, isBimeraDenovo() was used to remove chimeras.
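To illustrate the dereplication step mentioned above, the following is a minimal Python sketch of the general idea: identical reads are collapsed into unique sequences with abundance counts. It is not the DADA2 implementation, which is written in R and also carries quality information for each unique sequence. The function name `dereplicate` and the toy reads are hypothetical.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, count) pairs.

    Hypothetical helper for illustration only; quality scores are ignored.
    """
    counts = Counter(reads)
    # Sort by descending abundance, then by sequence for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy reads (made up for this example, not taken from either dataset).
toy_reads = ["ACGT", "ACGT", "ACGA", "ACGT", "TTGA", "ACGA"]
uniques = dereplicate(toy_reads)
```

Working on unique sequences with counts rather than on individual reads reduces the amount of data that the later error-estimation and inference steps have to process.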
For the human vaginal samples, output sequences that appeared in at least two samples and accounted for at least 0.3% of the total reads were taxonomically identified by BLAST. Further analysis focused on the six L. crispatus sequence variants identified by this procedure.
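The selection rule for sequences sent to BLAST can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: it assumes that "total reads" means the reads summed over all output sequences and all samples, and that the 0.3% threshold is applied to each sequence's pooled count. The function `select_variants`, the table layout, and the toy counts are hypothetical.

```python
def select_variants(table, min_samples=2, min_fraction=0.003):
    """Return sequences present in >= min_samples samples and making up
    >= min_fraction of all reads.

    table: dict mapping sequence -> dict mapping sample -> read count.
    Assumption: the denominator is the total over every sequence and sample.
    """
    total_reads = sum(sum(per_sample.values()) for per_sample in table.values())
    if total_reads == 0:
        return []
    selected = []
    for seq, per_sample in table.items():
        n_samples = sum(1 for count in per_sample.values() if count > 0)
        seq_reads = sum(per_sample.values())
        if n_samples >= min_samples and seq_reads / total_reads >= min_fraction:
            selected.append(seq)
    return selected


# Toy table with 1000 reads in total (made-up counts).
toy_table = {
    "seqA": {"s1": 600, "s2": 378},  # abundant, in both samples
    "seqB": {"s1": 15, "s2": 0},     # 1.5% of reads, but only one sample
    "seqC": {"s1": 1, "s2": 1},      # two samples, but only 0.2% of reads
    "seqD": {"s1": 3, "s2": 2},      # two samples and 0.5% of reads
}
kept = select_variants(toy_table)
```

In this toy table only `seqA` and `seqD` satisfy both conditions; `seqB` fails the sample-count requirement and `seqC` fails the abundance requirement.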