Partial 16S rRNA gene sequences from some bacterial species are too similar to be readily distinguished from one another. Therefore, to improve classification accuracy, the STIRRUPS method clusters reference sequences into species-level taxa that can be readily differentiated. We describe the method as applied to the Vaginal 16S rDNA Reference Database.
Vaginal 16S rDNA Reference Database sequences that aligned at ≥ 97% identity in the V1-V3 region using the USEARCH v4.0 global alignment algorithm [23] were assigned to the same species-level taxon. The Vaginal Human Microbiome Project protocol sequences V1-V3 16S rDNA reads in the forward orientation. Reference sequences may differ in some regions (e.g., the V3 region) yet be very similar in others (e.g., the V1-V2 regions), which complicates species-level distinctions for short reads. Therefore, we generated subsequences of the V1-V3 region of each reference sequence by trimming from the 3' end in one-nucleotide increments down to 200 bases, the minimum length of the sequences that we process. We then aligned each subsequence to the reference library of V1-V3 regions of the selected sequences. Reference sequences whose subsequences aligned at ≥ 97% identity were also assigned to the same species-level taxon. More formally, in a graph G whose vertices are reference sequences and whose edges connect reference sequences assigned to the same species-level taxon based on sequence identity, each connected component was designated a species-level cluster. For STIRRUPS Classifier analysis (see below), we specified the V1-V3-trimmed version of the 16S rDNA database with species-level taxon assignments and a 97% identity threshold.
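The clustering step described above can be illustrated with a minimal Python sketch. It is not the STIRRUPS implementation. The function names (`trimmed_subsequences`, `prefix_identity`, `cluster_references`) and the toy sequences are hypothetical. In particular, `prefix_identity` is a simple ungapped, position-wise comparison used here only as a stand-in for the USEARCH global alignment. The sketch trims each reference from the 3' end, links two references when any subsequence reaches the identity threshold, and reports the connected components of the resulting graph.

```python
from itertools import combinations
from typing import Callable, Dict, Iterator, List

MIN_LEN = 200      # minimum read length stated in the text
THRESHOLD = 0.97   # identity threshold stated in the text


def trimmed_subsequences(seq: str, min_len: int = MIN_LEN) -> Iterator[str]:
    """Yield seq and each 3'-trimmed prefix, one base at a time, down to min_len."""
    for end in range(len(seq), min_len - 1, -1):
        yield seq[:end]


def prefix_identity(query: str, ref: str) -> float:
    """Toy stand-in for a global alignment (assumption, not USEARCH):
    ungapped, position-wise identity of query against the start of ref."""
    if not query:
        return 0.0
    matches = sum(a == b for a, b in zip(query, ref))
    return matches / len(query)


def cluster_references(
    refs: Dict[str, str],
    identity: Callable[[str, str], float] = prefix_identity,
    threshold: float = THRESHOLD,
    min_len: int = MIN_LEN,
) -> List[List[str]]:
    """Return connected components of the graph whose edges join references
    with at least one trimmed subsequence at or above the identity threshold."""
    parent = {name: name for name in refs}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(refs, 2):
        linked = any(
            identity(sub, refs[b]) >= threshold
            for sub in trimmed_subsequences(refs[a], min_len)
        ) or any(
            identity(sub, refs[a]) >= threshold
            for sub in trimmed_subsequences(refs[b], min_len)
        )
        if linked:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters: Dict[str, List[str]] = {}
    for name in refs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Toy data: ref_A and ref_B share their first 30 bases and differ only at the 3' end.
toy_refs = {
    "ref_A": "ACGT" * 10,
    "ref_B": ("ACGT" * 10)[:30] + "T" * 10,
    "ref_C": "G" * 40,
}
merged = cluster_references(toy_refs, min_len=30)     # trimming allowed down to 30 bases
unmerged = cluster_references(toy_refs, min_len=40)   # full-length comparison only
```

In this toy case, the two references that differ only near the 3' end fall into one cluster once trimmed subsequences are compared, but remain separate when only full-length sequences are compared. This mirrors the motivation given above for trimming. For real data, `identity` would need to be replaced by a proper global alignment.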