We previously used the close-to-full-length 16S rRNA gene sequences from clone library-based microbiota studies of the human aerodigestive tract, as described in Supplemental Text S1 of [20]: Segre-Kong nostril (SKn) [62–67], Pei-Blaser [68, 69], Harris-Pace [70], van der Gast-Bruce [71], Flanagan-Bristow [72], and Perkins-Angenent [73]. Here, we compiled these into one dataset along with clones from NCBI PopSet UIDs 399192397, 399202217, 399199823, 399197584, 399194446, 399189902, 399186216, 399183739, 399182414, 399179617, 399175646, and 399173254 [74]. Aligned eHOMDrefs (eHOMDv15.1) sequences were trimmed to span Escherichia coli positions 28–1373 and used to query this compiled dataset via blastn. We retained the 27,816 sequences that hit with 100% coverage and ≥ 99.5% identity to 401 HMTs as the full-length human aerodigestive tract clone library dataset (FL_hADT_CL; Additional file 18). Of these, 5,254 (18.9%) matched more than one HMT, whereas 22,562 (81.1%) matched a single HMT unambiguously. Sequences in this full-length CL dataset were then aligned using MAFFT v6.935b with default parameters [60]. Segments corresponding to the V1–V3 region were extracted based on the positions of V1–V3 in the alignment (V1V3_hADT_CL; Additional file 9) using bedtools getfasta with default parameters (bedtools version 2.26.0, https://bedtools.readthedocs.io/en/latest/).
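The retention and assignment step described above can be sketched in Python as follows. This is a minimal illustration, not the authors' actual script: it assumes the blastn results have already been parsed into (HMT, clone ID, percent identity, percent coverage) tuples, and the function name `assign_clones`, the tuple layout, and the toy hits are all hypothetical. Only the two thresholds (100% coverage, ≥ 99.5% identity) come from the text.

```python
from collections import defaultdict

MIN_IDENTITY = 99.5   # percent identity threshold stated in the text
MIN_COVERAGE = 100.0  # percent coverage threshold stated in the text


def assign_clones(hits):
    """Keep hits passing both thresholds and split clones by HMT count.

    `hits` is an iterable of (hmt, clone_id, pident, coverage) tuples;
    this layout is an assumption made for illustration.
    Returns (unique, ambiguous): clones matching exactly one HMT, and
    clones matching more than one HMT.
    """
    hmts_per_clone = defaultdict(set)
    for hmt, clone_id, pident, coverage in hits:
        if pident >= MIN_IDENTITY and coverage >= MIN_COVERAGE:
            hmts_per_clone[clone_id].add(hmt)

    unique = {c: next(iter(h)) for c, h in hmts_per_clone.items() if len(h) == 1}
    ambiguous = {c: sorted(h) for c, h in hmts_per_clone.items() if len(h) > 1}
    return unique, ambiguous


# Toy, made-up hits for illustration only (not data from the study).
example_hits = [
    ("HMT-001", "cloneA", 99.8, 100.0),
    ("HMT-002", "cloneB", 99.6, 100.0),
    ("HMT-003", "cloneB", 99.5, 100.0),
    ("HMT-001", "cloneC", 99.4, 100.0),  # dropped: identity below 99.5%
    ("HMT-004", "cloneD", 100.0, 98.0),  # dropped: coverage below 100%
]
unique, ambiguous = assign_clones(example_hits)
```

In this toy input, cloneA is retained with a single HMT, cloneB is retained but matches two HMTs, and cloneC and cloneD are discarded. Counting the entries of `unique` and `ambiguous` on real hits would yield the kind of split reported above (22,562 versus 5,254 sequences).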