Vibrio is the most diverse genus in Vibrionaceae, currently including 151 described species and 5 subspecies (LPSN database, https://www.bacterio.net/, accessed June 2022)50. To carry out the in silico analysis, we created a data repository by retrieving all copies of the ribosomal operon genes (i.e., 16S rRNA and 23S rRNA) from 40 representative, fully sequenced Vibrio genomes, one genome per species (Supplementary Table S7). Genome taxonomic assignment was further verified when Vibrio spp. did not form highly supported and unambiguously differentiated monophyletic clades. We classified the certainty of genome taxonomic assignment into three levels: first, literature support existed and the NCBI taxonomic check criteria were satisfied; second, only the NCBI taxonomic check criteria were satisfied; and third, neither criterion was satisfied (Fig. 1, Supplementary Table S8). When multiple genomes were available, we preferentially selected published and annotated genomes of validated Vibrio species in the LPSN database that had been assembled from both long- and short-read sequences (e.g., those obtained by both PacBio and Illumina sequencing). To choose representative genomes of V. diabolicus, V. natriegens, and V. scophthalmi from the IMG/M database (https://img.jgi.doe.gov/cgi-bin/m/main.cgi), we constructed a similarity matrix of gene copies from the same genome based on NCBI BLASTn results (https://blast.ncbi.nlm.nih.gov/) and analyzed the numbers of gaps and mismatches to identify the genomes with the highest internal variability among their 16S and 23S rRNA gene copies. Next, the ribosomal sequences downloaded from the NCBI GenBank and IMG/M databases (Supplementary Fig. S5) were manually curated by adding missing conserved terminal nucleotides to obtain full-length copies.
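As a rough illustration of the intra-genomic comparison used to pick representative genomes, the sketch below counts mismatches and gap positions between every pair of rRNA gene copies from one genome, sums them into a single variability score, and returns the genome with the highest score. It assumes the copies have already been aligned to a common length (with `-` marking gaps), whereas the procedure described above read these counts from NCBI BLASTn results; the scoring rule (mismatches plus gaps) and all function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def pair_differences(a: str, b: str) -> tuple[int, int]:
    """Count (mismatches, gap positions) between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = gaps = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == y:
            continue  # identical column (including a shared gap)
        if x == "-" or y == "-":
            gaps += 1
        else:
            mismatches += 1
    return mismatches, gaps


def similarity_matrix(copies: dict[str, str]) -> dict[tuple[str, str], tuple[int, int]]:
    """Upper triangle of the pairwise comparison matrix, keyed by (copy_i, copy_j)."""
    return {
        (i, j): pair_differences(copies[i], copies[j])
        for i, j in combinations(sorted(copies), 2)
    }


def internal_variability(copies: dict[str, str]) -> int:
    """Hypothetical score: total mismatches plus gaps over all copy pairs."""
    return sum(m + g for m, g in similarity_matrix(copies).values())


def most_variable_genome(genomes: dict[str, dict[str, str]]) -> str:
    """Return the genome whose rRNA copies differ most from one another."""
    return max(genomes, key=lambda g: internal_variability(genomes[g]))
```

For example, a toy genome with copies `ACGT`, `ACGA`, and `AC-T` scores 4, while a genome with two identical copies scores 0, so the first would be chosen.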
We assigned each retrieved sequence a unique ID in which the last three digits referred to the operon carrying the corresponding 16S and 23S rRNA gene copies, and a letter distinguished each operon within the corresponding genome. We employed our custom code (Parts 1–5, see supplementary file “Custom code”), based on the RSelenium package (Version 1.7.7)51 for automated web page scraping and the rentrez package (Version 1.2.2)52 for NCBI Entrez queries, to formulate search queries in R (Version 1.1.442) and obtain species and strain names, sequence accession numbers, and the corresponding sequences in FASTA format.
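The point of the ID scheme is that the 16S and 23S copies from the same operon share an operon tag. The sketch below shows one way such IDs could be generated; the exact format used in the study is not specified here, so the `<genome_tag>_<gene>_<NNN><L>` layout, the genome tag `VCAM`, and the function name are hypothetical.

```python
from __future__ import annotations

from string import ascii_uppercase


def assign_ids(genome_tag: str, records: list[tuple[str, int]]) -> list[str]:
    """Build IDs for (gene, operon_number) records from a single genome.

    Hypothetical format: <genome_tag>_<gene>_<NNN><L>, where NNN is the
    zero-padded, 1-based operon number and L is the letter for that operon.
    16S and 23S copies from the same operon therefore share the NNN<L> suffix.
    """
    ids = []
    for gene, operon in records:
        if not 1 <= operon <= len(ascii_uppercase):
            raise ValueError(f"operon number out of range: {operon}")
        letter = ascii_uppercase[operon - 1]  # operon 1 -> 'A', 2 -> 'B', ...
        ids.append(f"{genome_tag}_{gene}_{operon:03d}{letter}")
    return ids
```

Usage: `assign_ids("VCAM", [("16S", 1), ("23S", 1), ("16S", 5)])` yields `["VCAM_16S_001A", "VCAM_23S_001A", "VCAM_16S_005E"]`, so the first two IDs can be paired by their common suffix.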
We obtained additional 16S rRNA sequences from the SILVA SSU r.138.1 database20. We used these sequences to ascertain whether outlier gene copies were fortuitous, potentially caused by sequencing errors, or occurred more broadly in a larger sample of sequenced genes. We conducted a BLASTn homology search with the variable regions of the outlier gene copies V. chagasii M and V. campbellii E. We subsequently included the five SILVA sequences that had complete 16S rRNA sequences and the highest BLAST homology in the polyphyly analysis of the 16S rRNA-based phylogenetic tree (Fig. 2). Additionally, 2072 non-redundant Vibrio 23S rRNA sequences, corresponding to 45 species and 19 additional strains without species designation, were retrieved from the SILVA LSU Ref NR r.138.1 database20. These were then used to locate conserved 23S rRNA regions for PCR primer design (Fig. 7a, Supplementary Fig. S5). We further supplemented our repository with 26 genomes belonging to non-Vibrio species in Vibrionaceae and nine genomes of other, non-Vibrionaceae bacteria. The non-Vibrio Vibrionaceae genera included Aliivibrio, Photobacterium, Salinivibrio, Enterovibrio, and Grimontia, whereas the non-Vibrionaceae families included Woeseiaceae, Comamonadaceae, Rhodobacteraceae, Desulfobacteraceae, and Enterobacteriaceae (Escherichia coli).
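Selecting the five best SILVA hits amounts to filtering BLASTn results to complete 16S rRNA sequences and ranking what remains. The sketch below parses BLAST+ tabular output (`-outfmt 6`, default 12 columns), keeps only subjects listed in a caller-supplied set of complete sequences, and ranks them by bit score, breaking ties by percent identity. How completeness is decided, the ranking rule, and the function name are assumptions for illustration, not the study's documented procedure.

```python
from __future__ import annotations

import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_complete_hits(blast_tsv: str, complete_ids: set[str], n: int = 5) -> list[str]:
    """Return up to n distinct subject IDs, restricted to complete sequences.

    Each subject is scored by its best HSP as (bitscore, pident); subjects
    are returned from highest to lowest score.
    """
    best: dict[str, tuple[float, float]] = {}
    reader = csv.DictReader(io.StringIO(blast_tsv), fieldnames=OUTFMT6, delimiter="\t")
    for row in reader:
        sid = row["sseqid"]
        if sid not in complete_ids:
            continue  # skip partial 16S rRNA sequences
        score = (float(row["bitscore"]), float(row["pident"]))
        if sid not in best or score > best[sid]:
            best[sid] = score
    ranked = sorted(best, key=lambda s: best[s], reverse=True)
    return ranked[:n]
```

Ranking by bit score rather than percent identity alone favours hits that align over a longer stretch of the queried variable region, which matters when some subjects only partially overlap the query.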