Extracting Microbial Diversity Markers

We downloaded 503,971 aligned small subunit (SSU) rRNA sequences from the SILVA database, version 92 [35]. Using the SILVA quality assessments, we eliminated low-quality sequences (sequence quality ≤ 50, alignment quality ≤ 50, pintail score ≤ 40). SSU rRNA genes with identical sequences were flagged as redundant. The resulting dataset included 417,433 unique sequences, of which 99% were between 350 and 2000 nt in length. Although these sequences vary in length and in coverage of the full-length SSU rRNA gene, we refer to them as “long” or “full-length” sequences for the purposes of this paper, and to the dataset as RefSSU.
From all aligned RefSSU sequences, we extracted the V3 and V6 hypervariable regions, defined as the positions homologous to positions 338 to 533 of the E. coli SSU rRNA sequence (U00096) for V3 and 967 to 1046 for V6. Sequences shorter than 50 nt or containing ambiguous bases were culled. We removed all gap characters to create a set of 293,265 V3 reference tags (RefV3 database) and 195,344 V6 reference tags (RefV6 database). The higher representation of sequences spanning the V3 region in molecular databases is likely a consequence of experimental designs for PCR amplicon libraries that favor the beginning of the molecule. These databases include 123,206 unique V3 tag sequences and 59,830 unique V6 tag sequences. Most V3 sequences (99+%) range in length from 80 nt to 180 nt (maximum 447 nt), while most V6 sequences (99+%) range from 50 nt to 80 nt (maximum 349 nt) (http://vamps.mbl.edu/resources/databases.php).
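The filtering and tag-extraction steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline. The record field names (`seq_quality`, `align_quality`, `pintail`), the function names, and the gap symbols are hypothetical. The sketch also assumes that a sequence is dropped when any one score is at or below its cutoff, and that the E. coli positions are 1-based and inclusive.

```python
GAP_CHARS = "-."            # assumed gap symbols in the aligned sequences
UNAMBIGUOUS = set("ACGTU")  # any other letter counts as an ambiguous base


def passes_quality(rec, min_seq=50, min_align=50, min_pintail=40):
    """Keep a record only if every score is strictly above its cutoff.

    `rec` is a dict with hypothetical keys; the cutoffs mirror the text.
    """
    return (rec["seq_quality"] > min_seq
            and rec["align_quality"] > min_align
            and rec["pintail"] > min_pintail)


def region_columns(aligned_ref, start, end):
    """Map 1-based, inclusive reference positions to alignment columns.

    Returns (first_col, last_col_exclusive) for slicing aligned sequences.
    """
    pos = 0
    first = last = None
    for col, ch in enumerate(aligned_ref):
        if ch in GAP_CHARS:
            continue              # gaps do not advance the reference position
        pos += 1
        if pos == start:
            first = col
        if pos == end:
            last = col + 1
            break
    if first is None or last is None:
        raise ValueError("reference does not span the requested positions")
    return first, last


def extract_tag(aligned_seq, first, last, min_len=50):
    """Slice the region, strip gaps, and apply the length/ambiguity culls."""
    tag = "".join(c for c in aligned_seq[first:last] if c not in GAP_CHARS).upper()
    if len(tag) < min_len:
        return None               # too short to serve as a reference tag
    if set(tag) - UNAMBIGUOUS:
        return None               # contains an ambiguous base such as N
    return tag


def build_reference_tags(aligned_seqs, aligned_ref, start, end, min_len=50):
    """Return {sequence_id: tag} for every sequence that survives the culls."""
    first, last = region_columns(aligned_ref, start, end)
    tags = {}
    for seq_id, aligned in aligned_seqs.items():
        tag = extract_tag(aligned, first, last, min_len)
        if tag is not None:
            tags[seq_id] = tag
    return tags
```

Under these assumptions, a call such as `build_reference_tags(seqs, ecoli_aligned, 338, 533)` would correspond to the V3 extraction, and `len(set(tags.values()))` would give the number of unique tag sequences.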
We classified all bacterial and archaeal long sequences directly with the Ribosomal Database Project (RDP) Classifier [28]. We used only RDP classifications with a bootstrap value of ≥ 80%. If the bootstrap value was < 80%, the taxonomic assignment was moved to a higher classification level until a bootstrap value of 80% or better was achieved. For example, if the genus assignment had a bootstrap value of 70% but the family had a value of 85%, that sequence would be assigned only as far as family and not to genus. The RDP Classifier does not classify sequences below the genus level, but the GAST process is not inherently limited to genus; its resolution is constrained by the taxonomy of the reference sequence database. The accuracy of GAST will improve in response to refinements of the reference database, including an increased number of taxonomically resolved sequences, removal of cryptic chimeric and short sequences, improvement of taxonomic identities for long sequences, and elimination of low-quality entries.
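The bootstrap rule above can be illustrated with a short Python sketch. It assumes that each classification is available as a list of (rank, taxon, bootstrap) tuples ordered from domain down to genus. That layout, the function name, and the bootstrap values in the usage note are illustrative assumptions, not the RDP Classifier's actual output format.

```python
def trim_to_confident_rank(lineage, min_bootstrap=80.0):
    """Walk up from the deepest rank until the bootstrap meets the cutoff.

    `lineage` is a list of (rank, taxon, bootstrap_percent) tuples ordered
    from the highest rank (domain) to the lowest (genus). Returns the
    (rank, taxon) pairs down to the deepest rank whose bootstrap value is
    at least `min_bootstrap`, or an empty list if no rank qualifies.
    """
    for depth in range(len(lineage), 0, -1):
        _rank, _taxon, bootstrap = lineage[depth - 1]
        if bootstrap >= min_bootstrap:
            return [(rank, taxon) for rank, taxon, _ in lineage[:depth]]
    return []
```

For the example in the text, a lineage ending in `("family", "Enterobacteriaceae", 85.0), ("genus", "Escherichia", 70.0)` would be trimmed so that its last entry is the family, and the genus would be dropped.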

Huse S.M., Dethlefsen L., Huber J.A., Welch D.M., Relman D.A., & Sogin M.L. (2008). Exploring Microbial Diversity and Taxonomy Using SSU rRNA Hypervariable Tag Sequencing. PLoS Genetics, 4(11), e1000255.

Publication 2008

Other organizations: Marine Biological Laboratory, Stanford Medicine, VA Palo Alto Health Care System

Protocol cited in 88 other protocols

Variable analysis

independent variables

Protocol used to download, filter, and curate the aligned small subunit rRNA sequences from the SILVA database

dependent variables

Number of unique sequences in the resultant dataset (RefSSU)
Length distribution of the sequences in the RefSSU dataset
Number of unique V3 and V6 reference tags in the RefV3 and RefV6 databases
Length distribution of the V3 and V6 reference tags

control variables

SILVA quality assessments used to eliminate low-quality sequences
Removal of redundant sequences with identical SSU rRNA gene sequences
Removal of sequences shorter than 50 nt or containing ambiguous bases
Removal of gap characters to create the V3 and V6 reference tag databases
RDP Classifier classifications with a bootstrap value of >=80%

positive controls

None specified

negative controls

None specified
