Taxonomic Classification of Environmental V6 Tags

In Sogin et al. [9], we proposed a tag mapping methodology, GAST (Global Alignment for Sequence Taxonomy), to assign a taxonomic classification to environmental V6 tags (http://vamps.mbl.edu/resources/software.php). The first step in GAST is to BLAST each tag against the RefV3 or RefV6 database; we imposed no minimum score, expectation value, or other cutoff. The top BLAST hit may not have the highest overall similarity to the tag sequence, particularly because edge effects can be pronounced in such a short region, so we aligned the tag sequence to the reference hypervariable region tags corresponding to the top 100 BLAST hits. We used MUSCLE [38] (with parameters -diags and -maxiters 2 to reduce processing time) because it is well suited to high-throughput experiments. Using quickdist [9], we calculated the global distance from the sample tag to each aligned reference tag as the number of insertions, deletions, and mismatches divided by the length of the tag. We considered the reference sequence or sequences with the minimum global distance to be the top GAST match(es). The top BLAST hit was frequently the best global match; however, for 5% to 25% of tags the best global match was to a reference sequence with a lower BLAST score.
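The distance calculation and the selection of top GAST match(es) described above can be sketched roughly as follows. This is an illustration only, not the quickdist implementation: the function names `global_distance` and `top_gast_matches` are hypothetical, the inputs are assumed to be rows taken from an existing alignment with `-` as the gap character, and every gapped column is assumed to count as one insertion or deletion (how quickdist treats terminal gaps is not stated here).

```python
def global_distance(aligned_tag: str, aligned_ref: str) -> float:
    """Differences (mismatches + indels) divided by the ungapped tag length.

    Assumption: both strings are rows of the same alignment, '-' marks a gap,
    and each gapped column counts as a single insertion or deletion.
    """
    if len(aligned_tag) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    tag_length = sum(1 for base in aligned_tag if base != "-")
    if tag_length == 0:
        raise ValueError("tag contains no bases")
    differences = 0
    for tag_base, ref_base in zip(aligned_tag, aligned_ref):
        if tag_base == "-" and ref_base == "-":
            continue  # column is empty in both rows, so it carries no information
        if tag_base != ref_base:
            differences += 1  # mismatch, insertion, or deletion
    return differences / tag_length


def top_gast_matches(aligned_tag: str, aligned_refs: dict[str, str]) -> tuple[float, list[str]]:
    """Return the minimum distance and every reference name that attains it."""
    distances = {name: global_distance(aligned_tag, ref) for name, ref in aligned_refs.items()}
    best = min(distances.values())
    return best, sorted(name for name, dist in distances.items() if dist == best)


if __name__ == "__main__":
    # Hypothetical toy sequences, far shorter than real V6 tags.
    refs = {"ref_a": "ACGTTACGA", "ref_b": "ACGT-ACGT"}
    print(top_gast_matches("ACGT-ACGT", refs))
```

Returning all references tied at the minimum, rather than a single winner, mirrors the "match(es)" wording above: several reference tags can be equally close to one sample tag.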
For each tag, we identified all of the long reference sequences in RefSSU that contained the exact hypervariable sequence of the top GAST match(es). We compared the taxonomic classifications of all corresponding SSU rRNA sequences (with RDP bootstrap values ≥ 80) and generated a consensus taxonomy. If two-thirds or more of the full-length sequences shared the same assigned genus, the tag was assigned to that genus. If there was no such agreement, we proceeded up one level to family. If there was a two-thirds or better consensus at the family level, we assigned this taxonomy to the tag; if not, we continued up the tree. Occasionally, a tag could not be assigned a taxonomic classification even at the domain level. This happened because the RDP Classifier could not assign a domain with an adequate bootstrap value, not because the tag mapped to full-length sequences from different domains. Such tags may represent novel organisms whose taxonomy has not yet been determined. Sample tags that did not have a single BLAST match in the RefSSU database also were not given a taxonomic assignment. We chose a 66% (two-thirds) majority, although other values, or a distributional rather than a strict-percentage approach, can be implemented. We reviewed nearly 17 million tags in our sequencing database (primarily of the V6 region) from a wide range of studies using the 66% majority as the threshold for assignment. A distribution curve of voting majority did not show any obvious break points (graph not shown), although 95% of the tags had a voting majority of 75% or better, and 90% had a voting majority ≥ 83%.
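The two-thirds voting rule above can be sketched as a walk from genus up to domain. This is a minimal sketch under stated assumptions rather than the GAST code: the name `consensus_taxonomy`, the six-rank tuple layout, and the convention that a lineage is simply truncated at the first rank whose RDP bootstrap falls below 80 are all hypothetical. It also assumes that sequences lacking an assignment at a given rank still count in the denominator, which the text does not specify.

```python
from collections import Counter

# Assumed rank order; each lineage is a tuple running from domain toward genus.
RANKS = ("domain", "phylum", "class", "order", "family", "genus")


def consensus_taxonomy(lineages, numerator=2, denominator=3):
    """Return the deepest lineage prefix shared by at least 2/3 of the sequences.

    An empty tuple means no rank, not even domain, reached the threshold.
    Integer arithmetic avoids rounding trouble at exactly two-thirds.
    """
    total = len(lineages)
    if total == 0:
        return ()
    for depth in range(len(RANKS), 0, -1):  # genus first, then up the tree
        votes = Counter(tuple(lin[:depth]) for lin in lineages if len(lin) >= depth)
        if not votes:
            continue  # no sequence is classified this deeply
        best, count = votes.most_common(1)[0]
        if count * denominator >= numerator * total:
            return best
    return ()


if __name__ == "__main__":
    # Illustrative lineages only, not taken from RefSSU.
    family = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
              "Vibrionales", "Vibrionaceae")
    mixed = [family + ("Vibrio",), family + ("Photobacterium",), family + ("Aliivibrio",)]
    print(consensus_taxonomy(mixed))  # no genus majority, so the family-level prefix
```

Changing `numerator` and `denominator` gives other strict-percentage thresholds; a distributional rule would need a different decision step.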


Huse S.M., Dethlefsen L., Huber J.A., Welch D.M., Relman D.A., & Sogin M.L. (2008). Exploring Microbial Diversity and Taxonomy Using SSU rRNA Hypervariable Tag Sequencing. PLoS Genetics, 4(11), e1000255.

Publication 2008

Deletions Insertions Muscle Rrna Sequence alignment Tree Vamps


Other organizations : Marine Biological Laboratory, VA Palo Alto Health Care System


Protocol cited in 12 other protocols

Variable analysis

independent variables

None explicitly mentioned

dependent variables

Taxonomic classification of environmental V6 tags

control variables

None explicitly mentioned

controls

Positive control: None mentioned
Negative control: None mentioned

