Improved BLAST Filters for Rfam Annotation

To make it feasible to search more than 120 gigabases of sequence with hundreds of covariance models (CMs) in a reasonable time, we use sequence-based filters to prune the search space before applying the more accurate but more computationally expensive CMs. One of the primary limitations of the Rfam annotation pipeline has been its reliance on BLAST-based sequence filters, which are likely to compromise search sensitivity. To address this issue at least partially, NCBI-BLAST has been replaced with a WU-BLAST search that has been tuned for high sensitivity at low sequence similarity. A benchmark of several homology search tools has shown WU-BLAST to be the more accurate of the two methods on nucleotide data (5). Additionally, to make the BLAST filters behave more like profile HMMs, a sequence mask has been applied to each sequence in the alignment. Any nucleotide that either has a low frequency in its alignment column or is an insert relative to the majority of the other sequences is ‘soft masked’ and is not used for the BLAST word matches. These masked nucleotides do, however, still contribute to alignments that were seeded in the flanking regions. This approach has resulted in many fewer spurious hits with no detectable cost to sensitivity (data not shown), which in turn allows the E-value thresholds to be relaxed further. Taken together, these observations mean that the BLAST filters have been improved in terms of both specificity and sensitivity.
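The masking rule described above can be illustrated with a minimal Python sketch. It is not the Rfam pipeline code: the function name `soft_mask_alignment`, the two thresholds (`min_freq`, `max_gap_fraction`) and their default values are assumptions chosen for illustration, since the text gives no cut-offs. The sketch follows the common convention of writing soft-masked residues in lowercase; how a particular BLAST implementation is told to honor lowercase masking is not shown.

```python
from collections import Counter

GAP_CHARS = "-."  # characters treated as alignment gaps (assumption)


def soft_mask_alignment(aligned, min_freq=0.2, max_gap_fraction=0.5):
    """Return ungapped sequences with soft-masked residues in lowercase.

    aligned          -- list of equal-length aligned sequences
    min_freq         -- hypothetical cut-off: a residue seen in fewer than
                        this fraction of the sequences in its column is masked
    max_gap_fraction -- hypothetical cut-off: a column where more than this
                        fraction of sequences have a gap is treated as an
                        insert relative to the majority
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("all aligned sequences must have the same length")

    rows = [row.upper() for row in aligned]
    n = len(rows)
    masked = [[] for _ in rows]

    for col in range(width):
        column = [row[col] for row in rows]
        # Count residues only; gaps are handled separately.
        counts = Counter(c for c in column if c not in GAP_CHARS)
        gap_fraction = (n - sum(counts.values())) / n
        is_insert = gap_fraction > max_gap_fraction

        for i, c in enumerate(column):
            if c in GAP_CHARS:
                continue  # gaps are dropped from the output sequence
            low_freq = counts[c] / n < min_freq
            masked[i].append(c.lower() if (is_insert or low_freq) else c)

    return ["".join(chars) for chars in masked]
```

For example, with five aligned sequences `ACGU-A`, `ACGU-A`, `ACGUCA`, `ACAU-A`, `ACGU-A` and `min_freq=0.3`, the lone `C` in the mostly gapped fifth column and the lone `A` in the third column come back in lowercase, while every other residue stays uppercase. The lowercase residues remain in the sequence, so they can still take part in an alignment that was seeded in the unmasked flanks.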

Gardner P.P., Daub J., Tate J.G., Nawrocki E.P., Kolbe D.L., Lindgreen S., Wilkinson A.C., Finn R.D., Griffiths-Jones S., Eddy S.R., & Bateman A. (2008). Rfam: updates to the RNA families database. Nucleic Acids Research, 37(Database issue), D136-D140.

Publication 2008

Keywords: HMMs, Nucleotides, Sensitivity

Other organizations: Wellcome Sanger Institute

Protocol cited in 129 other protocols

Variable analysis

independent variables

Use of sequence-based filters to prune the search space before the CM search
Replacement of NCBI-BLAST with a WU-BLAST search tuned for high sensitivity
Soft masking of nucleotides that have a low column frequency or are inserts relative to the majority of sequences

dependent variables

Search sensitivity
Number of spurious hits
Specificity of the BLAST filters

control variables

Not explicitly mentioned
