Generation of RNA Structure Alignment Reference

For the construction of reference alignments we used "seed" alignments from the Rfam database version 7.0 [24, 23]. In most cases these alignments are hand-curated and thus of higher quality than Rfam's "full" alignments, which are generated automatically by the INFERNAL RNA profile package [40]. Alignments with fewer than 50 sequences were discarded to increase the chance that sub-alignments could be created (see below). The SCI (see below), which scores structural alignment quality, is based on a combination of thermodynamic and covariation measures. Thermodynamic structure prediction becomes increasingly inaccurate with increasing sequence length (e.g. due to kinetic effects) but is widely regarded as sufficiently accurate for sequences not exceeding 300 nt in length [41, 42]. We therefore excluded alignments with an average sequence length above 300 nt to ensure proper thermodynamic scoring.
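The two exclusion criteria above can be sketched as a simple filter. This is only a minimal illustration, not the authors' code: the representation of an alignment as a dict mapping sequence names to gapped sequences, the function names, and the choice of "-" and "." as gap symbols are all assumptions made for the example.

```python
from statistics import mean

MIN_SEQUENCES = 50     # alignments with fewer sequences are discarded
MAX_MEAN_LENGTH = 300  # nt; upper bound on the average sequence length
GAP_CHARS = "-."       # assumed gap symbols, not taken from the source


def ungapped_length(aligned_seq):
    """Count residues in an aligned sequence, ignoring gap symbols."""
    return sum(1 for ch in aligned_seq if ch not in GAP_CHARS)


def keep_seed_alignment(alignment):
    """Return True if a seed alignment passes both criteria.

    `alignment` is a hypothetical dict: sequence name -> gapped sequence.
    """
    if len(alignment) < MIN_SEQUENCES:
        return False
    average_length = mean(ungapped_length(s) for s in alignment.values())
    return average_length <= MAX_MEAN_LENGTH
```

Length is measured on ungapped sequences here, so that gap columns do not inflate the average; whether the original pipeline measured it this way is not stated in the text.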
To each remaining seed alignment we applied a "naive" combinatorial approach that extracts sub-alignments with k ∈ {2, 3, 5, 7, 10, 15} sequences for a given range of average pairwise sequence identity (APSI), a measure of sequence homology computed with ALISTAT from the squid package [43]. To this end we computed identities for all sequence pairs of an alignment and selected those pairs within ± 10 % of the desired APSI. From the resulting list of sequences we randomly picked k unique sequences. Additionally, we dropped all alignments with an SCI below 0.6 to ensure the structural quality of the alignments and to make sure that the SCI can be applied later to score the test alignments. In this way we generated a total of 18,990 reference alignments with an average SCI of 0.93; by comparison, the data-set1 used in [22] consists of only 388 alignments with an average SCI of 0.89. For further details see Tables 1 and 6.
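The pair selection and random pick described above might look roughly like the following sketch. It makes several assumptions that the text does not confirm: identity is computed as identical aligned residues divided by the shorter ungapped length (ALISTAT may define it differently), the ± 10 % window is read as ten percentage points, and the function and parameter names are hypothetical. The subsequent SCI filter and any removal of all-gap columns are left out.

```python
import random
from itertools import combinations

GAP_CHARS = "-."  # assumed gap symbols


def pairwise_identity(a, b):
    """Fraction of identical aligned residues between two gapped sequences.

    Normalised by the shorter ungapped length (an assumption).
    """
    matches = sum(1 for x, y in zip(a, b) if x == y and x not in GAP_CHARS)
    len_a = sum(1 for x in a if x not in GAP_CHARS)
    len_b = sum(1 for y in b if y not in GAP_CHARS)
    shorter = min(len_a, len_b)
    return matches / shorter if shorter else 0.0


def pick_subalignment(alignment, k, target_identity, tolerance=0.10, rng=None):
    """Pick k sequences from pairs whose identity is near target_identity.

    `alignment` is a hypothetical dict: sequence name -> gapped sequence.
    Returns a dict with k entries, or None if too few sequences qualify.
    """
    rng = rng or random.Random()
    candidates = set()
    # Collect every sequence that takes part in at least one qualifying pair.
    for (name1, seq1), (name2, seq2) in combinations(alignment.items(), 2):
        if abs(pairwise_identity(seq1, seq2) - target_identity) <= tolerance:
            candidates.update((name1, name2))
    if len(candidates) < k:
        return None
    chosen = rng.sample(sorted(candidates), k)  # k unique sequences
    return {name: alignment[name] for name in chosen}
```

Because sequences are chosen from the pooled list of qualifying pairs, the APSI of the resulting sub-alignment is not guaranteed to fall inside the requested window, which is consistent with the approach being called "naive".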


Wilm A., Mainz I., & Steger G. (2006). An enhanced RNA alignment benchmark for sequence alignment programs. Algorithms for Molecular Biology, 1, 19.

Publication 2006



Other organizations : Heinrich Heine University Düsseldorf


Protocol cited in 11 other protocols

Variable analysis

independent variables

Sequence homology (APSI) range used to extract sub-alignments
Number of sequences (k) in sub-alignments: 2, 3, 5, 7, 10, 15

dependent variables

Number of reference alignments generated
Average SCI (Structural Conservation Index) of reference alignments

control variables

Seed alignments from the Rfam database version 7.0
Exclusion of alignments with fewer than 50 sequences
Exclusion of alignments with an average sequence length above 300 nucleotides
Exclusion of alignments with an SCI below 0.6

