Computational Pipeline for miRNA Identification

Reads from each library were trimmed using the procedure described in Shi et al. [4], which globally optimizes read quality over all start and stop positions using quality parameters computed with ELAND. The reads were then aligned to the Ciona genome (JGI version 1.0) using BLAST with an E-value of 10, a word size of 7, and a gap penalty of 10,000. Hits to the genome were then filtered to include only those with an E-value ≤ 0.01.
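The post-alignment E-value filter can be illustrated with a minimal sketch. It assumes the BLAST hits are available in the standard 12-column tab-separated tabular format, where the E-value is the 11th column; the function name filter_hits and the example lines are hypothetical and are not taken from miRTRAP.

```python
def filter_hits(lines, max_evalue=0.01):
    """Keep tabular BLAST hits whose E-value is at most max_evalue.

    Assumes 12 tab-separated columns per hit, with the E-value in
    column 11 (index 10). Blank lines and '#' comment lines are skipped.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= max_evalue:
            kept.append(fields)
    return kept


# Two made-up hits: the first passes the filter, the second does not.
example = [
    "read1\tchr1\t100.0\t22\t0\t0\t1\t22\t500\t521\t1e-05\t44.1",
    "read2\tchr1\t90.0\t20\t2\t0\t1\t20\t900\t919\t2.5\t18.3",
]
hits = filter_hits(example)
```

Running the alignment with a permissive E-value and filtering afterwards, as the text describes, keeps the strict cutoff adjustable without re-running BLAST.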
After the reads have been aligned to the genome, read regions are defined. A read region is a contiguous span of overlapping reads. Only reads with fewer than five hits to the genome are considered when defining read regions. Read regions that are shorter than 160 nucleotides and do not overlap a repeat region or a tRNA are then used as candidate loci to be tested as possible miRs.
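The construction of read regions and candidate loci can be sketched as follows. This is an illustration under stated assumptions, not the miRTRAP implementation: coordinates are treated as half-open intervals, strand is ignored, and the names Hit, build_read_regions, candidate_loci, and masked (a list of repeat and tRNA intervals) are hypothetical.

```python
from collections import Counter, namedtuple

# Hypothetical record for one genomic hit; coordinates are half-open [start, end).
Hit = namedtuple("Hit", "read chrom start end")


def build_read_regions(hits, max_hits=5):
    """Merge overlapping hits from reads with fewer than max_hits genomic hits."""
    hits = list(hits)
    hit_counts = Counter(h.read for h in hits)
    usable = sorted(
        (h for h in hits if hit_counts[h.read] < max_hits),
        key=lambda h: (h.chrom, h.start, h.end),
    )
    regions = []
    for h in usable:
        last = regions[-1] if regions else None
        if last and last[0] == h.chrom and h.start < last[2]:
            last[2] = max(last[2], h.end)  # extend the current region
        else:
            regions.append([h.chrom, h.start, h.end])  # start a new region
    return [tuple(r) for r in regions]


def candidate_loci(regions, masked, max_len=160):
    """Keep regions shorter than max_len that overlap no masked interval."""
    def overlaps(a, b):
        return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

    return [
        r for r in regions
        if r[2] - r[1] < max_len and not any(overlaps(r, m) for m in masked)
    ]


hits = [
    Hit("r1", "chr1", 100, 122),
    Hit("r2", "chr1", 110, 132),
    Hit("r3", "chr1", 500, 520),
] + [Hit("r4", "chr2", s, s + 20) for s in (0, 100, 200, 300, 400)]  # 5 hits: excluded

regions = build_read_regions(hits)
candidates = candidate_loci(regions, masked=[("chr1", 490, 600)])
```

In this made-up example, r1 and r2 merge into one region, r4 is dropped for having five hits, and the region formed by r3 is rejected because it overlaps a masked interval.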
Our approach to identifying microRNAs from high-throughput sequencing reads is to compute a set of quantities for each candidate locus; by applying a threshold to each quantity, we define a space of values that contains the microRNA loci.
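As a rough illustration of this threshold-based classification, the sketch below treats a locus as a predicted positive only when every quantity satisfies its threshold. The quantity names, comparison directions, and cutoff values are placeholders chosen for the example; they are not the defaults from Table S1.

```python
import operator

# Placeholder thresholds: name -> (comparison, cutoff).
# These values are illustrative assumptions, not miRTRAP defaults.
THRESHOLDS = {
    "five_prime_heterogeneity": (operator.ge, 0.5),
    "avg_genome_hits": (operator.le, 5.0),
    "non_mir_neighbor_count": (operator.le, 10),
}


def passes_thresholds(quantities, thresholds=THRESHOLDS):
    """Return True if every quantity satisfies its threshold."""
    return all(
        compare(quantities[name], cutoff)
        for name, (compare, cutoff) in thresholds.items()
    )


good_locus = {
    "five_prime_heterogeneity": 0.9,
    "avg_genome_hits": 1.0,
    "non_mir_neighbor_count": 2,
}
noisy_locus = dict(good_locus, non_mir_neighbor_count=25)

good_result = passes_thresholds(good_locus)
noisy_result = passes_thresholds(noisy_locus)
```

Keeping the thresholds in a single table makes it straightforward to adjust the stringency of the predictions in one place.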
A key challenge for the program is designating all read products on a potential hairpin as corresponding to miR/miR*, moR/moR* and/or loops, because the program relies on this information to test whether the products are consistent with miRNA biogenesis. Once candidate loci are folded, all reads that overlap the locus are grouped to define 'products', and these products are then identified as miR, moR, or loop products according to Figure S1 in Additional file 1.
Many of the quantities we consider pertain to the structure of the hairpin and the positions of reads. The distance between a miR and moR on the same arm of the hairpin, the offset of the 5' positions of products that overlap by at least 2 nucleotides on the same arm of the hairpin, and the offset of overlapping products on opposite arms of the hairpin are used to evaluate the spacing and distribution of products. The 5' heterogeneity, defined as the fraction of reads within the miR product with the same 5' position as the predominant splice variant of this product, is evaluated for the most abundant miR product. Furthermore, we define the AAPD as the average distance between sense and antisense products that overlap, and apply this measure across all sense products that overlap antisense products. Additionally, the minimum number of base pairs per nucleotide for either a miR or miR* product is used to evaluate the locus.
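The 5' heterogeneity defined above can be computed with a short sketch. It assumes the reads of the miR product are given as (5' position, read count) pairs and that the predominant variant is the 5' position with the most reads; the function name and the example counts are hypothetical.

```python
from collections import Counter


def five_prime_heterogeneity(reads):
    """Fraction of reads sharing the most common 5' position.

    reads: iterable of (five_prime_position, read_count) pairs
    for a single miR product.
    """
    totals = Counter()
    for position, count in reads:
        totals[position] += count
    total = sum(totals.values())
    if total == 0:
        raise ValueError("product has no reads")
    return max(totals.values()) / total


# Made-up product: 80 of 100 reads start at position 1000.
product_reads = [(1000, 80), (1001, 15), (999, 5)]
heterogeneity = five_prime_heterogeneity(product_reads)
```

Under this definition a value near 1 indicates that almost all reads share one 5' end.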
Two additional quantities take into account information from the sequencing data outside the candidate locus under consideration. The average number of hits to the genome for reads within the most abundant miR product is evaluated as an additional level of repeat filtering. Finally, after producing a list of predicted positive loci using the above measures, we define the non-miR-neighbor-count as the number of read regions that do not overlap a predicted positive locus within a ± 1-kb window surrounding the locus in question. All read regions, including those overlapping repeat regions, tRNAs, and those longer than 160 nucleotides, are considered for this calculation.
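A minimal sketch of the non-miR-neighbor-count follows. It assumes half-open (chromosome, start, end) intervals and interprets the ± 1-kb window as extending 1,000 nucleotides beyond each end of the locus; that interpretation, and the function names, are assumptions made for illustration.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def non_mir_neighbor_count(locus, read_regions, positives, flank=1000):
    """Count read regions in the flanked window that overlap no positive locus."""
    window = (locus[0], max(0, locus[1] - flank), locus[2] + flank)
    return sum(
        1
        for region in read_regions
        if overlaps(region, window)
        and not any(overlaps(region, p) for p in positives)
    )


locus = ("chr1", 5000, 5080)
positives = [locus, ("chr1", 5500, 5580)]
read_regions = [
    ("chr1", 5000, 5080),  # the locus itself: overlaps a positive
    ("chr1", 4500, 4530),  # in window, not a positive: counted
    ("chr1", 5510, 5540),  # overlaps another positive: not counted
    ("chr1", 6070, 6100),  # straddles the window edge: counted
    ("chr1", 7000, 7030),  # outside the window
    ("chr2", 5000, 5050),  # different chromosome
]
neighbor_count = non_mir_neighbor_count(locus, read_regions, positives)
```

Because the count depends on the list of predicted positives, it can only be computed after the other filters have been applied, as the text notes.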
Each of these quantities has a user-defined threshold that can be adjusted to meet the desired level of stringency of the predictions. The default values used in this analysis are summarized in Table S1 in Additional file 1. The miRTRAP software and other resources are available on our website [49].

Hendrix D., Levine M, & Shi W. (2010). miRTRAP, a computational method for the systematic identification of miRNAs from high throughput sequencing data. Genome Biology, 11(4), R39.

Publication 2010

Other organizations : University of California, Berkeley

Variable analysis

independent variables

Thresholds for each quantity used to define a space of values that contain the microRNA loci

dependent variables

Candidate loci to be tested as a possible miR
Quantities computed for each candidate locus to evaluate if it is a microRNA locus

control variables

Read regions shorter than 160 nucleotides and that do not overlap a repeat region or a tRNA
Reads with fewer than five hits to the genome

controls

Positive controls: Not explicitly mentioned.
Negative controls: Not explicitly mentioned.
