Enhancing RNA-Seq Read Alignment Accuracy

Exon-spanning reads sometimes have very small anchors (defined here as 1–7 bp) in one of the exons. Correctly aligning these reads is extremely difficult because a 1- to 7-bp anchor will align to numerous locations, even in a local FM index. Arguably the most effective approach to align such short-anchored reads is to use splice site information to remove the introns computationally before alignment. We can identify and collect splice site locations when aligning reads with long anchors and then rerun HISAT for the short-anchored reads (Supplementary Fig. 9). This two-step approach is very similar to the two-step algorithm in TopHat2.
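A rough calculation helps show why such short anchors are ambiguous. The sketch below assumes uniformly random, independent bases, a simplification that is not part of the method described here. Under that assumption, a k-bp anchor is expected to match about (L - k + 1) / 4^k positions by chance in a region of length L. The function name and the 64,000-bp region size are illustrative choices only.

```python
def expected_random_hits(anchor_len: int, region_len: int) -> float:
    """Expected number of exact chance matches of one anchor in a region.

    Assumes a uniform, independent base model (each base has probability 1/4),
    which real genomes only approximate.
    """
    if anchor_len < 1 or region_len < anchor_len:
        raise ValueError("need 1 <= anchor_len <= region_len")
    start_positions = region_len - anchor_len + 1
    return start_positions / 4 ** anchor_len


# Hypothetical region size, chosen only for illustration.
REGION_LEN = 64_000

for k in range(1, 8):
    hits = expected_random_hits(k, REGION_LEN)
    print(f"{k}-bp anchor: ~{hits:.1f} chance matches")
```

Even at 7 bp the expected count under this model stays above one, so sequence alone cannot place the anchor uniquely; knowing the splice site is what resolves the ambiguity.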
More specifically, in the two-step HISATx2 method, we use the first run of HISAT (HISATx1) to generate a list of splice sites supported by reads with long anchors. In the second run we then use the splice sites to align reads with small anchors. For example, consider the unmapped read spanning exons e2 and e3 (the upper portion of Supplementary Fig. 9). The right part of the read will be mapped to exon e3 using the global search and extension operations, leaving a short, 3-bp segment unmapped. We then check the splice sites found in the first run of HISAT to find any splice sites near this partial alignment. In this example, we find a splice site supported by a read spanning exons e2 and e3 with long anchors in each exon. On the basis of this information, we directly compare the 3 bp of the read and the corresponding genomic sequence in exon e2. If it matches, we combine the 3-bp alignment with the alignment of the rest of the read. This ‘junction extension’ procedure that makes use of previously identified splice sites is represented by brown arrows in the figure.
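The junction extension step described above can be sketched as follows. This is a minimal illustration, not HISAT's implementation: the `junction_extend` function, the `(donor_end, acceptor_start)` representation of a splice site, and the toy sequences are all assumptions made for the example. It also only considers splice sites whose acceptor coincides exactly with the start of the partial alignment, whereas the text speaks more loosely of splice sites "near" it.

```python
from typing import Iterable, Optional, Tuple

# Assumed representation: 0-based genome coordinates of the last exonic base
# before the intron and the first exonic base after it.
SpliceSite = Tuple[int, int]


def junction_extend(genome: str, read: str, anchor_len: int,
                    aligned_start: int,
                    splice_sites: Iterable[SpliceSite]) -> Optional[SpliceSite]:
    """Try to place the unmapped left anchor of a read across a known splice site.

    read[anchor_len:] is assumed to be already aligned at genome[aligned_start:].
    Returns the splice site that explains the anchor, or None if none fits.
    """
    anchor = read[:anchor_len]
    for donor_end, acceptor_start in splice_sites:
        if acceptor_start != aligned_start:
            continue  # splice site does not abut the partial alignment
        left = donor_end - anchor_len + 1
        if left < 0:
            continue  # anchor would run off the start of the genome
        # Directly compare the anchor with the end of the upstream exon.
        if genome[left:donor_end + 1] == anchor:
            return (donor_end, acceptor_start)
    return None


# Toy example: exon e2, an intron, then exon e3.
exon2, intron, exon3 = "ACGTTGCA", "GTAAGTCCCCAG", "TTGACCGGA"
genome = exon2 + intron + exon3
known_sites = [(len(exon2) - 1, len(exon2) + len(intron))]

read = "GCA" + "TTGACC"  # 3-bp anchor in e2, remainder in e3
print(junction_extend(genome, read, 3, len(exon2) + len(intron), known_sites))
```

Because the candidate location comes from a previously observed splice site, the 3-bp comparison is a single string check rather than a genome-wide search.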
As we show in our experiments on simulated reads, this two-step strategy produces accurate alignment of reads with anchors as small as 1 bp (see Results). Although HISATx2 has considerably better sensitivity, it takes twice as long to run as HISATx1. As an alternative, we developed a hybrid method, HISAT, which has sensitivity almost equal to that of HISATx2 but with the speed of HISATx1. HISAT collects splice sites as it processes the reads, similarly to the first run of HISATx2. However, as it is processing, it uses the splice sites collected thus far to align short-anchored reads. In the vast majority of cases, it can align even the shortest anchors because it has seen the associated splice sites earlier. This result follows from the observation that most splice sites can be discovered within the first few million reads, and most RNA-seq data sets contain tens of millions of reads. As our results show, HISAT provides alignment sensitivity that very nearly matches the two-step HISATx2 algorithm, with a run time nearly as fast as the one-step HISATx1 method.
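The difference between the two-step and hybrid strategies can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not the actual aligner: each read is reduced to a hypothetical `(junction, anchor_len)` pair, a read with an anchor of at least 8 bp is treated as always alignable and as revealing its splice site, and a short-anchored read is treated as alignable only if its splice site is already known. The 8-bp threshold simply follows from the 1–7 bp definition of a small anchor given earlier.

```python
from typing import List, Tuple

Read = Tuple[str, int]  # (junction id, length of the shorter anchor in bp)
MIN_LONG_ANCHOR = 8     # assumed cutoff: anchors of 1-7 bp count as small


def two_pass(reads: List[Read]) -> int:
    """Two-step model: collect all splice sites first, then align every read."""
    known = {junction for junction, anchor in reads if anchor >= MIN_LONG_ANCHOR}
    return sum(1 for junction, anchor in reads
               if anchor >= MIN_LONG_ANCHOR or junction in known)


def one_pass(reads: List[Read]) -> int:
    """Hybrid model: use only the splice sites seen so far in a single pass."""
    known = set()
    aligned = 0
    for junction, anchor in reads:
        if anchor >= MIN_LONG_ANCHOR:
            known.add(junction)  # long anchors discover the splice site
            aligned += 1
        elif junction in known:
            aligned += 1         # short anchor rescued by an earlier read
    return aligned


reads = [("e2-e3", 3), ("e2-e3", 30), ("e2-e3", 2), ("e4-e5", 25), ("e4-e5", 5)]
print("two-pass aligned:", two_pass(reads))
print("one-pass aligned:", one_pass(reads))
```

In this model the single pass misses only short-anchored reads that arrive before any long-anchored read has revealed their splice site, which is why the gap narrows as the number of reads grows.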
The hybrid approach is also effective in aligning reads spanning more than two exons, which are more likely to have small anchors. The alignment sensitivity for such reads increases from 53% using HISATx1 to 95% using HISAT (Supplementary Fig. 2).


Kim D., Langmead B., & Salzberg S.L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nature Methods, 12(4), 357-360.

Publication 2015


Corresponding Organization : Johns Hopkins University



Variable analysis

Independent variables

Splice site information used to remove introns computationally before alignment

Dependent variables

Alignment sensitivity for reads with small anchors (1–7 bp)
Alignment sensitivity for reads spanning more than two exons

Control variables

Not explicitly mentioned

Controls

Positive control: Reads with long anchors used to identify splice sites
Negative control: Not mentioned

