RSEQtools: Comprehensive Toolkit for RNA-Seq Data Analysis

The RSEQtools suite contains a set of modules to perform a large variety of tasks including the quantification of expression values, manipulation of gene annotation sets, visualization of the mapped reads, generation of signal tracks, the identification of transcriptional active regions and several auxiliary utilities (Supplementary Table S1).
Genome annotation tools: to generate a splice junction library from any annotation set, we extract the genomic sequences of all the exons and synthetically create all splice junctions specified in the annotation set. This splice junction library can be used in combination with the reference sequences. A second tool is particularly useful when estimating expression levels. In order to capture the information of the various transcript isoforms, a ‘gene model’ is required. The module mergeTranscripts collapses the transcript isoforms into a single gene model by either taking the union or intersection of the exonic nucleotides.
Quantification of gene expression: one of the key features of RNA-Seq is the quantification of expression at different levels. Hence, a key module calculates the gene expression values for a given annotation set and a collection of mapped reads in MRF format. The annotation set specifies which ‘elements’ will be quantified. The program mrfQuantifier calculates RPKM (reads per kilobase per million mapped reads) values at the nucleotide level (Mortazavi et al., 2008 ). Briefly, for a given entry in the annotation set (typically an exon or gene model), the number of nucleotides from all the reads that overlap with this annotation entry are added up and then this count is normalized by sequence length of the annotation entry (per killobase) and by the total number of mapped nucleotides (per million). This calculation is not performed at the transcript level, which requires a more sophisticated analysis (Guttman et al., 2010 (link); Trapnell et al., 2010 (link)).
Visualization of mapped reads: the RSEQtools package also contains various tools for visualizing the results in genome browsers, by means of wiggle (WIG) and bedGraph files, which are commonly used to represent a signal track of mapped reads. Also, a GFF file can be generated from MRF files to visualize splice junction reads (example in Fig. 1).
Identification of transcriptionally active regions (TARs): transcribed regions can be identified de novo by performing a maxGap/minRun segmentation (Kampa et al., 2004 (link); Royce et al., 2005 (link)) from the signal files using the wigSegmenter program. Briefly, the signal is first thresholded to identify transcribed elements. Contiguous elements whose distance is less than ‘maxGap’ are joined together and then filtered if the final size is less than ‘minRun’. This type of analysis is particularly useful in discovering novel TARs such as small RNAs, etc.
MRF selection and auxiliary utilities: lastly, RSEQtools includes a set of utilities to easily manipulate MRF files and a collection of format conversion tools allowing for rapid pipeline development.
Implementation and run time: the modules of the RSEQtools suite were implemented in C and the code was optimized in order to efficiently handle large datasets. The importance of code scalability cannot be overemphasized in a time where datasets become increasingly large and easily exceed several gigabytes. For example, the conversion of an ELAND export file (uncompressed file size: ∼4 GB; total number of reads: ∼20 million; number of mapped reads: ∼12 million) to MRF takes ∼2 min and the resulting MRF file is significantly smaller (∼400 MB uncompressed, ∼130 MB compressed with gzip). Converting the same ELAND export file to SAM generates a file of ∼3.1 GB (uncompressed) and the corresponding BAM file has a size of ∼1.2 GB. The subsequent quantification of gene expression using mrfQuantifier requires 45 s to calculate estimates for about 20 000 genes.
In addition, the modularity of RSEQtools also enables the development of additional programs in any programming language and their seamless integration into this framework. Finally, most modules use STDIN and STDOUT to process the data, making them suitable to be integrated into an automated pipeline.

Partial Protocol Preview
This section provides a glimpse into the protocol.
The remaining content is hidden due to licensing restrictions, but the full text is available at the following link: Access Free Full Text.

Habegger L., Sboner A., Gianoulis T.A., Rozowsky J., Agarwal A., Snyder M, & Gerstein M. (2010). RSEQtools: a modular framework to analyze RNA-Seq data using compact, anonymized data summaries. Bioinformatics, 27(2), 281-283.

Publication 2010

Collapses Development programs Exon Gene annotation Gene expression Genes Genome Isoforms Library Nucleotides Rna seq Rnas Transcriptional

Corresponding Organization :

Other organizations : Stanford University, Harvard University, Yale University

Top 5 similar protocols

Protocol cited in 55 other protocols

Variable analysis

independent variables

Annotation set
Mapped reads in MRF format

dependent variables

Gene expression values
Transcriptionally active regions (TARs)

control variables

Reference sequences
Sequence length of the annotation entry
Total number of mapped nucleotides

Annotations

Based on most similar protocols

Etiam vel ipsum. Morbi facilisis vestibulum nisl. Praesent cursus laoreet felis. Integer adipiscing pretium orci. Nulla facilisi. Quisque posuere bibendum purus. Nulla quam mauris, cursus eget, convallis ac, molestie non, enim. Aliquam congue. Quisque sagittis nonummy sapien. Proin molestie sem vitae urna. Maecenas lorem.

As authors may omit details in methods from publication, our AI will look for missing critical information across the 5 most similar protocols.

About PubCompare

Our mission is to provide scientists with the largest repository of trustworthy protocols and intelligent analytical tools, thereby offering them extensive information to design robust protocols aimed at minimizing the risk of failures.

We believe that the most crucial aspect is to grant scientists access to a wide range of reliable sources and new useful tools that surpass human capabilities.

However, we trust in allowing scientists to determine how to construct their own protocols based on this information, as they are the experts in their field.

Ready to get started?

Revolutionizing how scientists
search and build protocols!