Simulation of Plant sRNA-seq Libraries

sRNA-seq libraries from Arabidopsis thaliana, Oryza sativa, and Zea mays were obtained from the NCBI Sequence Read Archive (SRA) (Supplemental Material, Table S1). Libraries were selected if they had more than 5 million raw reads, were available in an unprocessed format, and were derived from an Illumina instrument. 3′-adapter sequences were discovered using find_3p_adapter.pl (available at http://sites.psu.edu/axtell/) and removed using ShortStack’s internal adapter trimming protocol.

Simulated sRNA-seq libraries were produced to closely emulate real sRNA-seq data, using a custom Python script and wrapper run under default settings: sRNA-simulator.py (File S1). The script uses a real sRNA-seq library as the basis for each simulated library. Real sRNA-seq libraries were aligned using bowtie (Langmead et al. 2009), reporting all alignments. Regions of the genome that had no alignments were removed from consideration as simulated loci, while genomic regions prone to alignments with certain length classes of sRNAs became candidate regions for simulated heterochromatic siRNA (hc-siRNA; 23–24 nt) and trans-acting siRNA (tasiRNA; 21 nt) loci. miRNA candidate regions were picked based on previously annotated loci available through miRBase (Kozomara and Griffiths-Jones 2014). Simulated loci were chosen from these candidate regions at random.

Five million reads were then generated from these simulated loci: roughly 3.25 M hc-siRNA, 1.5 M miRNA, and 250 k tasiRNA reads. Loci were made to approximate real loci in size and pattern: hc-siRNA as primarily 24-nt RNAs from 200- to 1000-nt loci, from both genomic strands; miRNA as 21-nt RNAs from 125-nt loci with a miRNA and miRNA* pattern; tasiRNA as 21-nt RNAs from 140-nt loci producing a number of phased reads, from both genomic strands. All three locus types produced a realistic distribution of differently sized or shifted reads to simulate misprocessing.
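The random choice of loci and the 5-million-read budget described above can be illustrated with a minimal Python sketch. This is not the authors' sRNA-simulator.py: the function names, the (chromosome, start, end) tuple layout for candidate regions, and the random-weight scheme for spreading reads across loci are all assumptions made for illustration. Only the three class totals come from the text.

```python
import random

# Class totals reported in the text; together they sum to 5 million reads.
READ_BUDGET = {"hc-siRNA": 3_250_000, "miRNA": 1_500_000, "tasiRNA": 250_000}


def pick_loci(candidates, n_loci, rng):
    """Choose simulated loci uniformly at random, without replacement."""
    return rng.sample(candidates, n_loci)


def split_reads(total_reads, n_loci, rng):
    """Split one class's read total across its loci.

    Random weights are an assumed scheme; the source does not state how
    read depth was distributed among simulated loci.
    """
    weights = [rng.random() for _ in range(n_loci)]
    scale = total_reads / sum(weights)
    counts = [int(w * scale) for w in weights]
    counts[0] += total_reads - sum(counts)  # rounding remainder goes to the first locus
    return counts


rng = random.Random(0)
# Hypothetical candidate regions as (chromosome, start, end) tuples.
candidates = [("Chr1", 1000 * i, 1000 * i + 140) for i in range(50)]
tasi_loci = pick_loci(candidates, 10, rng)
tasi_counts = split_reads(READ_BUDGET["tasiRNA"], len(tasi_loci), rng)
```

Passing an explicit `random.Random` instance keeps a simulated library reproducible from its seed, which helps when the same simulated reads need to be regenerated for comparison.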
Sequencing errors were simulated at a rate of one mis-sequenced base per 10,000 reads. Unlike real data, simulated reads are traceable to their loci of origin and can therefore be used to distinguish correct placements from incorrect ones. PolyA+ mRNA-seq data were obtained from SRA (Table S1). Reference genome versions were TAIR10 (A. thaliana), IRGSP7 (O. sativa), and B73v3 (Z. mays).
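A short Python sketch can make the error rate and the value of traceable reads concrete. It rests on assumptions that the text does not confirm: each read independently receives a single substituted base with probability 1/10,000, the alphabet is A/C/G/T, and read origins and placements are held in plain dictionaries keyed by a hypothetical read ID. The function names are illustrative and do not come from sRNA-simulator.py.

```python
import random

BASES = "ACGT"
ERROR_RATE_PER_READ = 1 / 10_000  # one mis-sequenced base per 10,000 reads


def add_sequencing_error(seq, rng, rate=ERROR_RATE_PER_READ):
    """With probability `rate`, substitute one randomly chosen base in the read."""
    if not seq or rng.random() >= rate:
        return seq
    pos = rng.randrange(len(seq))
    # Pick a replacement that differs from the original base at this position.
    new_base = rng.choice([b for b in BASES if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]


def placement_accuracy(true_origin, placed):
    """Fraction of simulated reads placed at their known locus of origin.

    Both arguments map read ID -> locus name; reads missing from `placed`
    count as incorrect.
    """
    if not true_origin:
        return 0.0
    correct = sum(
        1 for read_id, locus in true_origin.items() if placed.get(read_id) == locus
    )
    return correct / len(true_origin)


rng = random.Random(1)
read = "TGACAGAAGAGAGTGAGCACA"  # arbitrary 21-nt example sequence
mutated = add_sequencing_error(read, rng, rate=1.0)  # force an error for illustration
unchanged = add_sequencing_error(read, rng, rate=0.0)
score = placement_accuracy({"r1": "locusA", "r2": "locusB"},
                           {"r1": "locusA", "r2": "locusC"})
```

Because every simulated read keeps its origin label, an aligner's output can be scored directly, which is not possible with real libraries.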


Johnson N.R., Yeoh J.M., Coruh C., & Axtell M.J. (2016). Improved Placement of Multi-mapping Small RNAs. G3: Genes|Genomes|Genetics, 6(7), 2103-2111.

Publication 2016

A thaliana Genome Library Mirna Origin loci Oryza sativa Polya mrna Python Rnas Sirna Tasirna Zea mays

Corresponding Organization : Pennsylvania State University

Other organizations : Knox College


Protocol cited in 55 other protocols

Variable analysis

independent variables

sRNA-seq library source (Arabidopsis thaliana, Oryza sativa, Zea mays)

dependent variables

sRNA-seq read characteristics (length, distribution, etc.)
Accuracy of sRNA-seq read placement

control variables

Minimum 5 million raw reads per sRNA-seq library
sRNA-seq libraries obtained from Illumina instruments
3'-adapter sequences removed using ShortStack's internal adapter trimming protocol
Simulated sRNA-seq libraries closely emulating real sRNA-seq data
Regions of the genome with no alignments removed from consideration as simulated loci
Genomic regions prone to alignments with certain length classes of sRNAs used as candidate regions for simulated hc-siRNA and tasiRNA loci
miRNA candidate regions taken from previously annotated loci in miRBase
Simulated loci chosen from candidate regions at random
Simulated loci made to approximate real loci in size and pattern
Simulated sequencing errors at a rate of one mis-sequenced base per 10,000 reads
Reference genome versions used (TAIR10 for Arabidopsis thaliana, IRGSP7 for Oryza sativa, B73v3 for Zea mays)

positive controls

None explicitly mentioned

negative controls

None explicitly mentioned

