In vitro transcription (IVT) RNA was derived from an amplified plasmid library of 1062 human cDNAs taken from the Mammalian Gene Collection (Lahens et al., 2014). Samples were processed with two ribosomal depletion protocols: polyA selection and the Ribo-Zero Gold kit (Epicentre catalog no. RZHM11106). Afterwards the RNA was converted into Illumina RNA-Seq libraries with the TruSeq RNA sample prep kit (Illumina catalog no. FC-122-1001) and sequenced on an Illumina HiSeq 2000 (paired-end 100 bp reads). The IVT data have two advantages: the ground truth is known, and the samples can be sequenced with standard methods, thereby capturing all normal sources of technical error. Importantly, because IVT is efficient, the expression of each base pair is theoretically the same. We performed IVT-Seq on these 1062 full-length human cDNAs. As with simulated data, the full-length transcript forms are known. In this dataset, 50 genes had two or more splice forms. PolyA selection and Ribo-Zero are the two most common ribosomal depletion protocols, and both introduce within-transcript variance (Fig. 3) that cannot easily be simulated.
These data are available at GEO (accessions GSM1219408 for polyA; GSM1219398 and GSM1219399 for Ribo-Zero).
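The expectation that every base of an IVT transcript is expressed equally suggests a simple way to quantify within-transcript variance: the coefficient of variation (CV) of per-base read depth. The sketch below is illustrative only and is not the analysis used in the study. The function name `coverage_cv` and the depth values are hypothetical, and it assumes per-base depths have already been extracted for one transcript.

```python
from statistics import mean, pstdev


def coverage_cv(depths):
    """Coefficient of variation of per-base read depth for one transcript.

    `depths` is a sequence of read depths, one value per base position.
    Returns NaN when the mean depth is zero (no coverage at all).
    """
    mu = mean(depths)
    if mu == 0:
        return float("nan")
    # Population standard deviation: every base of the transcript is observed.
    return pstdev(depths) / mu


# Hypothetical per-base depths, invented for illustration only.
uniform = [100, 100, 100, 100]  # ideal IVT coverage: every base equal
uneven = [10, 190, 50, 150]     # same mean depth, strong positional bias

cv_uniform = coverage_cv(uniform)
cv_uneven = coverage_cv(uneven)
```

Under the ideal described above the CV is zero, so larger values indicate coverage bias. Both example transcripts have the same mean depth, which means the CV separates positional bias from overall expression level.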