We performed full-length cDNA capture and isoform sequencing as previously described [3]. In brief, we designed a set of complementary oligonucleotide capture probes (Additional file 2: Table S8) to enrich for cDNA originating from paralogous copies of NPIP and coupled this with a method for enriching full-length cDNA molecules based on reverse transcriptase (RT) template switching [59]. Next, we generated PacBio Iso-Seq libraries and performed post-capture size selection to enrich for larger cDNA molecules according to the manufacturer’s guidelines (SMRTbell template prep kit 1.0, PacBio). SMRT sequencing was performed using the P6-C4 chemistry on the PacBio RS II instrument with 6-h movies [3]. The long-read RNA-seq data were processed with a modified version of the Iso-Seq bioinformatics pipeline incorporating ToFU (Transcript isOforms: Full-length and Unassembled), available at https://github.com/EichlerLab/isoseq_pipeline. Circular consensus sequence reads were designated as putatively full length if the expected terminal sequences and a poly(A) tract were observed; these reads were then mapped to custom contigs assembled from large-insert clones using GMAP (version 2015-07-23) [60]. ORFs were identified using ANGEL (https://github.com/PacificBiosciences/ANGEL) and the TRANSLATE tool of the ExPASy: SIB bioinformatics resource portal [61].
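The full-length criterion above (expected terminal sequences plus a poly(A) tract) can be illustrated with a minimal Python sketch. This is not the ToFU implementation. The terminal sequences `FIVE_PRIME` and `THREE_PRIME`, the `min_poly_a` threshold, and the use of exact string matching are all assumptions made for illustration; a production classifier would tolerate sequencing errors in the terminal sequences.

```python
import re

# Hypothetical placeholder terminal sequences, NOT the real library primers.
FIVE_PRIME = "ACGTACGTAC"
THREE_PRIME = "GTCAGTCAGT"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_putatively_full_length(read: str,
                              five_prime: str = FIVE_PRIME,
                              three_prime: str = THREE_PRIME,
                              min_poly_a: int = 15) -> bool:
    """Check a read, in either orientation, for both terminal sequences
    and a poly(A) run of at least min_poly_a bases directly upstream of
    the 3' terminal sequence. Matching is exact (an assumption)."""
    for seq in (read.upper(), reverse_complement(read)):
        if not (seq.startswith(five_prime) and seq.endswith(three_prime)):
            continue
        # Strip the terminal sequences and inspect the end of the insert.
        body = seq[len(five_prime):len(seq) - len(three_prime)]
        tail = re.search(r"A+$", body)
        if tail and len(tail.group()) >= min_poly_a:
            return True
    return False


example = FIVE_PRIME + "ATGGCCTTTGGG" + "A" * 20 + THREE_PRIME
print(is_putatively_full_length(example))
```

Both orientations are tested because a circular consensus read may represent either strand of the cDNA.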