Libraries were generated with the Illumina TruSeq protocol (Illumina, San Diego, CA) using TruSeq sample prep kits (catalogue ID: FC-122-1001). Libraries were sequenced on a HiSeq2000 instrument (Illumina) for 2 × 101 cycles, generating 101-base paired-end reads. All samples were run on the same flowcell, multiplexed with 4 samples per lane. Illumina base calling and read generation used RTA 1.12.4.2 and CASAVA-1.8.2. Sequencing, read pre-processing and alignment were performed by the Center for Cancer Research (CCR) Genomics Core (Frederick, MD). Raw reads were trimmed before alignment with the ea-utils FastqMcf software (http://code.google.com/p/ea-utils/wiki/FastqMcf; Expression Analysis, Durham, NC). The trimming parameters were -l 15 -q 0 -u -P 33 (minimum retained sequence length = 15, quality threshold causing base removal = 0, Illumina PF filtering enabled, Phred 33 scale). Trimmed reads were mapped to the hg19 reference genome with TopHat v.2.0.8 (ref. 65) using the Bowtie2 alignment engine (ref. 66), accepting only unique alignments (parameters -g 1 -r 10 --mate-std-dev 100). Gene model annotation for hg19 from Ensembl was provided to the aligner with the -G parameter.
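The trimming and alignment steps above can be sketched as command lines. The following Python snippet is a minimal illustration, not the core facility's actual pipeline: it only assembles argument lists from the parameters reported in the text. All file names, the Bowtie2 index prefix, the output directory, the executable names (`fastq-mcf`, `tophat`) and the use of `-o` for outputs are assumptions added for illustration.

```python
import shlex

# Parameters as reported in the text.
TRIM_OPTS = ["-l", "15", "-q", "0", "-u", "-P", "33"]
ALIGN_OPTS = ["-g", "1", "-r", "10", "--mate-std-dev", "100"]


def fastq_mcf_command(adapters, reads1, reads2, out1, out2):
    """Build a fastq-mcf call for one paired-end sample.

    The adapter file and all read/output paths are hypothetical placeholders;
    the source does not state them.
    """
    return (["fastq-mcf"] + TRIM_OPTS
            + ["-o", out1, "-o", out2, adapters, reads1, reads2])


def tophat_command(index_prefix, gtf, reads1, reads2, out_dir):
    """Build a TopHat call with the reported options plus -G for annotation.

    The index prefix, GTF path and output directory are hypothetical.
    """
    return (["tophat"] + ALIGN_OPTS
            + ["-G", gtf, "-o", out_dir, index_prefix, reads1, reads2])


def as_shell(cmd):
    """Render an argument list as a shell-quoted string for inspection."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    trim = fastq_mcf_command("adapters.fa", "s1_R1.fastq", "s1_R2.fastq",
                             "s1_R1.trim.fastq", "s1_R2.trim.fastq")
    align = tophat_command("hg19", "ensembl_hg19.gtf",
                           "s1_R1.trim.fastq", "s1_R2.trim.fastq",
                           "s1_tophat")
    print(as_shell(trim))
    print(as_shell(align))
```

Building the commands as lists keeps the reported parameters in one place and lets them be inspected before anything is executed; passing a list to `subprocess.run` would run them if the tools are installed.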