Microarray data came from the FDA MAQC main study [23] and consisted of data from Affymetrix HG-U133_Plus_2 microarrays generated by Affymetrix site 1. The RNA-Seq data came from the FDA SEQC main study [16] and were generated by the BGI site on the Illumina HiSeq 2000 platform. Both data sets were generated from the same set of four human RNA samples: Universal Human Reference RNA (UHRR, Agilent), Human Brain Reference RNA (HBRR, Life Technologies), and mixtures C and D of the UHRR and HBRR samples in ratios of 3:1 and 1:3, respectively. The HG-U133_Plus_2 arrays were normalized with the MAS5 algorithm. The RNA-Seq libraries were sequenced using the paired-end 100 bp TruSeq v3 RNA-Seq protocol, and the reads were analyzed with the P2 pipeline [22] using the UCSC human genome hg19 as the reference. Gene counts were normalized to reads per million (RPM) with a global scaling approach [35]. The microarray and RNA-Seq data can be obtained from the GEO database under series accession numbers GSE5350 and GSE47774, respectively.
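As an illustration of the RPM normalization step, the following is a minimal sketch that assumes plain total-count scaling: each gene count is divided by the library's total mapped read count and multiplied by one million. The global scaling approach of [35] may differ in detail, and the function name `rpm_normalize` and the toy gene names and counts are hypothetical, not taken from the study.

```python
from typing import Dict


def rpm_normalize(gene_counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw gene counts to reads per million (RPM).

    Assumption: simple total-count scaling within one library.
    This is an illustrative stand-in, not the exact procedure of [35].
    """
    total = sum(gene_counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads; cannot normalize")
    scale = 1_000_000 / total
    return {gene: count * scale for gene, count in gene_counts.items()}


# Hypothetical toy library with 10,000 mapped reads in total.
toy_counts = {"GENE_A": 500, "GENE_B": 1500, "GENE_C": 8000}
toy_rpm = rpm_normalize(toy_counts)

for gene, value in toy_rpm.items():
    print(f"{gene}\t{value:.1f}")
```

After scaling, the values in each library sum to one million, which puts libraries of different sequencing depth on a common scale before they are compared.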