The NCBI Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov) was searched for human primary cell expression datasets. Datasets were selected on three criteria: (1) chip platform (Affymetrix Human Genome U133 Plus 2.0 expression arrays); (2) cell type studied; and (3) availability of raw data (.cel) files. Accordingly, a diverse set of human leukocyte gene expression data was collected, comprising a total of 1,103 chips from 105 separate studies. All raw data (.cel) files were downloaded, and the quality of the raw data from each dataset was reassessed using the arrayQualityMetrics package in Bioconductor (http://www.bioconductor.org). Each array was scored on five metrics: maplot, spatial, boxplot, heatmap and rle. Any array failing on more than one QC metric was removed from the dataset. All data were then normalised independently using the robust multi-array average (RMA) expression measure [51]. Probesets were annotated using the latest annotation available in Bioconductor (26 June 2009), and samples were ordered by cell-type grouping (iPS cells, ES cells, BM, BM progenitors, macrophages, lymphocytes, etc.) to ease interpretation of the data.
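The array-exclusion rule (remove any array failing on more than one of the five QC metrics) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the per-metric outlier flags have already been exported from the arrayQualityMetrics report into a Python dictionary, and the array IDs and the helper name `filter_arrays` are hypothetical.

```python
from typing import Dict, List

# The five QC metrics named in the text (arrayQualityMetrics module names).
QC_METRICS = ("maplot", "spatial", "boxplot", "heatmap", "rle")


def filter_arrays(qc_flags: Dict[str, Dict[str, bool]], max_failures: int = 1) -> List[str]:
    """Return IDs of arrays that fail on at most `max_failures` QC metrics.

    `qc_flags` (hypothetical format) maps array ID -> {metric: True if the
    array was flagged as an outlier on that metric}. Missing metrics count
    as passed.
    """
    kept = []
    for array_id, flags in qc_flags.items():
        n_failed = sum(1 for metric in QC_METRICS if flags.get(metric, False))
        # Arrays failing on more than one metric are dropped.
        if n_failed <= max_failures:
            kept.append(array_id)
    return kept


# Hypothetical flags for three arrays.
example_flags = {
    "array_A": {"maplot": True},               # 1 failure  -> kept
    "array_B": {"maplot": True, "rle": True},  # 2 failures -> removed
    "array_C": {},                             # 0 failures -> kept
}
print(filter_arrays(example_flags))
```

Keeping the threshold as a parameter makes it easy to check how sensitive the retained chip count is to the "more than one metric" cut-off.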
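RMA combines background correction, quantile normalisation of probe-level intensities across arrays, and median-polish summarisation of log2 values into probeset expression measures. The sketch below shows only the quantile-normalisation step, using NumPy on a small (probes × arrays) matrix. It is an illustrative simplification (ties are broken by sort order), not the Bioconductor implementation used in the study.

```python
import numpy as np


def quantile_normalize(x: np.ndarray) -> np.ndarray:
    """Quantile-normalise a (probes x arrays) matrix.

    After normalisation every column (array) has the same empirical
    distribution: the mean, across arrays, of each rank's value.
    Ties are broken by argsort order, a simplification.
    """
    order = np.argsort(x, axis=0)                       # row indices sorting each column
    sorted_cols = np.take_along_axis(x, order, axis=0)  # each column sorted ascending
    reference = sorted_cols.mean(axis=1)                # mean value at each rank
    out = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        # Put the reference value for each rank back at that probe's original row.
        out[order[:, j], j] = reference
    return out


intensities = np.array([[5.0, 4.0, 3.0],
                        [2.0, 1.0, 4.0],
                        [3.0, 4.0, 6.0],
                        [4.0, 2.0, 8.0]])
print(quantile_normalize(intensities))
```

Because all arrays share one intensity distribution afterwards, remaining differences between samples reflect relative probe ranks rather than global array-level effects.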