The relative popularity of normalization methods was assessed by querying the GEO [2] and ArrayExpress [3] websites on June 28, 2012 for the following keywords: RMA, MAS5, dChip, GCRMA, PLIER, and VSN. ArrayExpress queries were limited to RNA assays. Both sets of queries yielded the same top-four ordering: RMA > MAS5 > dChip > GCRMA.
Figure 1 was created in Microsoft PowerPoint 2003 (Microsoft Corporation, Redmond, WA, USA). All results pertaining to MAS5 and RMA were derived from data processed with the Affymetrix Power Tools software, v1.12.0 (Affymetrix, Inc., Santa Clara, CA, USA). All results pertaining to dChip were derived from data processed with the December 17, 2010 Windows binary of dChip. All plots and principal component analyses (PCAs) were generated with Evince, v2.5.5 (UmBio AB, Umeå, Sweden). Final figures were composited using Inkscape, v0.48 (http://inkscape.org/).
The data for Figures 2 and 3, Additional file 1: Figure S1, and Additional file 2: Figure S2 were generated using GEO [2] dataset GSE18864, after excluding GSM467575 as a bad chip. Both IRON and quantile normalization (without background subtraction) were performed using libaffy. IRON normalization was performed against reference chip GSM467598. dChip normalization was performed against GSM467598 for Figure 3, Additional file 1: Figure S1, and Additional file 2: Figure S2, and against the default median-brightness chip for all other analyses.
The data for Figure 4 were generated from ArrayExpress [3] dataset E-MTAB-37, using the subset of chips from adenocarcinoma and small-cell lung tumor-derived cell lines. IRON normalization was performed against sample NCI-H1437-Rep3. Three samples (NCI-H1355-Rep1, NCI-H1792-Rep2, NCI-H2107-Rep1), denoted with ‘X’ symbols in the figure, were outlier technical replicates: most cell lines were run in triplicate, and each of these three samples differed from the other two replicates of its cell line. These outlier replicates were left in the analysis to highlight how the choice of post-processing algorithm affects the behavior of the principal component analysis. Removing the outliers before PCA does not noticeably change the behavior of the non-outliers (data not shown).
Normalized expression data for Figure 5 were loaded into R 2.15.1. The GoldenSpike package (v0.5) was used, with modifications, to evaluate spike-in probesets. Briefly, cyberT was used to identify differentially expressed probesets, and the resulting cyberT statistic was used as the score for receiver operating characteristic (ROC) analysis. ROC and area under the curve (AUC) analyses were performed using the pROC package [19], with spike-in probesets as positives (cases) and background (not spiked-in) probesets as negatives (controls), requiring cases to have larger scores than controls.
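The AUC used above can be read as the probability that a randomly chosen case (spike-in probeset) scores higher than a randomly chosen control (background probeset), with ties counting one half; this is the normalized Mann-Whitney U statistic. The minimal Python sketch below computes it from midranks. It is an illustrative re-derivation, not pROC's code, and the score arrays are hypothetical stand-ins for cyberT statistics. Requiring cases to exceed controls presumably corresponds to pROC's `direction = "<"` setting; the helper names `midranks` and `roc_auc` are invented here.

```python
import numpy as np


def midranks(values: np.ndarray) -> np.ndarray:
    """1-based ranks; tied values share the average of their ranks."""
    order = np.argsort(values, kind="stable")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i, n = 0, len(values)
    while i < n:
        j = i
        while j + 1 < n and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1                                    # extend the run of tied values
        ranks[order[i:j + 1]] = (i + j) / 2 + 1       # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def roc_auc(case_scores, control_scores) -> float:
    """AUC = P(case > control) + 0.5 * P(case == control), via Mann-Whitney U."""
    cases = np.asarray(case_scores, dtype=float)
    controls = np.asarray(control_scores, dtype=float)
    n1, n0 = len(cases), len(controls)
    ranks = midranks(np.concatenate([cases, controls]))
    u = ranks[:n1].sum() - n1 * (n1 + 1) / 2          # U statistic for the cases
    return u / (n1 * n0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    spike_scores = rng.normal(2.0, 1.0, size=50)          # hypothetical spike-in statistics
    background_scores = rng.normal(0.0, 1.0, size=5000)   # hypothetical background statistics
    print(round(roc_auc(spike_scores, background_scores), 3))
```

The rank formulation runs in O(n log n) rather than comparing every case against every control, which matters when the controls number in the tens of thousands of background probesets.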
Table 1 was generated by submitting MAS5, RMA, and IRON results to the Affycomp III web server [10,20] and then entering the results into Microsoft Excel 2007 (Microsoft Corporation, Redmond, WA, USA). dChip results were taken directly from the Affycomp III competition results webpage.
Welsh E.A., Eschrich S.A., Berglund A.E., & Fenstermacher D.A. (2013). Iterative rank-order normalization of gene expression microarray data. BMC Bioinformatics, 14, 153.