Consistent gene expression changes were identified by comparing 44 stage A and 61 stage D CRCs from this study, and 42 stage A and 62 stage D CRCs from expO. For the expO dataset, separate comparisons were performed for primary stage D cancers and for distant metastases to identify gene expression changes maintained during metastatic spread. For each cohort, MAS5.0-calculated signal intensities were normalized using the quantile normalization procedure implemented in Robust Multi-array Average (RMA) (17, 18), and the normalized data were log2-transformed. Probe sets that were not expressed or that showed low variability across samples were excluded: expression values were required to exceed the median of all expression measurements in at least 25% of samples, and the interquartile range across samples on the log2 scale was required to be at least 0.5. Genes mapping to sex chromosomes were excluded because cases were not matched by sex. A total of 6716 probe sets passed these filtering steps in all three sample sets.
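The normalization and filtering steps above can be sketched in plain Python as follows. This is a minimal illustration, not the study's code: it assumes a probes × samples matrix of positive MAS5.0 signals stored as a list of rows, and the function names are hypothetical. Ties in quantile normalization are simply broken by sort order, which may differ from how the RMA implementation handles them, and the interquartile range uses linear interpolation between order statistics (`method="inclusive"`), which may not match the software used in the study.

```python
import math
import statistics


def quantile_normalize(matrix):
    """Quantile-normalize a probes x samples matrix (list of rows).

    Every sample (column) is mapped onto the same reference distribution:
    the mean, across samples, of the values at each rank. Ties are broken
    by sort order (an assumption; RMA software may treat ties differently).
    """
    n_probes, n_samples = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # For each sample, probe indices ordered from lowest to highest value.
    orders = [sorted(range(n_probes), key=col.__getitem__) for col in columns]
    # Reference distribution: mean across samples at each rank.
    reference = [
        statistics.fmean(columns[j][orders[j][r]] for j in range(n_samples))
        for r in range(n_probes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_probes)]
    for j in range(n_samples):
        for rank, probe in enumerate(orders[j]):
            normalized[probe][j] = reference[rank]
    return normalized


def filter_probes(log_matrix, min_fraction=0.25, min_iqr=0.5):
    """Return indices of probe sets passing the expression and variability filters.

    A probe set is kept if its value exceeds the median of all expression
    measurements in at least `min_fraction` of samples and its interquartile
    range on the log2 scale is at least `min_iqr`.
    """
    global_median = statistics.median(v for row in log_matrix for v in row)
    kept = []
    for i, row in enumerate(log_matrix):
        n_above = sum(v > global_median for v in row)
        q1, _, q3 = statistics.quantiles(row, n=4, method="inclusive")
        if n_above >= min_fraction * len(row) and q3 - q1 >= min_iqr:
            kept.append(i)
    return kept


def preprocess(signals):
    """Quantile-normalize positive signals, log2-transform, then filter."""
    normalized = quantile_normalize(signals)
    log_matrix = [[math.log2(v) for v in row] for row in normalized]
    return log_matrix, filter_probes(log_matrix)
```

For example, `preprocess(signals)` on a cohort's signal matrix returns the log2-scale matrix together with the row indices that survive filtering; in the study, the retained set was additionally restricted to probe sets passing in all three sample sets and not mapping to sex chromosomes.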
Differentially expressed genes were identified using Significance Analysis of Microarrays (SAM) with a Wilcoxon rank-sum statistic and a false discovery rate (FDR) of 10% (19). For each of the three comparisons, separate lists were generated of genes significantly up- or down-regulated in stage A CRCs compared with stage D CRCs. For genes identified as differentially expressed in more than one comparison, the consistency of the direction of change was assessed using Pearson's chi-squared test.
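One way to read the consistency test is as a 2 × 2 contingency table that cross-classifies genes significant in two comparisons by their direction of change (up or down in stage A relative to stage D) in each. This interpretation and the hypothetical function below are assumptions for illustration, not the study's code, and the sketch does not reproduce SAM itself, which estimates the FDR by permutation. It computes Pearson's chi-squared statistic without a continuity correction and obtains the one-degree-of-freedom p-value from the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math


def direction_consistency(dirs_a, dirs_b):
    """Pearson chi-squared test (no continuity correction) of whether genes
    significant in two comparisons change in the same direction.

    dirs_a, dirs_b: hypothetical dicts mapping gene -> "up" or "down".
    Returns (2x2 table, chi-squared statistic, p-value with 1 df).
    """
    shared = dirs_a.keys() & dirs_b.keys()
    levels = ("up", "down")
    # Rows: direction in comparison A; columns: direction in comparison B.
    table = [
        [sum(1 for g in shared if dirs_a[g] == a and dirs_b[g] == b) for b in levels]
        for a in levels
    ]
    n = len(shared)
    row_tot = [sum(r) for r in table]
    col_tot = [table[0][k] + table[1][k] for k in range(2)]
    if n == 0 or 0 in row_tot or 0 in col_tot:
        raise ValueError("chi-squared test undefined: a margin of the table is empty")
    chi2 = 0.0
    for i in range(2):
        for k in range(2):
            expected = row_tot[i] * col_tot[k] / n
            chi2 += (table[i][k] - expected) ** 2 / expected
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return table, chi2, p_value
```

If SciPy is used instead, note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, so `correction=False` is needed to obtain the uncorrected Pearson statistic computed here.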