Affymetrix data for ten datasets from the published studies listed in Table 2 were downloaded from the Gene Expression Omnibus [3] or ArrayExpress [1] repositories. Raw .cel files were not available for the Wang et al. dataset, so all other datasets were normalized as in that study, using the MAS 5.0 algorithm with a target intensity of 600 as implemented in the simpleaffy package [50], in R [31] with Bioconductor [48]. NetAffx [9] was used to identify the Affymetrix probesets representing the 'intrinsic gene set' previously used to classify human breast tumours [8]. Centered average linkage clustering was performed using the Cluster [35] and TreeView programs as described previously [7]. Supervised principal components analysis, using the superpc package for R as previously described [29, 30], was used to compare the predictive power of combining different published datasets. The follow-up endpoints were recurrence-free survival for the Loi et al., Pawitan et al. and Sotiriou et al. datasets; disease-free survival for the Desmedt et al. and Ivshina et al. datasets; and distant metastasis-free survival for the Minn et al. and Wang et al. datasets.
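The global scaling step of the MAS 5.0 normalization described above can be illustrated with a minimal sketch. It is not the simpleaffy implementation: it assumes that probeset signals have already been summarized for one array, and that scaling multiplies every signal by a single factor so that the 2% trimmed mean equals the target intensity of 600. The function names, the trim fraction and the example values are hypothetical and are used for illustration only.

```python
def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values dropped from each tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def scale_to_target(signals, target=600.0, trim=0.02):
    """Rescale one array so that its trimmed mean signal equals `target`."""
    factor = target / trimmed_mean(signals, trim)
    return [s * factor for s in signals]


# Hypothetical probeset signals for a single array (not data from the study).
example_signals = [50.0, 120.0, 300.0, 80.0, 950.0, 210.0, 40.0, 610.0]
scaled = scale_to_target(example_signals)
```

Because each array is rescaled independently to the same target, arrays from different studies are brought onto a comparable intensity scale, which is presumably why the same target was applied to every dataset for which raw files were available.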