Gene expression modules were calculated from a dataset compiled from 10 independent studies, representing in total 1,608 breast cancer samples hybridized to Affymetrix HG-U133A arrays (U133A set; Additional file 1). The data were MAS5 normalized and mean centered across assays, and samples were classified into molecular subtypes based on the gene expression centroids of Hu et al. [6], as described [17]. Cross-hybridizing probes, defined as probes mapping to more than one unique Entrez Gene ID or flagged as cross-hybridizing by Affymetrix (x_at probes), were removed. The remaining features were then merged by calculating the mean expression of all probes mapping to the same Entrez Gene ID, yielding 12,208 gene-representative transcripts. Distant metastasis-free survival (DMFS) was not available for GSE3494 and GSE1456; for these datasets, relapse-free survival was used as a substitute for DMFS in survival analysis (Additional file 1). Clinical covariates for the U133A set are described in Additional file 1.

For validation of network modules, a second breast cancer gene expression dataset of 676 samples was compiled from 12 independent studies performed on the Affymetrix HG-U133Plus2 platform (MAS5 normalized; Additional file 1). For the U133Plus2 data, probes overlapping with the U133A platform were selected and expression data were merged based on Entrez Gene ID. In addition, the NKI breast cancer dataset of 295 samples, representing an independent array technology, was used (Additional file 1). Probe mapping between array platforms was done based on Entrez Gene IDs. Additional datasets representing colon, ovarian, lung and bladder cancer, melanoma, diffuse large B-cell lymphoma and acute myeloid leukemia are described in Additional file 2.
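The probe filtering, gene-level merging, centering and subtype assignment steps can be illustrated with a small sketch. This is not the authors' code: the function names, the dictionary-based input formats and the toy data are hypothetical. Two further points are assumptions. First, "mean centered" is taken to mean subtracting each gene's mean across the samples passed in (for example, one study at a time). Second, subtype calls are made by nearest centroid using Pearson correlation, whereas the exact rule of [17] is not stated here. The centroid values below are toy numbers, not those of Hu et al.

```python
from collections import defaultdict
from statistics import mean


def collapse_to_genes(expr, annotation):
    """Drop cross-hybridizing probes, then average remaining probes per Entrez Gene ID.

    expr:       {probe_id: [expression value per sample]}   (hypothetical format)
    annotation: {probe_id: [Entrez Gene IDs the probe maps to]}
    """
    rows_by_gene = defaultdict(list)
    for probe_id, values in expr.items():
        entrez_ids = set(annotation.get(probe_id, []))
        # "_x_at" marks probe sets flagged as cross-hybridizing by Affymetrix;
        # probes mapping to more than one unique Entrez Gene ID are also dropped.
        # Assumption: probes with no Entrez Gene ID are dropped as well,
        # since they cannot be assigned to a gene.
        if probe_id.endswith("_x_at") or len(entrez_ids) != 1:
            continue
        (gene_id,) = entrez_ids
        rows_by_gene[gene_id].append(values)
    # Average all probes of a gene, sample by sample.
    return {
        gene_id: [mean(column) for column in zip(*rows)]
        for gene_id, rows in rows_by_gene.items()
    }


def mean_center(genes):
    """Subtract each gene's mean across samples (assumed centering scheme)."""
    centered = {}
    for gene_id, values in genes.items():
        mu = mean(values)
        centered[gene_id] = [v - mu for v in values]
    return centered


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def assign_subtypes(genes, centroids):
    """Assign each sample to the centroid it correlates with best (assumed rule).

    centroids: {subtype: {gene_id: centroid value}}   (hypothetical format)
    """
    # Use only genes present in the data and in every centroid.
    shared = sorted(set(genes).intersection(*(c.keys() for c in centroids.values())))
    n_samples = len(next(iter(genes.values())))
    calls = []
    for j in range(n_samples):
        profile = [genes[g][j] for g in shared]
        best = max(
            centroids,
            key=lambda s: pearson(profile, [centroids[s][g] for g in shared]),
        )
        calls.append(best)
    return calls


# Toy example: 5 probes x 3 samples (hypothetical probe and gene IDs).
expr = {
    "1_at":   [1.0, 3.0, 5.0],
    "2_s_at": [3.0, 5.0, 7.0],  # same gene as "1_at": averaged with it
    "3_x_at": [9.0, 9.0, 9.0],  # cross-hybridizing: dropped
    "4_at":   [2.0, 2.0, 8.0],  # maps to two genes: dropped
    "5_at":   [6.0, 4.0, 2.0],
}
annotation = {"1_at": ["111"], "2_s_at": ["111"], "3_x_at": ["333"],
              "4_at": ["444", "445"], "5_at": ["555"]}

genes = collapse_to_genes(expr, annotation)
# genes == {"111": [2.0, 4.0, 6.0], "555": [6.0, 4.0, 2.0]}
centered = mean_center(genes)
# centered == {"111": [-2.0, 0.0, 2.0], "555": [2.0, 0.0, -2.0]}

toy_centroids = {"LumA": {"111": 1.0, "555": -1.0},
                 "Basal": {"111": -1.0, "555": 1.0}}
calls = assign_subtypes(centered, toy_centroids)
```

Working with Entrez Gene IDs as the shared key is also what makes the cross-platform step straightforward. After each platform has been collapsed to gene level, the U133A and U133Plus2 (or NKI) matrices can be merged on the intersection of their gene keys.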