We defined modules as sets of co-expressed genes that were treated as a functional unit. Using multiple approaches, we built a collection of 323 gene expression modules, including 115 gene lists obtained from 53 publications. The modules fall into four groups:
(1) 221 modules were built from gene lists with homogeneous expression (i.e. all genes in the module were high or low together within a given sample), and each was represented by the median expression of all genes in the module. The sources of these homogeneous gene lists were as follows: 50 were identified by bicluster analyses [54] of the microarray dataset of 359 human breast tumors and 8 normal breast samples (i.e. the aforementioned 2/3 training set); 52 were identified by unsupervised hierarchical clustering of the same human breast tumor dataset; 50 were identified by bicluster analyses of microarray data from 122 mouse mammary tumors [13]; 56 were identified by unsupervised hierarchical clustering of the same mouse mammary dataset; and 13 were taken from previously published gene lists [13,14,17,18,21,26,35,55].
(2) 77 modules were represented by the first principal component of previously published gene lists [3,8-10,12,13,15,19,20,22,23,25,27-32,34,36,37,40,56-66] that showed heterogeneous expression patterns (i.e. the gene list contained genes with both high and low expression within a given sample).
(3) 22 modules were defined as correlations to the centroids of previously published training datasets [4,9,11,16,24,33,38,39,50,67,68].
(4) 3 modules were built from previously published gene expression prognostic models [5,46,47].
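The three per-sample scoring schemes in groups (1) to (3) can be illustrated with a minimal NumPy sketch. It assumes a genes-by-samples expression matrix `expr` whose rows are labelled by `genes`; the function names are hypothetical, genes absent from the array are simply dropped, the principal-component sign convention is an arbitrary choice made here, and Pearson correlation is assumed for the centroid modules because the text does not name the correlation measure. Group (4) depends on the details of each published prognostic model and is not sketched.

```python
import numpy as np


def _shared_rows(genes, module_genes):
    """Return module genes present on the array and their row indices in expr."""
    position = {g: i for i, g in enumerate(genes)}
    present = [g for g in module_genes if g in position]  # assumption: drop missing genes
    return present, [position[g] for g in present]


def median_module_score(expr, genes, module_genes):
    """Homogeneous module (hypothetical helper): per-sample median of the module's genes."""
    _, rows = _shared_rows(genes, module_genes)
    return np.median(expr[rows, :], axis=0)


def pc1_module_score(expr, genes, module_genes):
    """Heterogeneous gene list (hypothetical helper): per-sample score on PC1."""
    _, rows = _shared_rows(genes, module_genes)
    sub = expr[rows, :]
    # Samples are the observations, so center each gene across samples.
    centered = sub - sub.mean(axis=1, keepdims=True)
    # centered = U @ diag(s) @ Vt: U[:, 0] holds the PC1 gene loadings and
    # s[0] * Vt[0] holds the PC1 scores of the samples.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    score = s[0] * vt[0]
    # A principal component's sign is arbitrary; as an assumed convention,
    # flip it so the gene with the largest absolute loading loads positively.
    loadings = u[:, 0]
    if loadings[np.argmax(np.abs(loadings))] < 0:
        score = -score
    return score


def centroid_correlation(expr, genes, centroid_genes, centroid):
    """Centroid module (hypothetical helper): per-sample Pearson r with a centroid."""
    value = dict(zip(centroid_genes, centroid))
    present, rows = _shared_rows(genes, centroid_genes)
    c = np.array([value[g] for g in present], dtype=float)
    return np.array(
        [np.corrcoef(expr[rows, j], c)[0, 1] for j in range(expr.shape[1])]
    )


# Toy data: 4 genes x 4 samples (values are illustrative only).
genes = ["A", "B", "C", "D"]
expr = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 3.0, 4.0, 5.0],
                 [0.0, 1.0, 2.0, 3.0],
                 [5.0, 0.0, 1.0, 2.0]])
print(median_module_score(expr, genes, ["A", "B", "C"]))  # [1. 2. 3. 4.]
```

The split between median and first-principal-component scores follows the reasoning in the text: a median is a sensible summary only when the genes rise and fall together, whereas the first principal component still captures the dominant axis of variation when genes in a list move in opposite directions, which a median would partly average away.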
We acknowledge that our implementation of some of the previously published signatures may be suboptimal; however, we attempted, within reason, to apply each signature as published. All modules, with their gene lists and references, can be found in Additional File 1.