To determine whether any of the modules we identified were related to clinical breast cancer biomarkers, we calculated the overlap between module genes and the PAM50 intrinsic subtype gene set [1], [32], the NKI70 MammaPrint® gene set [33], and the 21 genes used in OncotypeDX® [34]. Because different gene sets can yield the same classification scheme, we also fit univariate logistic regression models relating intrinsic subtype assignments to module scores in GSE1456, GSE21653, and METABRIC. We then performed ROC analysis on these models to estimate, as the AUC, how well each individual module predicts each subtype. To compare modules with other previously published signatures, pretreatment biopsies in GSE21653, GSE1456, and GSE2034 were scored for expression of the STAT1 immune cluster [19], the IR-7 immune signature [20], the interferon (IFN) cluster [21], the proliferation signature MS-14 [37], and subsets of T cell and B cell surface markers [22]. Each score was the mean expression of the signature genes, with each gene weighted by +1 or −1 according to its direction of association with RFS, as previously described [31]. ECM1–4 cluster scores were calculated as the Pearson correlations between the expression of the genes in the published ECM signature and each of the four ECM centroids [36]. Pearson correlation coefficients (r) between module and signature scores were then calculated to assess relatedness.
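The subtype-prediction step can be illustrated with a minimal pure-Python sketch. The paper does not name its software, so everything here is an assumption. Each multi-class subtype call is recoded one-vs-rest (for example, Basal versus all others). The logistic model is fit by Newton-Raphson, and the AUC is computed in-sample on the fitted probabilities using the Mann-Whitney formulation. The function names (`fit_univariate_logistic`, `roc_auc`, `module_subtype_auc`) and the toy scores and labels are hypothetical.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_univariate_logistic(x, y, iterations=25):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson (maximum likelihood)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:         # near-singular, e.g. perfectly separated classes
            break
        # Newton update: beta <- beta + I^{-1} g, using the closed-form 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def module_subtype_auc(module_scores, subtype_calls, subtype):
    """One-vs-rest: regress membership in `subtype` on a single module score."""
    y = [1 if call == subtype else 0 for call in subtype_calls]
    b0, b1 = fit_univariate_logistic(module_scores, y)
    fitted = [_sigmoid(b0 + b1 * s) for s in module_scores]
    return roc_auc(fitted, y), (b0, b1)


# Toy example (hypothetical data): does this module discriminate Basal tumors?
auc, (b0, b1) = module_subtype_auc(
    [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5],
    ["LumA", "Basal", "LumA", "Basal", "LumA", "LumA", "Basal", "Basal"],
    "Basal",
)
```

With a single predictor, the fitted probability is a monotone function of the module score. The in-sample AUC therefore equals the AUC of the raw score when the slope is positive, and 1 minus that AUC when it is negative. Fitting the model mainly adds the coefficient's sign and magnitude. A held-out or cross-validated AUC would require a different setup than this sketch.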
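The signature-scoring steps can also be written out directly: a sign-weighted mean per sample, a Pearson correlation of a sample's ECM-gene profile with each centroid, and a module-versus-signature correlation across samples. This is a sketch under stated assumptions, not the authors' code. It assumes that genes are matched by symbol, that signature genes missing from a dataset are simply skipped, and that expression values are used as given, since any prior centering or standardization is not stated here. The function names and the `g1`/`g2`/`g3` genes are hypothetical. The real ECM centroid values would have to come from [36].

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; NaN if either vector has zero variance."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def signature_score(sample_expr, signature):
    """Mean of sign-weighted expression over the signature genes measured in a sample.

    sample_expr: {gene: expression}
    signature:   {gene: +1 or -1}, the sign being the direction of association with RFS
    Skipping genes absent from sample_expr is an assumption of this sketch.
    """
    weighted = [sign * sample_expr[g] for g, sign in signature.items() if g in sample_expr]
    return sum(weighted) / len(weighted) if weighted else float("nan")


def ecm_centroid_scores(sample_expr, centroids):
    """Correlate one sample's ECM-gene profile with each centroid (e.g. ECM1..ECM4)."""
    scores = {}
    for name, centroid in centroids.items():
        genes = [g for g in centroid if g in sample_expr]   # shared genes, centroid order
        scores[name] = pearson_r([sample_expr[g] for g in genes],
                                 [centroid[g] for g in genes])
    return scores


# Hypothetical toy inputs: one sample, a 3-gene signature, and two centroids.
sample = {"g1": 1.0, "g2": 2.0, "g3": 3.0}
toy_signature = {"g1": 1, "g2": 1, "g3": -1}
toy_centroids = {"ECM1": {"g1": 0.1, "g2": 0.2, "g3": 0.3},
                 "ECM2": {"g1": 3.0, "g2": 2.0, "g3": 1.0}}
ecm = ecm_centroid_scores(sample, toy_centroids)
```

To assess relatedness between a module and a signature, the same `pearson_r` is applied across samples. The module's per-sample scores form one vector and the signature's per-sample scores form the other, with both vectors listed in the same sample order.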