Probe sets differentially expressed between two responder groups, pCR or minimal residual cancer burden (RCB-I) defining excellent response versus moderate or extensive residual cancer burden (RCB-II/III) defining partial response,20 were identified separately in ER+/HER2− and ER−/HER2− training cases using a robust unequal variance t-statistic under a bootstrap scheme. The 209 and 244 probe sets that were significant in at least 30% of the bootstrap replicates in the ER+/HER2− and ER−/HER2− cohorts, respectively, were selected as candidates. A multivariate penalized optimization algorithm, gradient directed regularization, was then applied with maximum penalization to select a minimal signature that maximized the area under the ROC curve (AUC) under complete cross-validation.28 The final response predictors used 39 and 55 probe sets for the ER+/HER2− and ER−/HER2− cohorts, respectively. Risk scores, calculated as the weighted sum of the standardized log2-transformed expression signals of the signature probe sets, were dichotomized at zero in both cohorts to predict “responders” (positive scores) or “non-responders” (negative scores).
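The risk score calculation described above can be illustrated with a minimal sketch. The function names, weights, and reference means and standard deviations below are hypothetical and chosen only for illustration; they are not the published signature values. The handling of a score of exactly zero is not specified in the text, and the sketch assumes it is assigned to "non-responder".

```python
def risk_score(log2_expr, weights, ref_means, ref_sds):
    """Weighted sum of standardized log2 expression over the signature probe sets.

    All four arguments are equal-length sequences, one entry per probe set.
    ref_means / ref_sds are assumed to come from the training cohort.
    """
    return sum(
        w * (x - m) / s
        for x, w, m, s in zip(log2_expr, weights, ref_means, ref_sds)
    )


def predict_response(score):
    """Dichotomize at zero: positive scores predict response.

    Assumption: a score of exactly zero is treated as non-responder.
    """
    return "responder" if score > 0 else "non-responder"


# Hypothetical three-probe-set signature (illustrative numbers only).
weights = [0.5, -1.0, 0.25]
ref_means = [8.0, 6.0, 10.0]
ref_sds = [1.0, 2.0, 0.5]

sample = [9.0, 5.0, 10.5]  # log2 expression of one tumor sample
score = risk_score(sample, weights, ref_means, ref_sds)
label = predict_response(score)
```

In practice one such function would be applied per cohort, with the ER+/HER2− and ER−/HER2− signatures each supplying their own probe sets and weights.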
A similar procedure was followed to develop the predictor of resistance by comparing patients with extensive residual disease (RCB-III) after neoadjuvant chemotherapy versus all remaining patients. The final predictor of extensive residual disease used 73 and 54 probe sets for the ER+/HER2− and ER−/HER2− subsets, respectively (Supplemental Appendix).
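The bootstrap candidate-selection step shared by both predictors can be sketched roughly as follows for a single probe set. This is an illustration under stated assumptions, not the authors' implementation: it uses the ordinary Welch unequal-variance t-statistic rather than the unspecified robust variant, resamples within each group, and declares significance with a hypothetical cutoff on |t| (`t_cutoff`), since the significance threshold is not given in the text. Only the 30% selection frequency comes from the text.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Unequal-variance (Welch) t-statistic for two samples."""
    diff = mean(a) - mean(b)
    se2 = variance(a) / len(a) + variance(b) / len(b)
    if se2 == 0:
        # Degenerate resample with no spread in either group.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def bootstrap_selection_frequency(group_a, group_b, n_boot=200,
                                  t_cutoff=2.0, seed=0):
    """Fraction of bootstrap replicates in which |t| exceeds t_cutoff.

    Assumptions: resampling with replacement within each group;
    n_boot, t_cutoff and seed are hypothetical illustration parameters.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        a = rng.choices(group_a, k=len(group_a))
        b = rng.choices(group_b, k=len(group_b))
        if abs(welch_t(a, b)) > t_cutoff:
            hits += 1
    return hits / n_boot


# Illustrative log2 expression of one probe set in two groups,
# e.g. RCB-III cases versus all remaining cases (made-up numbers).
rcb3 = [10.0, 10.5, 11.0, 10.2, 10.8, 10.4]
rest = [5.0, 5.5, 6.0, 5.2, 5.8, 5.4]

freq = bootstrap_selection_frequency(rcb3, rest)
is_candidate = freq >= 0.30  # 30% threshold as described in the text
```

Repeating this over all probe sets and keeping those with `is_candidate` true would yield the candidate list that is then passed to the penalized signature selection.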