All aspects of the identification of the gene-expression signatures and the development of the survival model were based solely on data from the CHOP training group and are described in detail in the Supplementary Appendix. No survival analysis or subgroup analysis was performed on the validation groups (i.e., the MMMLNP CHOP and R-CHOP cohorts) beforehand. A Cox proportional-hazards model was used to identify genes associated with survival and to build multivariate survival models. The models and their associated scaling coefficients were fixed on the basis of the CHOP training group and then evaluated in the validation groups. All reported P values are two-sided, except those for the validation groups, which are one-sided in the direction of the effect observed in the training group. P values reported for survival associations were based on single-hypothesis testing, except those for the testing of multivariate models involving the germinal-center B-cell, stromal-1, proliferation, and MHC class II signatures in the R-CHOP cohort, which were not adjusted for multiple testing.
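To illustrate how frozen coefficients and direction-specific P values fit together, the sketch below assumes that each validation patient's signature values are combined, using coefficients fixed on the training group, into a single linear predictor, and that a Wald z statistic for that predictor (or for an individual signature) is then obtained from a Cox model fitted in the validation cohort. The Cox fit itself is not shown, and the function names and example values are hypothetical.

```python
import math
from typing import Sequence


def fixed_model_score(signatures: Sequence[float], coefficients: Sequence[float]) -> float:
    """Linear predictor for one patient, using coefficients frozen on the training group.

    Hypothetical helper: the coefficients are never re-estimated in the validation cohort.
    """
    if len(signatures) != len(coefficients):
        raise ValueError("need exactly one coefficient per signature")
    return sum(c * s for c, s in zip(coefficients, signatures))


def one_sided_p(z_validation: float, training_coefficient: float) -> float:
    """One-sided P value for a validation-cohort Wald z, in the direction of the training effect.

    A positive training coefficient (higher value -> higher hazard) makes the alternative z > 0;
    a negative one (protective effect) makes the alternative z < 0.
    """
    direction = 1.0 if training_coefficient >= 0 else -1.0
    z_dir = direction * z_validation
    # Upper-tail standard normal probability: P(Z > z_dir) = 0.5 * erfc(z_dir / sqrt(2)).
    return 0.5 * math.erfc(z_dir / math.sqrt(2.0))


def two_sided_p(z: float) -> float:
    """Conventional two-sided P value, as used for the training group."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

For example, if a signature had a negative (protective) coefficient in the training group, a validation z of -1.96 yields a one-sided P of about 0.025, whereas a z of +1.96, pointing the opposite way, yields about 0.975.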
To discover new signatures associated with survival, we selected individual genes whose expression contributed significantly (P<0.01) to the survival association in the CHOP training group, in a model containing that gene together with the germinal-center B-cell and stromal-1 signatures. We organized these genes by hierarchical clustering according to their expression levels in the CHOP training group and identified five clusters of coordinately expressed genes (r>0.6). For each of these five candidate signatures, we averaged the expression levels of the component genes and tested whether this signature average added predictive significance to the bivariate survival model (germinal-center B-cell and stromal-1 signatures) for the CHOP training group. One signature was clearly superior to the others in its predictive contribution to the survival model and was therefore chosen for further analysis. This signature also added predictive significance to the bivariate model for the R-CHOP cohort (P = 0.001) and for the MMMLNP CHOP cohort (P = 0.011) (Fig. 8B and 8C in the Supplementary Appendix). In these survival models, the new signature was associated with reduced survival, whereas the stromal-1 signature was associated with increased survival, even though the two signatures were correlated with each other (r>0.8). Therefore, to refine the new signature, we identified genes in the CHOP training group that were more closely correlated with it than with the stromal-1 signature (P<0.02), and we organized these genes into three signatures by hierarchical clustering, as described above. The signature that most improved the survival model, designated stromal-2, was chosen for subsequent analyses.
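The clustering and averaging steps can be sketched as follows. The source does not state the linkage rule, so this sketch assumes average-linkage agglomerative clustering on Pearson correlation, merging two clusters only while their mean gene-to-gene correlation exceeds the r>0.6 threshold. The Cox-model screening of genes and the test of added predictive significance are not reproduced, and the function names are hypothetical.

```python
from __future__ import annotations

import numpy as np


def correlation_clusters(expr: np.ndarray, min_r: float = 0.6) -> list[list[int]]:
    """Average-linkage clustering of genes (rows of expr, samples in columns).

    Hypothetical helper: two clusters are merged only while their mean pairwise
    Pearson correlation exceeds min_r, so every returned cluster is coordinately expressed.
    """
    r = np.corrcoef(expr)  # genes x genes correlation matrix (rows are variables)
    clusters = [[i] for i in range(expr.shape[0])]
    while len(clusters) > 1:
        best_link, best_a, best_b = -np.inf, -1, -1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean correlation over all cross-cluster gene pairs.
                link = r[np.ix_(clusters[a], clusters[b])].mean()
                if link > best_link:
                    best_link, best_a, best_b = link, a, b
        if best_link <= min_r:
            break  # no remaining pair of clusters is correlated strongly enough
        clusters[best_a] = clusters[best_a] + clusters[best_b]
        del clusters[best_b]  # best_b > best_a, so best_a's index is unaffected
    return clusters


def signature_average(expr: np.ndarray, genes: list[int]) -> np.ndarray:
    """Per-sample signature value: the mean expression of the component genes."""
    return expr[genes].mean(axis=0)
```

This brute-force loop rescans all cluster pairs after every merge, which is adequate only for small gene sets; for genome-scale lists a library routine such as `scipy.cluster.hierarchy` would be the usual choice.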