We used logistic regression to predict PD as the best RECIST response versus non-PD, rather than responder versus progressor, to better reflect the real-world setting in which all outcomes (PD, SD, MR, PR, CR) are possible. We evaluated genomic, transcriptomic, and clinical features. Categorical features were converted to binary indicator features for each categorical value. To be conservative, no gene-level mutations or expression values were considered individually. Global genomic tumor characteristics such as TMB, purity, ploidy, heterogeneity, and aneuploidy were considered. Transcriptomic features included ssGSEA scores for gene sets representing Cancer Hallmark pathways and MHC-I and MHC-II antigen presentation genes, as well as gene expression signatures computed following the methodology of their respective publications, as described above and in Supplemental Table 6. Clinical characteristics evaluated included LDH and ECOG at the start of anti-PD-1 ICB, number of metastatic organs, gender, M stage, number of different metastatic sites, metastatic sites, and melanoma subtype (Supplemental Table 1). Features were chosen by forward selection: features that were statistically significantly predictive (p < 0.05) when added to the base model were ranked by the ability of the combined model to discriminate outcomes (using ROC AUC as the metric), and the best feature was added to the base model (see the sketch below). Candidate features were also evaluated by manual review for biological interpretability and clinical applicability. This process was iterated with the new base model and stopped when no features under consideration were statistically significantly predictive.
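A minimal sketch of this forward-selection loop is shown below, assuming a feature DataFrame X and a binary outcome vector y; statsmodels is used here to obtain Wald p-values and sklearn to compute ROC AUC. The function and variable names are illustrative, not the exact implementation.

```python
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

def forward_select(X, y, alpha=0.05):
    """Greedy forward selection with logistic regression (illustrative sketch).

    At each step, candidate features are added one at a time to the current
    model; candidates whose coefficient is significant (Wald p < alpha) are
    ranked by the ROC AUC of the combined model, and the best one is kept.
    The loop stops when no remaining candidate is significant.
    """
    selected = []
    remaining = list(X.columns)
    while remaining:
        candidates = []
        for feat in remaining:
            design = sm.add_constant(X[selected + [feat]])
            fit = sm.Logit(y, design).fit(disp=0)
            if fit.pvalues[feat] < alpha:
                auc = roc_auc_score(y, fit.predict(design))
                candidates.append((auc, feat))
        if not candidates:
            break  # no statistically significant candidates remain
        best_auc, best_feat = max(candidates)
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected
```

In the actual workflow described above, the top-ranked candidates were additionally reviewed manually for biological interpretability and clinical applicability before being accepted into the base model.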
The set of tumors with both WES and RNA-seq data is smaller than the set of tumors with WES data; when forward selection for ipilimumab-naive tumors chose only WES-derived features, model development was repeated in the larger superset of tumors requiring only clinical and WES data, and the model from this larger set is reported in the main text.
To estimate an "out-of-bag" AUC, we used k-fold cross-validation (splitting the data set into k subsets, training on k-1 subsets, and calculating AUC on the held-out subset) and report the mean cross-validation AUC. Because feature selection involved a partially manual review, it was not repeated within cross-validation folds. For the ipilimumab-treated subset (n=34) we chose k=5 folds, and for the larger ipilimumab-naive subset (n=85) we chose k=10 folds, maintaining a cross-validation holdout set of >5 tumors. Cross-validation scores were calculated using the cross_val_score function from the Python sklearn (scikit-learn) package.
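The cross-validation step can be illustrated as follows, assuming a matrix X_selected containing only the selected features and a binary outcome y; StratifiedKFold, the shuffle, and the random seed are illustrative choices, not stated in the text.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# k-fold cross-validation of the logistic regression model on the selected
# features; scoring='roc_auc' returns one AUC per held-out fold.
model = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # k=10 for the ipilimumab-naive subset (k=5 for ipilimumab-treated)
fold_aucs = cross_val_score(model, X_selected, y, cv=cv, scoring="roc_auc")
mean_cv_auc = fold_aucs.mean()  # reported as the mean cross-validation AUC
```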
To further evaluate the statistical support for our models, we calculated the Akaike information criterion (AIC) and Bayesian information criterion (BIC) of each successive model after adding an additional feature during forward selection in the ipilimumab-experienced and ipilimumab-naive subgroups (Extended Data Figure 8c,d), and also evaluated the addition of mutational burden as an additional feature to the selected models.
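One way to obtain AIC and BIC for the nested logistic regression models at each forward-selection step is sketched below using statsmodels; the feature_sets argument (the successive feature sets from forward selection) and function name are illustrative assumptions.

```python
import statsmodels.api as sm

def information_criteria(X, y, feature_sets):
    """Fit a logistic regression for each successive feature set and report
    AIC and BIC, so the penalized fit can be tracked as features are added."""
    rows = []
    for feats in feature_sets:
        design = sm.add_constant(X[list(feats)])
        fit = sm.Logit(y, design).fit(disp=0)
        rows.append({"features": ", ".join(feats), "AIC": fit.aic, "BIC": fit.bic})
    return rows
```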