We assessed the performance, stability, and reproducibility of the PSO-enhanced thromboSeq platform using multiple training, evaluation, and independent validation cohorts. All classification experiments were performed with the PSO-enhanced thromboSeq algorithm, using parameters optimized by particle swarm intelligence. For the matched cohort, we assigned 133 samples to training-evaluation, of which 93 were used for RUV correction, gene panel selection, and SVM training (training cohort) and 40 for gene panel optimization (evaluation cohort). The full cohort contained 208 training-evaluation samples, of which 120 were used for RUV correction, gene panel selection, and SVM training (training cohort) and 88 for gene panel optimization (evaluation cohort). All random selection procedures were performed with the sample function in R. When assigning samples to the training and evaluation cohorts, only the number of samples per clinical group was balanced; other potentially contributing variables were not stratified at this stage (assuming random distribution among the groups).

Next, an SVM model was trained on the training samples and used to predict the samples assigned to the independent validation cohort. The late-stage NSCLC samples and the early-stage, locally advanced NSCLC samples were validated separately, resulting in two ROC curves. The 53 locally advanced NSCLC samples were age-matched with 53 non-cancer individuals selected from the non-cancer samples of the independent validation cohort.

Performance of the training cohort was assessed by a leave-one-out cross-validation approach (LOOCV; see also Best et al., 2015). In a LOOCV procedure, all samples minus one (the 'left-out' sample) are used to train the algorithm, and the left-out sample is then predicted; each sample is predicted once, yielding as many predictions as there are samples in the training cohort. The list of stable genes in the initial training cohort, the RUV factors selected for removal, and the final gene panel determined by swarm optimization of the training-evaluation cohort were used as input for the LOOCV procedure.

As a control for internal reproducibility, we randomly resampled training and evaluation cohorts, while maintaining the validation cohorts and the swarm-guided gene panel of the original classifier, and performed 1,000 training and classification procedures (matched and full cohort NSCLC/non-cancer). As a control for random classification, the class labels of the samples used by the SVM algorithm to train the support vectors were randomly permuted, while maintaining the swarm-guided gene list of the original classifier; this was repeated 1,000 times for the matched and full NSCLC/non-cancer cohort classifiers. P values were calculated accordingly, as described previously (Best et al., 2015). Results were presented as receiver operating characteristic (ROC) curves and summarized by area under the curve (AUC) values, as determined with the ROCR package in R. AUC 95% confidence intervals were calculated according to the method of DeLong using the ci.auc function of the pROC package in R.
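The balanced cohort split, SVM training, and prediction of the independent validation cohort can be illustrated with a minimal R sketch. This is not the authors' code: the object names (counts, class_labels, panel_genes, validation_counts), the per-group training sizes, the random seed, and the SVM settings (e1071 package, radial kernel) are hypothetical placeholders used only to make the sketch self-contained.

```r
# Minimal sketch (assumptions labeled below), not the thromboSeq implementation.
library(e1071)   # svm(): one possible SVM implementation (assumption)

set.seed(1)      # illustrative seed

# class_labels: factor with levels "NSCLC" and "nonCancer" (one label per sample)
# counts: RUV-corrected gene-by-sample count matrix (hypothetical object name)
# panel_genes: swarm-optimized gene panel (hypothetical object name)

# Balanced random assignment: only the number of samples per clinical group
# is balanced, matching the text; other variables are not stratified.
split_balanced <- function(class_labels, n_train_per_group) {
  unlist(lapply(levels(class_labels), function(grp) {
    sample(which(class_labels == grp), n_train_per_group[grp])
  }))
}

# Example group sizes for a 93-sample training cohort (assumed split, not reported)
train_idx <- split_balanced(class_labels, c(NSCLC = 47, nonCancer = 46))
eval_idx  <- setdiff(seq_along(class_labels), train_idx)

# Train an SVM on the swarm-selected gene panel (training cohort)
fit <- svm(x = t(counts[panel_genes, train_idx]),
           y = class_labels[train_idx],
           kernel = "radial", probability = TRUE)

# Predict the independent validation cohort (validation_counts: hypothetical matrix)
pred   <- predict(fit, t(validation_counts[panel_genes, ]), probability = TRUE)
scores <- attr(pred, "probabilities")[, "NSCLC"]   # per-sample NSCLC probability
```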
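The ROC/AUC readout and the random-classification control could then be sketched as follows, continuing from the objects above. The prediction and performance calls are from the ROCR package and ci.auc from the pROC package, both named in the text; the permutation loop and the empirical P value formula are simplified illustrations of the 1,000-iteration label-permutation control (the manuscript computes P values as in Best et al., 2015), and validation_labels is a hypothetical vector of true validation classes.

```r
library(ROCR)  # prediction(), performance()
library(pROC)  # roc(), ci.auc() with the DeLong method

# ROC curve and AUC of the validation predictions
pred_obj <- prediction(scores, validation_labels == "NSCLC")
roc_perf <- performance(pred_obj, "tpr", "fpr")
auc_val  <- performance(pred_obj, "auc")@y.values[[1]]
plot(roc_perf)

# 95% confidence interval of the AUC, DeLong method
ci_auc <- ci.auc(roc(response = validation_labels, predictor = scores),
                 method = "delong")

# Random-classification control: permute the training class labels 1,000 times
# while keeping the swarm-selected gene panel fixed, and record the AUCs.
perm_auc <- replicate(1000, {
  perm_labels <- sample(class_labels[train_idx])            # shuffled labels
  perm_fit  <- svm(x = t(counts[panel_genes, train_idx]),
                   y = perm_labels, kernel = "radial", probability = TRUE)
  perm_pred <- predict(perm_fit, t(validation_counts[panel_genes, ]),
                       probability = TRUE)
  perm_scores <- attr(perm_pred, "probabilities")[, "NSCLC"]
  performance(prediction(perm_scores, validation_labels == "NSCLC"),
              "auc")@y.values[[1]]
})

# One common empirical estimate of the permutation P value (assumed formula)
p_perm <- (sum(perm_auc >= auc_val) + 1) / (length(perm_auc) + 1)
```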