Continuous baseline characteristics are presented as mean and SD in the case of normally distributed data, whereas skewed data are presented as median and IQR.
We evaluated model performance with several measures, following previously published recommendations for reporting external validation studies.10 These were: calibration, assessed with a calibration plot, the calibration intercept (calibration-in-the-large) and the calibration slope; discrimination, assessed with the concordance statistic; and clinical usefulness, assessed with decision curve analysis.
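As an illustration of two of these measures, the sketch below computes a concordance statistic and the net benefit that underlies a decision curve. It is a minimal example under stated assumptions, not the validation software used in this study: the function names `c_statistic` and `net_benefit` and the eight-patient toy data are hypothetical, and outcomes are assumed to be coded 1 (delirium) and 0 (no delirium).

```python
def c_statistic(y_true, y_prob):
    """Share of event/non-event pairs in which the event has the
    higher predicted risk; tied predictions count as half."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def net_benefit(y_true, y_prob, threshold):
    """Net benefit at one risk threshold: true positives per patient
    minus false positives per patient weighted by the threshold odds."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1 - threshold))


# Hypothetical toy data (not study data): outcomes and predicted risks.
outcomes = [0, 0, 1, 0, 1, 1, 0, 1]
risks = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.9]

print(c_statistic(outcomes, risks))       # 0.875
print(net_benefit(outcomes, risks, 0.5))  # 0.375
```

A decision curve plots `net_benefit` over a range of thresholds and compares the model with the strategies of treating all patients and treating none (net benefit 0).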
As recommended by Steyerberg et al,12 we used the scaled Brier score as a combined measure of model discrimination and calibration instead of the goodness-of-fit (Hosmer-Lemeshow) test.17 18
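The scaled Brier score can be sketched as follows. This is an illustrative example only: it assumes the common definition in which the Brier score is rescaled by the score of a non-informative model that assigns the observed incidence to every patient, and the function name `scaled_brier` and the toy data are hypothetical.

```python
def scaled_brier(y_true, y_prob):
    """Scaled Brier score: 1 - Brier / Brier_max.

    Brier is the mean squared difference between predicted risk and
    the 0/1 outcome. Brier_max is the score of a non-informative model
    that predicts the observed incidence for everyone, which equals
    incidence * (1 - incidence).
    """
    n = len(y_true)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    incidence = sum(y_true) / n
    if incidence in (0, 1):
        raise ValueError("outcome must include events and non-events")
    brier_max = incidence * (1 - incidence)
    return 1 - brier / brier_max


# Hypothetical toy data (not study data).
outcomes = [0, 0, 1, 1]
print(scaled_brier(outcomes, [0.5, 0.5, 0.5, 0.5]))  # 0.0 (non-informative)
print(scaled_brier(outcomes, [0.2, 0.2, 0.8, 0.8]))  # about 0.84
```

Under this definition a score of 1 indicates perfect predictions, 0 indicates a model no better than predicting the incidence, and negative values indicate a model that performs worse than that.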
Sensitivity and specificity rates were calculated for all models. Negative and positive predictive values strongly depend on delirium incidence and were therefore not reported.
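The dependence of predictive values on incidence follows from Bayes' theorem and can be shown with a short sketch. The function name `predictive_values` and the sensitivity, specificity and incidence figures below are hypothetical values chosen for illustration, not results from this study.

```python
def predictive_values(sensitivity, specificity, incidence):
    """Return (PPV, NPV) for a test applied at a given outcome incidence."""
    true_pos = sensitivity * incidence
    false_pos = (1 - specificity) * (1 - incidence)
    true_neg = specificity * (1 - incidence)
    false_neg = (1 - sensitivity) * incidence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical sensitivity (0.80) and specificity (0.90),
# evaluated at three hypothetical incidences.
for incidence in (0.05, 0.10, 0.30):
    ppv, npv = predictive_values(0.80, 0.90, incidence)
    print(f"incidence={incidence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With sensitivity and specificity held fixed, the positive predictive value rises and the negative predictive value falls as incidence increases, so predictive values from one cohort do not transfer to a cohort with a different delirium incidence.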
Calculations were performed semi-automatically using R-based validation software V.2.18 (available at https://www.evidencio.org).19 Differences in discriminative power between CPMs were assessed by comparing areas under the curve using MedCalc V.20.015.