After randomly splitting the original ELSA dataset into training and test sets, the validation approach was applied to the training set. We used repeated stratified 5-fold cross-validation to train and validate the prediction models and to obtain a reliable performance estimate. In stratified 5-fold cross-validation, the sample is randomly partitioned into 5 equal-sized subsets such that each subset preserves the class distribution of the complete training dataset; this maintains the imbalanced class ratio in every fold. Each of the 5 subsets is used exactly once as the validation data, and the entire 5-fold procedure is repeated 3 times [21 (link)]. In this study, the goal was to ensure that each fold had the same proportion of major cognitive decline observations.
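The validation scheme described above can be sketched as follows. This is an illustrative example only, not the authors' code: the dataset is a synthetic stand-in for the ELSA training set, and logistic regression is an assumed placeholder model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for the training set:
# ~10% positive class, mimicking the minority "major decline" group.
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# 5 folds repeated 3 times; stratification keeps the same class
# proportion in every fold as in the full training data.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         scoring="roc_auc", cv=cv)
print(len(scores))  # 15 validation scores: 5 folds x 3 repeats
```

Averaging the 15 per-fold scores gives the repeated-cross-validation estimate of model performance.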
Three indexes, sensitivity (SEN), specificity (SPE), and area under the curve (AUC), are used to evaluate the binary classification models. Sensitivity indicates the ability of the model to correctly identify major cognitive decliners. It is defined as TP/(TP+FN), where TP and FN stand for true positives (i.e., major cognitive decliners who are classified correctly) and false negatives (i.e., major cognitive decliners who are classified as minor cognitive decliners), respectively. In contrast, specificity indicates the ability of the model to correctly identify minor cognitive decliners. It is defined as TN/(TN+FP), where TN and FP stand for true negatives (i.e., minor cognitive decliners who are classified correctly) and false positives (i.e., minor cognitive decliners who are classified as major cognitive decliners), respectively. The derived measure of AUC determines the inherent ability of the test to discriminate between individuals with minor and major cognitive decline. Another interpretation of AUC is “the average value of sensitivity for all the possible values of specificity” [58 (link)]. A higher score on these indexes indicates better model performance. We did not include accuracy as a performance metric, because it does not provide meaningful information about the performance of classification models when the numbers of participants in the minor and major cognitive decline groups are unequal [59 (link)].
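As a minimal sketch of how these three indexes are computed, the snippet below uses hypothetical labels and scores (1 = major cognitive decline, the positive class) together with scikit-learn's confusion matrix and ROC utilities; the threshold of 0.5 is an illustrative assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical ground truth and model scores; 1 = major cognitive decline.
y_true  = np.array([1, 0, 0, 1, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.4, 0.7, 0.1, 0.6, 0.8, 0.3, 0.35, 0.05])
y_pred  = (y_score >= 0.5).astype(int)  # assumed 0.5 decision threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # TP/(TP+FN): major decliners found correctly
specificity = tn / (tn + fp)  # TN/(TN+FP): minor decliners found correctly

# AUC is threshold-free: it uses the raw scores, not the 0/1 predictions.
auc = roc_auc_score(y_true, y_score)
print(sensitivity, specificity, auc)
```

Note that sensitivity and specificity depend on the chosen threshold, while AUC summarizes discrimination across all thresholds, which is one reason all three are reported together.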