We classified dementia status in 146 samples that remained after removing samples with missing values from the 177 used in the feature selection process. The 146 samples showed a class imbalance, with 89 demented versus 57 non-demented patients. Before training our models, we randomly selected 57 patients from the demented group using the sample() function from the random module in Python 3 to balance the two classes, and the rows were then shuffled using sklearn.utils version 0.22.2.post1. As a result, 114 samples were used after balancing the class label, and the remaining 32 samples were held out for final assessment. The hippocampal tau stage feature, which had 50% missing values, was dropped during training. Age and brain weight were also removed before training the models, leaving 22 features and 114 samples for classification. The dataset was split into a training set of 70% (80 samples) and a testing set of 30% (34 samples).
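The balancing and splitting steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the DataFrame `df` and the label column name `"dementia"` are assumed, and no random seed is stated in the text, so one is exposed as an optional parameter.

```python
import random

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle


def balance_and_split(df, label="dementia", seed=None):
    """Under-sample the majority class, shuffle, and make a 70/30 split."""
    demented = df[df[label] == 1]
    non_demented = df[df[label] == 0]

    # Randomly pick as many demented patients as there are non-demented ones,
    # using random.sample() as described in the text.
    keep_idx = random.sample(list(demented.index), k=len(non_demented))
    balanced = pd.concat([demented.loc[keep_idx], non_demented])

    # Shuffle the rows with sklearn.utils before splitting.
    balanced = shuffle(balanced, random_state=seed)

    X = balanced.drop(columns=[label])
    y = balanced[label]
    # 70% training / 30% testing split of the balanced data.
    return train_test_split(X, y, test_size=0.3, random_state=seed)
```

With 89 demented and 57 non-demented samples, this yields a balanced set of 114 rows split roughly 80/34 between training and testing.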
Seven classification algorithms were trained to classify individuals’ dementia status from the 22 top-ranked features. Scikit-learn version 0.22.2.post1 was used to implement and train the ML classifiers and to measure their classification performance. Logistic regression was implemented using the sklearn.linear_model package with the penalty set to l2, the regularization parameter C set to 1, the maximum number of solver iterations set to 2000, and the remaining parameters at their default values. A decision tree classifier was implemented using the sklearn.tree package. A k-nearest neighbors classifier was implemented using sklearn.neighbors with the number of neighbors set to 5, uniform weights used for prediction, the Minkowski distance metric, and the remaining parameters at their default values. The linear discriminant analysis classifier was implemented using the sklearn.discriminant_analysis package with singular value decomposition as the solver and the remaining parameters at their default values. The Gaussian naïve Bayes classifier was implemented using sklearn.naive_bayes. The support vector machine with a radial basis function kernel (SVM-RBF) was implemented using sklearn.svm with the regularization parameter C set to 1, the kernel coefficient gamma set to “scale”, and the remaining parameters at their default values. The support vector machine with a linear kernel (SVM-LINEAR) was implemented using the sklearn.svm package with the regularization parameter C set to 1, a “linear” kernel, gamma set to “scale”, and the remaining parameters at their default values. The sklearn.metrics package was used to report classification performance. Training and performance evaluation were repeated 500 times, and the average of the performance measures was reported as overall performance.
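The seven classifiers and their stated hyperparameters can be instantiated as below. The dictionary keys and the single fit/score pass are illustrative; only the hyperparameter values come from the text, and the 500-repetition averaging loop is reduced here to one iteration.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hyperparameters as described in the text; all other parameters are defaults.
classifiers = {
    "LR": LogisticRegression(penalty="l2", C=1, max_iter=2000),
    "DT": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(n_neighbors=5, weights="uniform",
                                metric="minkowski"),
    "LDA": LinearDiscriminantAnalysis(solver="svd"),
    "GNB": GaussianNB(),
    "SVM-RBF": SVC(C=1, kernel="rbf", gamma="scale"),
    "SVM-LINEAR": SVC(C=1, kernel="linear", gamma="scale"),
}


def evaluate(clf, X_train, y_train, X_test, y_test):
    """One train/evaluate pass; the paper repeats this 500 times and averages."""
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)
```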
Accuracy, balanced accuracy, F1-score, precision, sensitivity, and specificity were used as performance measures, with results visualized using regression plots. The ML models and feature selection libraries were implemented in Python 3.7.3.
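The listed measures map directly onto sklearn.metrics, with one caveat worth noting: specificity has no dedicated scorer and must be derived from the confusion matrix. A minimal sketch (function name and label encoding assumed):

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix, f1_score, precision_score,
                             recall_score)


def performance(y_true, y_pred):
    """Compute the six performance measures for a binary 0/1 label."""
    # Specificity is the recall of the negative class; sklearn.metrics does
    # not provide it directly, so it is derived from the confusion matrix.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "sensitivity": recall_score(y_true, y_pred),
        "specificity": tn / (tn + fp),
    }
```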