Disease severity, scored from 0 to 9 in steps of 1, was used as the response variable for the regression models, while the same ten severity levels, treated as classes, were used as the dependent variable for the classification models. All machine learning models were calibrated on two-thirds of the 600 samples (400) and validated on the remaining one-third (200). The accuracy of the regression models was evaluated with the root mean square error (RMSE), the coefficient of determination (R²), and the residual prediction deviation (RPD).
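Assuming the common definitions of these statistics, they can be written as follows. R² is taken here as one minus the ratio of the residual to the total sum of squares, and RPD as the standard deviation (SD) of the observed values divided by RMSE. The symbols Ō (mean of the observed values) and SD are introduced here and are not defined in the original text.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^{2}}

R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^{2}}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^{2}}

\mathrm{RPD} = \frac{\mathrm{SD}}{\mathrm{RMSE}}, \qquad
\mathrm{SD} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^{2}}
```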
Where P_i is the predicted value, O_i is the observed value, and n is the number of samples.
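A minimal Python sketch of the three regression statistics is shown below. It assumes R² is computed as 1 − SS_res/SS_tot and RPD as the sample standard deviation of the observed values divided by RMSE; the function names are hypothetical and are not taken from the software used in the study.

```python
import math
import statistics
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error: sqrt of the mean of (P_i - O_i)^2 over n samples."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("predicted and observed must be non-empty and equal in length")
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def r_squared(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_o = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot  # undefined if all observed values are equal


def rpd(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Residual prediction deviation, assumed here as SD(observed) / RMSE."""
    return statistics.stdev(observed) / rmse(predicted, observed)


# Toy severity scores (illustrative only, not data from the study).
observed = [0, 2, 4, 6, 8]
predicted = [1, 2, 3, 6, 8]
print(round(rmse(predicted, observed), 4),
      round(r_squared(predicted, observed), 4),
      round(rpd(predicted, observed), 4))  # → 0.6325 0.95 5.0
```

Using the sample standard deviation (n − 1 denominator) for SD is a choice; some authors use the population form, which gives a slightly smaller RPD.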
Chang et al. (2001) classified prediction accuracy as accurate (RPD > 2), moderate (1.4 < RPD < 2), or poor (RPD < 1.4).
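The RPD classes of Chang et al. (2001) can be expressed as a small lookup, sketched below in Python. The function name rpd_category is hypothetical, and because the stated ranges leave RPD = 2 and RPD = 1.4 unassigned, the sketch assumes those boundary values fall into the lower class.

```python
def rpd_category(rpd_value: float) -> str:
    """Map an RPD value to the accuracy classes of Chang et al. (2001).

    Boundary values (exactly 2.0 or 1.4) are assigned to the lower class;
    this is an assumption, since the stated ranges leave them unassigned.
    """
    if rpd_value > 2.0:
        return "accurate"
    if rpd_value > 1.4:
        return "moderate"
    return "poor"


for value in (5.0, 1.7, 1.2):
    print(value, rpd_category(value))
```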
The classification accuracy of each technique was assessed with a confusion (error) matrix. The overall (total) accuracy (Ta) was obtained by dividing the number of correct predictions by the total number of predictions tested, as suggested by Lillesand et al. (2000, p. 724). The Kappa coefficient (K), also estimated from the confusion matrix, indicates the extent to which the correct classifications reflect “genuine” agreement rather than “chance” agreement. A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance; negative values are also possible when agreement is worse than chance. The formulae of these parameters are (Hasmadi et al., 2009):
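Assuming the standard forms used in the accuracy-assessment literature, the two measures can be written as below, where x_ii are the diagonal (correctly classified) counts and r is the number of classes (r = 10 in this study). The symbol r is introduced here and is not defined in the original text.

```latex
T_a = \frac{\sum_{i=1}^{r} x_{ii}}{N}

K = \frac{N\sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+}\,x_{+i}}{N^{2} - \sum_{i=1}^{r} x_{i+}\,x_{+i}}
```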
Where x_ij is the number of counts in the (i, j) cell of the confusion matrix, N is the total number of counts in the confusion matrix, x_i+ is the marginal total of row i, and x_+i is the marginal total of column i.
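The Python sketch below builds a confusion matrix from observed and predicted class labels and computes Ta and K from it, following the formulae above in their standard form. The function names are hypothetical. Rows are taken as observed classes and columns as predicted classes; this convention is an assumption, but it does not affect either statistic, since both are unchanged when the matrix is transposed.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(observed: Sequence[int], predicted: Sequence[int],
                     n_classes: int = 10) -> Matrix:
    """Count x_ij: samples of observed class i that were predicted as class j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for o, p in zip(observed, predicted):
        cm[o][p] += 1
    return cm


def overall_accuracy(cm: Matrix) -> float:
    """Total accuracy Ta = (sum of diagonal counts x_ii) / N."""
    n_total = sum(sum(row) for row in cm)
    diagonal = sum(cm[i][i] for i in range(len(cm)))
    return diagonal / n_total


def kappa(cm: Matrix) -> float:
    """Kappa K = (N * sum x_ii - sum x_i+ * x_+i) / (N^2 - sum x_i+ * x_+i)."""
    r = len(cm)
    n_total = sum(sum(row) for row in cm)
    diagonal = sum(cm[i][i] for i in range(r))
    row_totals = [sum(row) for row in cm]                             # x_i+
    col_totals = [sum(cm[i][j] for i in range(r)) for j in range(r)]  # x_+j
    chance_term = sum(row_totals[i] * col_totals[i] for i in range(r))
    # Raises ZeroDivisionError in the degenerate case N^2 == chance_term,
    # e.g. when every sample is observed and predicted as the same class.
    return (n_total * diagonal - chance_term) / (n_total ** 2 - chance_term)


# Toy two-class example (illustrative only, not data from the study).
obs = [0, 0, 1, 1]
pred = [0, 1, 1, 1]
cm = confusion_matrix(obs, pred, n_classes=2)
print(cm, overall_accuracy(cm), kappa(cm))  # → [[1, 1], [0, 2]] 0.75 0.5
```

For the ten severity classes, the default n_classes=10 assumes labels are the integers 0 to 9.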