After having introduced the statistical background of the Matthews correlation coefficient and the other two measures to which we compare it (accuracy and F1 score), we explore here the correlation between these three measures. To explore these statistical correlations, we take advantage of the Pearson correlation coefficient (PCC) [100], a measure particularly suitable for evaluating the linear relationship between two continuous variables [101]. We avoid the usage of rank correlation coefficients (such as Spearman’s ρ and Kendall’s τ [102]) because we are not focusing on the ranks of the two lists.
For a given positive integer N≥10, we consider all the possible $\binom{N+3}{3}$ confusion matrices for a dataset with N samples and, for each matrix, compute the accuracy, MCC, and F1 score, and then the Pearson correlation coefficient for each pair of the three sets of values. MCC and accuracy turned out to be strongly correlated, while the Pearson coefficient is less than 0.8 for the correlation of F1 with each of the other two measures (Table 3). Interestingly, the correlation grows with N, but the increments are limited.

Correlation between MCC, accuracy, and F1 score values

N       PCC (MCC, F1 score)   PCC (MCC, accuracy)   PCC (accuracy, F1 score)
10      0.742162              0.869778              0.744323
25      0.757044              0.893572              0.760708
50      0.766501              0.907654              0.769752
75      0.769883              0.912530              0.772917
100     0.771571              0.914926              0.774495
200     0.774060              0.918401              0.776830
300     0.774870              0.919515              0.777595
400     0.775270              0.920063              0.777976
500     0.775509              0.920388              0.778201
1000    0.775982              0.921030              0.778652

Pearson correlation coefficient (PCC) between accuracy, MCC, and F1 score, computed on all confusion matrices with a given number of samples N

Similarly to what Flach and colleagues did for their isometrics strategy [66], we depict a scatterplot of the MCCs and F1 scores for all the 21 084 251 possible confusion matrices for a toy dataset with 500 samples (Fig. 1). We take advantage of this scatterplot to give an overview of the mutual relationship between MCC and F1 score.

Relationship between MCC and F1 score. Scatterplot of all the 21 084 251 possible confusion matrices for a dataset with 500 samples on the MCC/F1 plane. In red, the (−0.04, 0.95) point corresponding to use case A1

The two measures are reasonably concordant, but the scatterplot cloud is wide, implying that for each value of the F1 score there is a corresponding range of values of MCC and vice versa, although with different widths. In fact, for any value F1=ϕ, the MCC varies approximately within [ϕ−1,ϕ], so that the width of the variability range is 1, independent of the value of ϕ. On the other hand, for a given value MCC=μ, the F1 score can range in [0,μ+1] if μ≤0 and in [μ,1] if μ>0, so that the width of the range is 1−|μ|; that is, it depends on the MCC value μ.
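The range claim can be checked empirically. The following sketch (illustrative, not from the paper) enumerates all confusion matrices for N=100 and records, for each F1 value rounded to one decimal, the minimum and maximum MCC observed; each bucket spans a range of width roughly 1:

```python
# Illustrative check: for F1 near a fixed value, MCC spans a range of
# width roughly 1, as claimed; degenerate matrices are skipped.
from math import sqrt

N = 100
spans = {}  # F1 rounded to 1 decimal -> (min MCC, max MCC)
for tp in range(N + 1):
    for fn in range(N + 1 - tp):
        for fp in range(N + 1 - tp - fn):
            tn = N - tp - fn - fp
            if 2 * tp + fp + fn == 0:
                continue  # F1 undefined
            denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
            if denom == 0:
                continue  # MCC undefined
            f1 = 2 * tp / (2 * tp + fp + fn)
            mcc = (tp * tn - fp * fn) / denom
            key = round(f1, 1)
            lo, hi = spans.get(key, (mcc, mcc))
            spans[key] = (min(lo, mcc), max(hi, mcc))

for phi in sorted(spans):
    lo, hi = spans[phi]
    print(f"F1 ~ {phi:.1f}: MCC in [{lo:+.3f}, {hi:+.3f}]")
```

For example, the bucket around F1 ≈ 0.5 contains MCC values close to both −0.5 and +0.5, consistent with the interval [ϕ−1, ϕ].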
Note that a large portion of the above variability is due to the fact that F1 is independent of TN: in general, all matrices $M=\begin{pmatrix}\alpha & \beta\\ \gamma & x\end{pmatrix}$ have the same value $F_1=\frac{2\alpha}{2\alpha+\beta+\gamma}$ regardless of the value of x, while the corresponding MCC values range from $-\sqrt{\frac{\beta\gamma}{(\alpha+\beta)(\alpha+\gamma)}}$ for $x=0$ to the asymptotic $\sqrt{\frac{\alpha^2}{(\alpha+\beta)(\alpha+\gamma)}}$ for $x\to\infty$. For example, if we consider only the 63 001 confusion matrices of datasets of size 500 where TP=TN, the Pearson correlation coefficient between F1 and MCC increases to 0.9542254.
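This TN-invariance is easy to observe numerically. In the following sketch the entries α=45, β=5, γ=10 are illustrative values, not taken from the paper: F1 stays constant as TN=x grows, while MCC sweeps from its lower endpoint at x=0 toward its asymptotic upper endpoint:

```python
# Illustrative sketch: F1 ignores TN, while MCC moves across its full
# range for the matrix M = [[alpha, beta], [gamma, x]] as x = TN grows.
from math import sqrt

alpha, beta, gamma = 45, 5, 10  # TP, FN, FP (illustrative values)


def mcc(tp, fn, fp, tn):
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


# F1 depends only on alpha, beta, gamma -- not on TN.
f1 = 2 * alpha / (2 * alpha + beta + gamma)
# Endpoints of the MCC range derived in the text.
mcc_lo = -sqrt(beta * gamma / ((alpha + beta) * (alpha + gamma)))
mcc_hi = alpha / sqrt((alpha + beta) * (alpha + gamma))

for x in (0, 10, 100, 10_000):
    print(f"TN={x}: F1={f1:.3f}, MCC={mcc(alpha, beta, gamma, x):+.3f}")
```

With these values F1 is fixed at 90/105 ≈ 0.857, while the printed MCC climbs from the lower endpoint toward the asymptotic upper one.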
Overall, accuracy, F1, and MCC show reliably concordant scores for predictions that correctly classify both positives and negatives (and therefore have many TP and TN), and for predictions that incorrectly classify both positives and negatives (and therefore have few TP and TN); however, these measures show discordant behaviors when the prediction performs well on only one of the two binary classes. In fact, when a prediction displays many true positives but few true negatives (or many true negatives but few true positives), we will show that F1 and accuracy can provide misleading information, while MCC always generates results that reflect the overall prediction issues.