August 14, 2020 | Margherita Grandini, Enrico Bagli, Giorgio Visani
This white paper provides an overview of metrics used in multi-class classification, highlighting their advantages, disadvantages, and applications in evaluating and comparing classification models. The paper begins by explaining the concept of multi-class classification, in which each observation belongs to one of more than two classes, and introduces the role of performance indicators in assessing how well a model classifies.
The confusion matrix is a key tool in evaluating classification performance, providing a cross-tabulation of true and predicted classifications. From this matrix, metrics such as precision, recall, and accuracy are derived. Accuracy measures the proportion of correct predictions over all observations, while balanced accuracy adjusts for class imbalance by averaging recall across classes. The F1-score, the harmonic mean of precision and recall, condenses both into a single value and thereby penalises false positives and false negatives alike.
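To make these quantities concrete, the snippet below is a minimal sketch, assuming scikit-learn is available and using small hypothetical labels for a three-class problem (not data from the paper), of how the confusion matrix and the metrics derived from it can be computed.

```python
from sklearn.metrics import (
    confusion_matrix,
    accuracy_score,
    balanced_accuracy_score,
    precision_score,
    recall_score,
    f1_score,
)

# Hypothetical true and predicted labels for a three-class problem (0, 1, 2).
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 0, 2]

# Cross-tabulation of true classes (rows) against predicted classes (columns).
print(confusion_matrix(y_true, y_pred))

# Accuracy: share of correct predictions over all observations.
print(accuracy_score(y_true, y_pred))

# Balanced accuracy: mean of per-class recall, which adjusts for class imbalance.
print(balanced_accuracy_score(y_true, y_pred))

# Per-class precision, recall and F1 (average=None returns one value per class).
print(precision_score(y_true, y_pred, average=None))
print(recall_score(y_true, y_pred, average=None))
print(f1_score(y_true, y_pred, average=None))
```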
The paper also discusses the multi-class extensions of these metrics, including the macro and micro F1-scores, which provide different perspectives on model performance. The macro F1-score is the unweighted average of per-class F1-scores, so every class counts equally regardless of its frequency, whereas the micro F1-score pools true positives, false positives, and false negatives across all classes and therefore weights every observation equally; in a single-label multi-class setting it coincides with accuracy. Cross-entropy, in contrast, operates on predicted class probabilities rather than hard labels: it measures how closely the predicted probability distribution matches the true one, with lower values indicating closer agreement.
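The contrast between the two averaging schemes, and the fact that cross-entropy needs predicted probabilities rather than hard labels, can be illustrated with another short sketch, again assuming scikit-learn and purely hypothetical data.

```python
from sklearn.metrics import f1_score, log_loss

# Hypothetical hard predictions for a three-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Macro F1: unweighted mean of per-class F1 scores, so rare classes
# carry the same weight as frequent ones.
print(f1_score(y_true, y_pred, average="macro"))

# Micro F1: computed from pooled TP/FP/FN counts over all classes;
# for single-label multi-class problems it equals plain accuracy.
print(f1_score(y_true, y_pred, average="micro"))

# Cross-entropy (log loss) compares the predicted probability distribution
# with the one-hot true distribution; lower values mean closer agreement.
y_proba = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.5, 0.2, 0.3],
]
print(log_loss(y_true, y_proba))
```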
Additionally, the paper covers the Matthews Correlation Coefficient (MCC) and Cohen's Kappa, both of which assess the correlation between predicted and true classifications. MCC is a balanced measure that accounts for all entries in the confusion matrix, while Cohen's Kappa corrects the observed agreement for the agreement expected by chance alone, discounting accuracy a classifier could obtain by simply following the class frequencies.
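Both coefficients are available off the shelf; the sketch below, again assuming scikit-learn and hypothetical labels, computes them side by side.

```python
from sklearn.metrics import matthews_corrcoef, cohen_kappa_score

# Hypothetical labels for a three-class problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# MCC: correlation between true and predicted labels built from the full
# confusion matrix; ranges from -1 to 1, with 0 meaning no better than chance.
print(matthews_corrcoef(y_true, y_pred))

# Cohen's Kappa: observed agreement corrected for the agreement expected
# by chance, given the marginal class frequencies.
print(cohen_kappa_score(y_true, y_pred))
```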
The paper concludes with a discussion of the importance of selecting appropriate metrics based on the specific goals of the classification task, emphasizing the need for a nuanced understanding of each metric's strengths and limitations in different scenarios.