Empirical analysis of performance assessment for imbalanced classification


23 January 2024 | Jean-Gabriel Gaudreault, Paula Branco
This paper presents an empirical analysis of performance assessment in imbalanced classification. The study investigates how different performance metrics affect classifier evaluation in scenarios where one class is heavily underrepresented, and highlights the importance of selecting metrics appropriate to the problem context and data characteristics. The authors compare a range of metrics, including the F1 score, H-measure, and G-Mean, and show that different metrics can lead to different conclusions about which classifier is best. They argue that metrics such as Davis' interpolation of the area under the precision-recall curve and the Matthews Correlation Coefficient are preferable in imbalanced settings, while the F1 score and G-Mean should be avoided when labels are noisy. The study also emphasizes the need to use multiple, fundamentally different metrics in imbalanced domains to gain a comprehensive picture of model performance. The authors demonstrate that the choice of metric can significantly change the outcome of model selection, especially on highly imbalanced data: the best classifier according to one metric may not be the best according to another, and this discrepancy becomes more pronounced at higher levels of imbalance. Based on these findings, they provide guidelines to help researchers and practitioners select the most appropriate performance metrics for evaluating imbalanced classification problems.
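As a rough illustration of the kind of disagreement the paper studies, the following minimal sketch (not taken from the paper; the synthetic dataset, class imbalance level, and the two classifiers are arbitrary assumptions) computes several of the metrics discussed above for two models on an imbalanced problem using scikit-learn. G-Mean is computed directly from the confusion matrix, and the area under the precision-recall curve uses a simple trapezoidal approximation rather than Davis' interpolation.

# Minimal sketch (assumed setup): compare how different metrics score two
# classifiers on a highly imbalanced synthetic dataset (~2% positives).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (f1_score, matthews_corrcoef, confusion_matrix,
                             precision_recall_curve, auc)

X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.98, 0.02], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def report(name, clf):
    clf.fit(X_tr, y_tr)
    y_pred = clf.predict(X_te)
    y_prob = clf.predict_proba(X_te)[:, 1]
    tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
    # G-Mean: geometric mean of sensitivity (true positive rate) and specificity.
    gmean = np.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))
    # Area under the precision-recall curve (trapezoidal approximation,
    # not Davis' interpolation favoured in the paper).
    prec, rec, _ = precision_recall_curve(y_te, y_prob)
    auc_pr = auc(rec, prec)
    print(f"{name}: F1={f1_score(y_te, y_pred):.3f}  G-Mean={gmean:.3f}  "
          f"MCC={matthews_corrcoef(y_te, y_pred):.3f}  AUC-PR={auc_pr:.3f}")

report("LogisticRegression", LogisticRegression(max_iter=1000))
report("RandomForest", RandomForestClassifier(random_state=0))

Depending on the data and models, the classifier ranked first by F1 or G-Mean may differ from the one ranked first by MCC or AUC-PR, which is the kind of discrepancy the paper examines in depth.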