PAULA BRANCO, LUIS TORGO, and RITA P. RIBEIRO, LIAAD - INESC TEC, DCC - Faculty of Sciences, University of Porto, Porto, Portugal
This paper provides a comprehensive survey of techniques for handling imbalanced domains in predictive modeling, focusing on both classification and regression tasks. Imbalanced domains are characterized by a significant disparity in the representation of different classes or values of the target variable: the less common classes or ranges are the most relevant to the user but are poorly represented in the training data. The authors define the problem, discuss its challenges, and propose a taxonomy of approaches to address them. They review various performance metrics, both scalar and graphical, designed to evaluate models effectively in imbalanced domains; these include accuracy, precision, recall, Fβ, geometric mean, dominance, and AUC, among others. The paper also explores methods for modifying learning algorithms, transforming predictions, and combining different strategies to improve model performance. Additionally, it discusses the limitations of traditional evaluation metrics and the need for more context-specific measures. The authors conclude by summarizing the key findings from comparative studies and theoretical analyses, highlighting the importance of adapting evaluation metrics to the specific application domain.
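To illustrate why the survey argues that accuracy alone is misleading in imbalanced domains, the sketch below computes several of the scalar metrics it reviews (precision, recall, Fβ, and the geometric mean of recall and specificity) from confusion-matrix counts. The function name and the example counts are hypothetical, chosen only to show a typical imbalanced scenario; this is not code from the paper.

```python
def imbalance_metrics(tp, fp, fn, tn, beta=1.0):
    """Scalar evaluation metrics from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)           # a.k.a. sensitivity, true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    # F-beta: weighted harmonic mean of precision and recall
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    # Geometric mean of recall and specificity, robust to class imbalance
    g_mean = (recall * specificity) ** 0.5
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "Fbeta": f_beta,
        "G-mean": g_mean,
    }

# Hypothetical severely imbalanced test set: 10 positives, 990 negatives.
# The model finds only half of the rare positives.
m = imbalance_metrics(tp=5, fp=10, fn=5, tn=980)
```

With these counts, accuracy is 0.985 even though recall on the minority class is only 0.5 and F1 is 0.4, which is exactly the mismatch between traditional metrics and user interest that the survey highlights.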