January 1, 2016 | PAULA BRANCO and LUÍS TORGO and RITA P. RIBEIRO
A Survey of Predictive Modeling on Imbalanced Domains by Paula Branco, Luís Torgo, and Rita P. Ribeiro presents a comprehensive review of techniques for handling imbalanced data in predictive modeling. The paper addresses both classification and regression tasks, highlighting the challenges posed by imbalanced domains, where the rare cases are the ones that matter most to users. It discusses the need for performance metrics that prioritize rare cases and for methods that adjust learning algorithms to focus on these cases. The authors propose a taxonomy of existing approaches, summarize comparative studies, and provide theoretical analyses. The paper emphasizes the importance of user-defined relevance functions and the limitations of traditional metrics in imbalanced settings. It also explores graphical metrics such as ROC and precision-recall curves for evaluating model performance. For regression tasks, the paper introduces utility-based metrics that take the relevance of target values into account. The study concludes that while many solutions exist, there is still a need for more effective methods to handle imbalanced domains in both classification and regression.
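The limitation of traditional metrics mentioned above can be illustrated with a small sketch. The data set, the two sets of predictions, and the helper names (`confusion_counts`, `accuracy`, `precision_recall`) are assumptions made up for this illustration and are not taken from the paper. The example compares a predictor that always outputs the majority class with a hypothetical detector that finds most rare cases at the cost of some false alarms.

```python
# Illustrative sketch: accuracy versus precision/recall on an imbalanced
# binary problem. All data below is synthetic and chosen for illustration.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the rare class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all cases predicted correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision_recall(y_true, y_pred):
    """Precision and recall for the rare class (0.0 when undefined)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# 5 rare (positive) cases followed by 95 common (negative) cases.
y_true = [1] * 5 + [0] * 95

# Predictor A: always predicts the majority class.
majority_pred = [0] * 100

# Predictor B (hypothetical): finds 4 of the 5 rare cases, raises 10 false alarms.
detector_pred = [1] * 4 + [0] * 1 + [1] * 10 + [0] * 85

for name, pred in [("majority", majority_pred), ("detector", detector_pred)]:
    p, r = precision_recall(y_true, pred)
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f} "
          f"precision={p:.2f} recall={r:.2f}")
```

In this toy setting the majority predictor has the higher accuracy even though it never identifies a rare case, while the detector has lower accuracy but recovers most of them. This is the kind of mismatch that motivates metrics focused on the rare class.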