Yanmin Sun, Mohamed S. Kamel, Andrew K.C. Wong, Yang Wang
Received 3 September 2006; received in revised form 10 February 2007; accepted 17 April 2007
The paper addresses the class imbalance problem in classification, where the minority class is heavily underrepresented relative to the majority class, causing standard classifiers to perform poorly on exactly the cases that matter most. The authors investigate meta-techniques, applicable to most classifier learning algorithms, for improving classification accuracy on imbalanced data. AdaBoost is a proven meta-technique for boosting classification accuracy, but because it is accuracy-oriented, its weighting strategy can remain biased towards the majority class. To counter this, three cost-sensitive boosting algorithms (AdaC1, AdaC2, and AdaC3) are proposed by introducing cost items into AdaBoost's weight-update formula at different positions: inside the exponent, outside it, or both. The cost items encode the uneven identification importance between the classes, so that misclassified minority-class samples receive larger weight increases and learning is biased towards the rare class. The paper analyses the weighting strategies of the three algorithms and shows that one of them is consistent with forward stagewise additive modelling minimizing the cost exponential loss. Experiments on real-world medical datasets, where class imbalance is prevalent, demonstrate the algorithms' effectiveness at identifying rare cases. The paper concludes by highlighting the importance of cost-sensitive learning for handling class imbalance and suggests directions for future research.
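To make the idea of "cost items in the weight update" concrete, here is a minimal sketch of an AdaC2-style round, where the cost factor multiplies the weight update outside the exponent. This is an illustrative reconstruction, not the paper's exact pseudocode: the decision-stump weak learner, the 1-D feature input, and the early-stopping rule are simplifying assumptions made for the example.

```python
import numpy as np

def adac2_fit(X, y, costs, n_rounds=10):
    """Illustrative cost-sensitive boosting in the style of AdaC2.

    X: 1-D feature array; y: labels in {-1, +1}; costs: per-sample
    cost items C_i (larger for the minority class)."""
    n = len(y)
    D = np.full(n, 1.0 / n)          # sample-weight distribution
    ensemble = []                     # list of (alpha, threshold, polarity)
    for _ in range(n_rounds):
        # Weak learner: decision stump minimizing weighted error.
        best = None
        for thr in np.unique(X):
            for pol in (1, -1):
                pred = np.where(X >= thr, pol, -pol)
                err = np.sum(D * (pred != y))
                if best is None or err < best[0]:
                    best = (err, thr, pol, pred)
        err, thr, pol, pred = best
        correct = pred == y
        # Alpha from the log-ratio of cost-weighted correct vs. wrong mass.
        num = np.sum(costs[correct] * D[correct])
        den = np.sum(costs[~correct] * D[~correct])
        if den == 0:                  # perfect stump: stop boosting early
            ensemble.append((1.0, thr, pol))
            break
        alpha = 0.5 * np.log(num / den)
        ensemble.append((alpha, thr, pol))
        # Cost item OUTSIDE the exponent: D <- C_i * D * exp(-alpha*y*h(x)),
        # then renormalise so D stays a distribution.
        D = costs * D * np.exp(-alpha * y * pred)
        D /= D.sum()
    return ensemble

def adac2_predict(ensemble, X):
    """Weighted vote of the stumps collected during boosting."""
    score = np.zeros(len(X))
    for alpha, thr, pol in ensemble:
        score += alpha * np.where(X >= thr, pol, -pol)
    return np.where(score >= 0, 1, -1)
```

Because the cost factor sits outside the exponent, a misclassified minority-class sample with C_i > 1 has its weight inflated more than a majority-class sample with the same margin, which is exactly how the learning is steered towards the rare class.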