Cost-sensitive boosting for classification of imbalanced data

2007 | Yanmin Sun, Mohamed S. Kamel, Andrew K.C. Wong, Yang Wang
This paper presents cost-sensitive boosting algorithms for improving classification performance on imbalanced data. The authors use AdaBoost as a meta-technique and propose three cost-sensitive boosting algorithms (AdaC1, AdaC2, and AdaC3) that incorporate cost items into the AdaBoost learning framework. These cost items encode the varying importance of correctly identifying each class, biasing the boosting strategy towards the class with higher identification importance.

The paper first discusses the nature of the class imbalance problem, including skewed data distribution, small sample size, class separability, and within-class sub-concepts. It then reviews existing solutions at the data level, such as resampling techniques, and at the algorithm level, such as cost-sensitive learning.

The three proposed algorithms are compared in terms of their weighting strategies. The paper also examines the relationship between AdaBoost and forward stagewise additive modelling under exponential loss, and shows that one of the proposed algorithms aligns with this statistical framework.

Finally, the algorithms are evaluated on real-world medical datasets with class imbalance. The results indicate that the proposed cost-sensitive boosting algorithms improve classification performance on imbalanced data by focusing learning on the rare class, highlighting the value of cost-sensitive learning in addressing class imbalance.
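As a sketch of the modification the summary describes, the three variants differ in where the misclassification cost item $C_i$ enters AdaBoost's weight update. In the notation below (standard AdaBoost setup, which the paper builds on), $D_t$ is the sample-weight distribution, $h_t$ the weak learner at round $t$, $\alpha_t$ its vote weight, and $Z_t$ a normalisation factor:

```latex
% Standard AdaBoost weight update:
D_{t+1}(i) = \frac{D_t(i)\,\exp\!\big(-\alpha_t\, y_i h_t(x_i)\big)}{Z_t}

% AdaC1: cost item inside the exponent
D_{t+1}(i) = \frac{D_t(i)\,\exp\!\big(-\alpha_t\, C_i\, y_i h_t(x_i)\big)}{Z_t}

% AdaC2: cost item outside the exponent
D_{t+1}(i) = \frac{C_i\, D_t(i)\,\exp\!\big(-\alpha_t\, y_i h_t(x_i)\big)}{Z_t}

% AdaC3: cost item both inside and outside the exponent
D_{t+1}(i) = \frac{C_i\, D_t(i)\,\exp\!\big(-\alpha_t\, C_i\, y_i h_t(x_i)\big)}{Z_t}
```

In every variant, assigning a larger $C_i$ to rare-class examples makes their weights shrink more slowly (or grow faster) across rounds, which is what biases the ensemble towards the class with higher identification importance.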
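To make the mechanism concrete, here is a minimal, self-contained sketch of an AdaC2-style learner with decision stumps as weak classifiers. All names (`adac2_fit`, `stump_train`, etc.) are illustrative, not the paper's code; the per-class cost values and the stump learner are assumptions for the example. The defining choice shown is that the cost item multiplies the sample weight outside the exponential, and the cost-weighted error also enters the computation of each round's vote weight:

```python
import numpy as np

def stump_train(X, y, w):
    """Pick the decision stump (feature, threshold, polarity) with lowest weighted error."""
    best, best_err = (0, 0.0, 1), np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best

def stump_predict(stump, X):
    j, thr, pol = stump
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adac2_fit(X, y, cost, T=20):
    """AdaC2-style boosting sketch: the cost item multiplies the sample
    weight outside the exponential, so high-cost (rare-class) examples
    retain larger weights in every round."""
    D = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(T):
        stump = stump_train(X, y, D)
        h = stump_predict(stump, X)
        correct = h == y
        num = (cost * D)[correct].sum()   # cost-weighted mass classified correctly
        den = (cost * D)[~correct].sum()  # cost-weighted mass misclassified
        if den == 0:                      # perfect weak learner: stop early
            ensemble.append((1.0, stump))
            break
        alpha = 0.5 * np.log(num / den)
        if alpha <= 0:                    # weak learner no better than chance
            break
        ensemble.append((alpha, stump))
        D = cost * D * np.exp(-alpha * y * h)  # cost item outside the exponent
        D /= D.sum()
    return ensemble

def adac2_predict(ensemble, X):
    agg = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.where(agg >= 0, 1, -1)
```

On an imbalanced toy problem, giving the minority class a larger cost (e.g. 1.0 vs. 0.5) keeps the boosting focused on minority examples, which is the behaviour the paper evaluates on its medical datasets.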