VOL. 39, NO. 1, FEBRUARY 2009 | Yuchun Tang, Member, IEEE, Yan-Qing Zhang, Member, IEEE, Nitesh V. Chawla, Member, IEEE, and Sven Krasser, Member, IEEE
This paper addresses class imbalance in classification, particularly in highly imbalanced datasets. Traditional classification algorithms often struggle in this setting and produce biased models that favor the majority class. The authors propose the Granular SVMs-Repetitive Undersampling algorithm (GSVM-RU) to handle class imbalance in support vector machine (SVM) modeling. GSVM-RU applies granular computing principles to undersampling, with the aim of minimizing information loss while maximizing data cleaning. The algorithm is designed to be both effective and efficient: on various datasets it outperforms or matches state-of-the-art methods as measured by G-mean, AUC-ROC, F-measure, and AUC-PR. The paper also compares GSVM-RU with other SVM modeling techniques, including cost-sensitive learning, oversampling, and undersampling, and reports better results in both effectiveness and efficiency. The authors conclude that GSVM-RU is a promising method for handling highly imbalanced datasets.
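Two of the metrics named above, G-mean and F-measure, can be computed directly from a confusion matrix. The sketch below is illustrative only and is not taken from the paper: the function names, the 0/1 label encoding, and the toy predictions are assumptions. It uses the standard definitions, G-mean as the geometric mean of sensitivity and specificity and F-measure as the harmonic mean of precision and recall, to show why accuracy alone misleads on imbalanced data.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 5 positives and 95 negatives (a 1:19 imbalance).
y_true = [1] * 5 + [0] * 95

# A degenerate classifier that always predicts the majority class.
always_negative = [0] * 100

# A classifier that finds 4 of 5 positives at the cost of 10 false alarms.
balanced_guess = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

accuracy_majority = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
```

On this toy data the always-negative classifier reaches 95% accuracy yet scores zero on both G-mean and F-measure, while the second classifier has lower accuracy but clearly nonzero scores on both. This is the reason imbalance-aware metrics are preferred for evaluation in this setting.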