SMOTE: Synthetic Minority Over-sampling Technique | 2002 | Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, W. Philip Kegelmeyer
SMOTE (Synthetic Minority Over-sampling Technique) handles imbalanced datasets by combining over-sampling of the minority class with under-sampling of the majority class. Rather than merely replicating existing minority examples, it creates synthetic ones along the line segments joining each minority sample to its nearest minority-class neighbors, which improves classifier performance. Experiments with the C4.5, Ripper, and Naive Bayes classifiers, evaluated using the area under the ROC curve (AUC) and the ROC convex hull, show that SMOTE combined with majority-class under-sampling outperforms other methods in ROC space. SMOTE-NC extends the technique to datasets with both continuous and nominal features, though experiments show it performs worse than plain under-sampling in some cases. Future work includes adaptive selection of the number of nearest neighbors and extensions of SMOTE to information retrieval.
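The core generation step described above (interpolating along line segments between a minority sample and one of its nearest minority-class neighbors) can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation; the function name `smote_sample` and the brute-force neighbor search are choices made here for clarity.

```python
import numpy as np

def smote_sample(X_min, n_synthetic, k=5, rng=None):
    """Generate synthetic minority-class samples by interpolating between
    each chosen minority point and one of its k nearest minority neighbors.
    A minimal sketch of SMOTE's generation step, assuming a small dense
    minority set (brute-force distance computation)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other samples
    # Pairwise Euclidean distances among minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude each point from its own neighbors
    # Indices of the k nearest minority neighbors of each sample.
    nn = np.argsort(d, axis=1)[:, :k]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(n)              # pick a minority sample at random
        j = nn[i, rng.integers(k)]       # pick one of its nearest neighbors
        gap = rng.random()               # random position along the segment
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because each synthetic point lies on a segment between two real minority samples, the generated set stays inside the region spanned by the minority class, which is what lets the classifier learn broader decision regions rather than overfitting to replicated points.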