2008 | Haibo He, Yang Bai, Eduardo A. Garcia, and Shutao Li | ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning
ADASYN is an adaptive synthetic sampling approach for imbalanced learning. The method generates synthetic data for minority class examples based on their difficulty in learning, with more synthetic data generated for harder-to-learn examples. This helps reduce bias from class imbalance and adaptively shift the classification boundary toward difficult examples. Simulation results on various datasets show that ADASYN improves classification performance across five evaluation metrics.
Imbalanced learning is a significant challenge in data mining, with applications in web mining, text categorization, and biomedical data analysis. The problem arises both from minority interests (the concepts of interest are rare objects) and from rare instances (only limited data exist for certain events). Methods for imbalanced learning include sampling strategies, synthetic data generation, cost-sensitive learning, active learning, and kernel-based methods; ADASYN falls under synthetic data generation, adapting to the data distribution to decide where synthetic samples are generated.
The ADASYN algorithm first calculates the degree of class imbalance; if it exceeds a tolerance threshold, synthetic samples are generated for minority class examples in proportion to their learning difficulty. Difficulty is measured as the fraction of majority-class examples among each minority example's K nearest neighbors, and these fractions are normalized into a density distribution that determines how many synthetic samples each minority example receives. The algorithm is tested on real-world datasets, including the vehicle, diabetes, vowel, ionosphere, and abalone datasets, with overall accuracy, precision, recall, F-measure, and G-mean used to assess performance.
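The procedure described above can be sketched in Python. This is a minimal NumPy sketch, not the authors' implementation: the function name, the Euclidean distance metric, and the brute-force neighbor search are my own assumptions.

```python
import numpy as np

def adasyn(X_min, X_maj, beta=1.0, K=5, rng=None):
    """Sketch of ADASYN oversampling (after He et al., 2008).

    X_min, X_maj : 2-D arrays of minority / majority examples.
    beta         : desired balance level (1.0 = fully balanced after generation).
    Returns an array of synthetic minority samples.
    """
    rng = np.random.default_rng(rng)
    m_s, m_l = len(X_min), len(X_maj)
    X_all = np.vstack([X_min, X_maj])      # minority first, then majority

    # Total number of synthetic samples to generate.
    G = int((m_l - m_s) * beta)
    if G <= 0:
        return np.empty((0, X_min.shape[1]))

    # r[i]: fraction of majority examples among the K nearest neighbors
    # of minority example i -- its "learning difficulty".
    r = np.empty(m_s)
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X_all - x, axis=1)
        nn = np.argsort(d)[1:K + 1]        # skip the point itself
        r[i] = np.sum(nn >= m_s) / K       # indices >= m_s are majority
    if r.sum() == 0:                       # no borderline examples: spread evenly
        r = np.ones(m_s)
    r_hat = r / r.sum()                    # normalized density distribution
    g = np.rint(r_hat * G).astype(int)     # samples to create per minority example

    synthetic = []
    for i, x in enumerate(X_min):
        # K nearest *minority* neighbors of x, used for interpolation.
        d = np.linalg.norm(X_min - x, axis=1)
        nn = np.argsort(d)[1:K + 1]
        for _ in range(g[i]):
            z = X_min[rng.choice(nn)]
            lam = rng.random()             # random interpolation factor in [0, 1)
            synthetic.append(x + lam * (z - x))
    return np.array(synthetic)
```

Note how the density distribution concentrates generation near the decision boundary: minority examples surrounded mostly by majority neighbors get large `g[i]`, while examples deep inside the minority region get few or none.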
Simulation results show that ADASYN outperforms SMOTE in most evaluation metrics, particularly in G-mean, indicating improved performance for both minority and majority classes. ADASYN is also effective in reducing bias and improving classification accuracy. Future research directions include extending ADASYN to multi-class imbalanced learning and incremental learning scenarios. The method has potential applications in various real-world domains where imbalanced data is common.
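For reference, the five evaluation metrics reported in the paper can all be derived from binary confusion-matrix counts, taking the minority class as positive. This is a sketch; the function name and argument ordering are my own.

```python
import numpy as np

def imbalance_metrics(tp, fp, tn, fn):
    """Metrics used to evaluate imbalanced classifiers, from the counts of
    true/false positives and negatives (minority class = positive)."""
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)           # sensitivity on the minority class
    specificity = tn / (tn + fp)           # accuracy on the majority class
    f_measure   = 2 * precision * recall / (precision + recall)
    # G-mean balances minority and majority performance in one number,
    # which is why it is informative for imbalanced problems.
    g_mean = np.sqrt(recall * specificity)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "F-measure": f_measure, "G-mean": g_mean}
```

Because overall accuracy is dominated by the majority class, a classifier can score high accuracy while missing most minority examples; the G-mean collapses toward zero in that case, which is why the paper emphasizes it.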