A survey on imbalanced learning: latest research, applications and future directions

Accepted: 7 April 2024 / Published online: 9 May 2024 | Wuxing Chen¹,² · Kaixiang Yang³ · Zhiwen Yu³ · Yifan Shi⁴ · C. L. Philip Chen³
This paper reviews the latest research, applications, and future directions in imbalanced learning. Imbalanced learning is a significant challenge in data mining and machine learning: when the class distribution is uneven, models are often biased toward the majority class. The paper groups existing strategies into five categories: general methods, ensemble learning methods, imbalanced regression and clustering, long-tail learning, and imbalanced data streams. It also surveys real-world applications in fields such as management science and engineering, and discusses emerging issues and challenges.

The introduction highlights how common imbalanced data are in real-life scenarios and why intelligent systems need to correct the resulting bias. The paper traces the evolution of imbalanced learning over the past two decades, noting the volume of research and the importance of handling imbalanced data in applications such as fault detection, fraud detection, and medical diagnosis. The statistical research methodology section describes a multi-stage review strategy covering preliminary results, a literature search, and keyword searches across multiple databases. The paper also presents its search framework and the publication trends in imbalanced learning, which show growing interest in the field.

The paper then examines approaches to imbalanced data classification: data-level approaches (oversampling and undersampling), algorithm-level approaches (cost-sensitive learning, weighted neural networks), hybrid approaches (combining sampling methods with ensemble learning), and ensemble learning methods (general framework, boosting, bagging, cost-sensitive ensembles).
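The data-level and cost-sensitive ideas named above can be illustrated with a minimal sketch. It assumes a small in-memory dataset held in Python lists; random_oversample, random_undersample and inverse_frequency_weights are hypothetical helper names, not code from the survey. Random oversampling duplicates minority samples, random undersampling discards majority samples, and the cost-sensitive alternative leaves the data unchanged but weights each class by n / (k · n_c), the same heuristic scikit-learn uses for class_weight="balanced".

```python
import random
from collections import Counter, defaultdict


def _indices_by_label(y):
    """Map each class label to the list of sample indices carrying it."""
    buckets = defaultdict(list)
    for i, label in enumerate(y):
        buckets[label].append(i)
    return buckets


def random_oversample(X, y, seed=0):
    """Data level (hypothetical helper): duplicate randomly chosen samples of
    the smaller classes, drawn with replacement, until every class matches
    the size of the largest class."""
    rng = random.Random(seed)
    buckets = _indices_by_label(y)
    target = max(len(idx) for idx in buckets.values())
    out_X, out_y = [], []
    for label, idx in buckets.items():
        # Keep every original sample, then add random duplicates.
        chosen = idx + [rng.choice(idx) for _ in range(target - len(idx))]
        out_X.extend(X[i] for i in chosen)
        out_y.extend([label] * len(chosen))
    return out_X, out_y


def random_undersample(X, y, seed=0):
    """Data level (hypothetical helper): keep a random subset of each class,
    drawn without replacement, so every class matches the smallest class."""
    rng = random.Random(seed)
    buckets = _indices_by_label(y)
    target = min(len(idx) for idx in buckets.values())
    out_X, out_y = [], []
    for label, idx in buckets.items():
        chosen = rng.sample(idx, target)
        out_X.extend(X[i] for i in chosen)
        out_y.extend([label] * len(chosen))
    return out_X, out_y


def inverse_frequency_weights(y):
    """Algorithm level, cost-sensitive (hypothetical helper): weight class c
    by n / (k * n_c), so every class contributes the same total weight to a
    weighted loss."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


# Toy data: 8 majority samples (class 0) and 2 minority samples (class 1).
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

_, y_over = random_oversample(X, y)
_, y_under = random_undersample(X, y)
print(Counter(y_over))               # Counter({0: 8, 1: 8})
print(Counter(y_under))              # Counter({0: 2, 1: 2})
print(inverse_frequency_weights(y))  # {0: 0.625, 1: 2.5}
```

Oversampling keeps all information but can encourage overfitting to duplicated minority points, while undersampling is cheaper but throws data away; class weighting avoids both by changing the loss instead of the data. The hybrid and ensemble methods listed above typically combine such resampling with multiple base learners.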
The sections on regression and semi-/unsupervised learning discuss the challenges of imbalanced regression and the solutions proposed for it, stressing the need for appropriate evaluation metrics and for techniques that handle undersampling and outliers in continuous-output prediction. Overall, the paper aims to provide a unified, comprehensive overview of imbalanced learning, highlighting recent advances, practical applications, and future research directions.
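For imbalanced regression, one simple way to make an evaluation metric sensitive to rare target values is to average the error per target bin instead of per sample. The sketch below assumes user-chosen bin edges; binned_mae and plain_mae are hypothetical names, and this binned MAE is one illustrative choice rather than a metric taken from the survey.

```python
import bisect
from collections import defaultdict


def plain_mae(y_true, y_pred):
    """Ordinary mean absolute error: every sample has equal weight."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def binned_mae(y_true, y_pred, edges):
    """Mean absolute error averaged per target bin (illustrative metric).

    `edges` is a sorted list of bin boundaries (an assumption of this
    sketch); bin i covers edges[i] <= t < edges[i + 1]. Targets outside the
    range are clipped into the first or last bin, and bins without samples
    are skipped."""
    last_bin = len(edges) - 2
    errors = defaultdict(list)
    for t, p in zip(y_true, y_pred):
        b = bisect.bisect_right(edges, t) - 1
        b = min(max(b, 0), last_bin)
        errors[b].append(abs(t - p))
    per_bin = [sum(e) / len(e) for e in errors.values()]
    return sum(per_bin) / len(per_bin)


# Eight common small targets predicted exactly, two rare large ones underpredicted.
y_true = [1, 2, 3, 4, 5, 6, 7, 8, 50, 60]
y_pred = [1, 2, 3, 4, 5, 6, 7, 8, 20, 20]
print(plain_mae(y_true, y_pred))                 # 7.0
print(binned_mae(y_true, y_pred, [0, 10, 100]))  # 17.5
```

Plain MAE lets the eight well-predicted common targets dilute the large errors on the two rare ones, whereas the binned version gives the rare range half the weight. Because empty bins are skipped, the result depends on how the edges are chosen.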