SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary

Alberto Fernández, Salvador García, Francisco Herrera, Nitesh V. Chawla
Submitted 06/17; published 04/18
The paper "SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary" by Alberto Fernández, Salvador García, Francisco Herrera, and Nitesh V. Chawla reviews the Synthetic Minority Over-sampling Technique (SMOTE) and its impact over the past 15 years. SMOTE is a preprocessing algorithm designed to address class imbalance in machine learning datasets, which has become a "de facto" standard due to its simplicity and robustness. The authors discuss the current state of SMOTE, its applications, and the challenges it faces, particularly in handling Big Data problems. They also highlight the extensions and variations of SMOTE, such as Borderline-SMOTE, ADASYN, and MWMOTE, and their use in different learning paradigms like streaming data, semi-supervised learning, multi-instance learning, and multi-label classification. The paper concludes by identifying future research directions, including the handling of small disjuncts, noise, and lack of data, as well as the impact of dimensionality and dataset shift.The paper "SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary" by Alberto Fernández, Salvador García, Francisco Herrera, and Nitesh V. Chawla reviews the Synthetic Minority Over-sampling Technique (SMOTE) and its impact over the past 15 years. SMOTE is a preprocessing algorithm designed to address class imbalance in machine learning datasets, which has become a "de facto" standard due to its simplicity and robustness. The authors discuss the current state of SMOTE, its applications, and the challenges it faces, particularly in handling Big Data problems. They also highlight the extensions and variations of SMOTE, such as Borderline-SMOTE, ADASYN, and MWMOTE, and their use in different learning paradigms like streaming data, semi-supervised learning, multi-instance learning, and multi-label classification. The paper concludes by identifying future research directions, including the handling of small disjuncts, noise, and lack of data, as well as the impact of dimensionality and dataset shift.