Accepted: 4 February 2024 / Published online: 4 March 2024 | Juscimara G. Avelino, George D. C. Cavalcanti, Rafael M. O. Cruz
The paper "Resampling strategies for imbalanced regression: a survey and empirical analysis" by Juscimara G. Avelino, George D. C. Cavalcanti, and Rafael M. O. Cruz addresses imbalanced datasets in regression tasks, where the target values are continuous. The authors provide an extensive experimental study covering variations of balancing strategies and predictive models. It uses metrics that capture aspects important to users and that evaluate the predictive model's performance on imbalanced regression data. They also propose a taxonomy for imbalanced regression approaches based on three criteria: regression model, learning process, and evaluation metrics. The study highlights the advantages these strategies bring to each model's learning process and suggests directions for further research. The code, data, and additional information related to the experiments are available on GitHub.
The introduction explains that imbalanced datasets are common in real-world applications, including regression tasks where the target value is continuous. In classification, imbalance is defined in terms of minority and majority classes. In regression, the target spans a continuous range of values, so there are no predefined minority and majority groups. The authors use the FuelCons dataset to illustrate the distribution and frequency of target values, emphasizing the importance of rare and extreme cases in prediction tasks. They discuss the challenges these cases pose for learning algorithms and the resulting need for balancing strategies. Common approaches to imbalanced regression include Random Under-sampling, Random Over-sampling, and the WEighted Relevance-based Combination Strategy (WERCS). The paper also highlights the importance of resampling strategies for handling rare and extreme cases in real-world applications such as software defect prediction.
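The three strategies named above all depend on some measure of how relevant, or rare, each target value is. The following is a minimal sketch, not the authors' code. It assumes a hypothetical boxplot-based relevance function `relevance`. Relevance functions in this literature are commonly built by PCHIP interpolation of control points, and that approach is not reproduced here. The sketch also assumes a relevance threshold of 0.8 for calling a case rare and simplified resampling rules. All function names and parameters (`threshold`, `keep_fraction`, `copies`, `over`, `under`) are illustrative.

```python
import random
import statistics


def relevance(ys):
    """Hypothetical relevance function phi(y) in [0, 1].

    Assumption: phi grows linearly with the distance from the median and
    saturates at 1 beyond the boxplot whiskers (Q1 - 1.5*IQR, Q3 + 1.5*IQR).
    """
    q1, med, q3 = statistics.quantiles(ys, n=4)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    phis = []
    for y in ys:
        span = (hi - med) if y >= med else (med - lo)
        if span <= 0:
            phis.append(0.0 if y == med else 1.0)
        else:
            phis.append(min(1.0, abs(y - med) / span))
    return phis


def split_rare(phis, threshold):
    """Indices of rare (phi >= threshold) and normal (phi < threshold) cases."""
    rare = [i for i, p in enumerate(phis) if p >= threshold]
    normal = [i for i, p in enumerate(phis) if p < threshold]
    return rare, normal


def random_undersample(ys, threshold=0.8, keep_fraction=0.5, seed=0):
    """Keep every rare case and a random fraction of the normal cases."""
    rng = random.Random(seed)
    rare, normal = split_rare(relevance(ys), threshold)
    kept = rng.sample(normal, int(len(normal) * keep_fraction))
    return sorted(rare + kept)


def random_oversample(ys, threshold=0.8, copies=2, seed=0):
    """Keep every case and append randomly drawn duplicates of rare cases."""
    rng = random.Random(seed)
    rare, _ = split_rare(relevance(ys), threshold)
    extra = rng.choices(rare, k=copies * len(rare)) if rare else []
    return list(range(len(ys))) + extra


def _weighted_without_replacement(rng, items, weights, k):
    """Efraimidis-Spirakis sampling: top-k keys u**(1/w); zero weights never picked."""
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in zip(items, weights) if w > 0]
    keyed.sort(reverse=True)
    return [i for _, i in keyed[:k]]


def wercs(ys, over=0.5, under=0.5, seed=0):
    """Simplified WERCS-style combination driven by relevance, with no threshold.

    Duplicates are drawn with probability proportional to phi and removals
    with probability proportional to 1 - phi, so fully relevant cases
    (phi == 1) are never removed.
    """
    rng = random.Random(seed)
    phis = relevance(ys)
    idx = list(range(len(ys)))
    n_over, n_under = int(over * len(ys)), int(under * len(ys))
    extra = rng.choices(idx, weights=phis, k=n_over) if sum(phis) > 0 else []
    removed = set(_weighted_without_replacement(rng, idx, [1 - p for p in phis], n_under))
    return [i for i in idx if i not in removed] + extra


if __name__ == "__main__":
    ys = list(range(20)) + [100, 120]  # two extreme targets in the upper tail
    print(len(random_undersample(ys)), len(random_oversample(ys)), len(wercs(ys)))
    # prints: 12 26 22
```

Each function returns row indices rather than copies of the data, so the same selection can be used to subset a feature matrix and its targets together. Random under- and over-sampling need a hard threshold to separate rare from normal cases. The WERCS-style function instead uses the relevance values directly as sampling weights, which is the main design difference this sketch is meant to show.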