Resampling strategies for imbalanced regression: a survey and empirical analysis

4 March 2024 | Juscimara G. Avelino · George D. C. Cavalcanti · Rafael M. O. Cruz
This paper presents an extensive experimental study of resampling strategies for imbalanced regression tasks. It evaluates a range of balancing strategies and predictive models, using metrics designed to assess predictive performance in imbalanced regression contexts. The study proposes a taxonomy of imbalanced regression approaches based on three key criteria: regression model, learning process, and evaluation metrics. It also highlights the advantages of the resampling strategies for different models and suggests directions for further research. The code, data, and additional information related to the experiments are available on GitHub.

Imbalanced datasets are common in real-world applications. In classification tasks, imbalanced data is typically addressed through resampling or balancing algorithms. Imbalance also occurs in regression tasks, where the target variable is continuous rather than limited to discrete categories, which makes imbalance harder to define. The paper discusses the resulting challenges: rare and extreme target values can have a significant impact on prediction performance, yet standard regression methods tend to focus on the most frequent values and neglect rare ones that may matter more to the user or to the prediction process.

The paper reviews existing resampling strategies for imbalanced regression, including random under-sampling, random over-sampling, and the weighted relevance-based combination strategy (WERCS). These strategies aim to balance the data distribution before the learning process begins. The study also stresses the importance of accounting for the relevance of continuous target values in imbalanced regression tasks, and it concludes that further research is needed to improve the performance of regression models in imbalanced scenarios.
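To make the idea of balancing the data before learning more concrete, here is a minimal Python sketch of random under-sampling and random over-sampling for a continuous target. It is an illustration under stated assumptions, not the paper's implementation: the quantile-based rule in `tail_mask` is a hypothetical stand-in for a relevance function, and the function names, thresholds, and the choice to equalize the two groups are all assumptions made for this example.

```python
import numpy as np


def tail_mask(y, low_q=0.1, high_q=0.9):
    """Mark targets in the lower/upper tails as rare.

    Hypothetical stand-in for a relevance function; the thresholds are assumptions.
    """
    lo, hi = np.quantile(y, [low_q, high_q])
    return (y < lo) | (y > hi)


def random_under_sample(X, y, rare_mask, rng):
    """Keep every rare case and randomly drop normal cases until both groups match."""
    rare_idx = np.flatnonzero(rare_mask)
    normal_idx = np.flatnonzero(~rare_mask)
    n_keep = min(len(normal_idx), len(rare_idx))
    kept = rng.choice(normal_idx, size=n_keep, replace=False)
    idx = np.concatenate([rare_idx, kept])
    return X[idx], y[idx]


def random_over_sample(X, y, rare_mask, rng):
    """Keep all cases and add random copies of rare cases until both groups match."""
    rare_idx = np.flatnonzero(rare_mask)
    normal_idx = np.flatnonzero(~rare_mask)
    n_extra = max(len(normal_idx) - len(rare_idx), 0)
    if len(rare_idx) == 0 or n_extra == 0:
        return X, y  # nothing to replicate
    extra = rng.choice(rare_idx, size=n_extra, replace=True)
    idx = np.concatenate([normal_idx, rare_idx, extra])
    return X[idx], y[idx]


# Synthetic, skewed target used only for demonstration.
rng = np.random.default_rng(0)
y_demo = rng.exponential(size=1000)
X_demo = rng.normal(size=(1000, 3))
rare_demo = tail_mask(y_demo)

X_under, y_under = random_under_sample(X_demo, y_demo, rare_demo, rng)
X_over, y_over = random_over_sample(X_demo, y_demo, rare_demo, rng)
print(len(y_demo), len(y_under), len(y_over))
```

Either resampled set would then be passed to a regression model in place of the original training data. Under-sampling discards information about frequent values, while over-sampling replicates rare cases exactly; the sketch makes no claim about which trade-off is preferable.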