February 1, 2017 | Bartosz Krawczyk, Leandro L. Minku, João Gama, Jerzy Stefanowski, Michal Woźniak
This paper surveys research on ensemble learning for data stream classification and regression tasks. It discusses various ensemble approaches for data streams, including advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations, and structured outputs. The paper also addresses open research problems and future research directions in this area.
Data streams pose new challenges for machine learning due to their dynamic nature, limited computational resources, and concept drift. Concept drift refers to changes in the distribution of data over time, which can significantly affect the performance of prediction models. Ensemble methods are particularly useful for non-stationary environments, as they can adapt to changes in data distribution.
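To make the point about adaptation concrete, here is a minimal sketch of how a weighted vote can recover after a sudden drift. It is not an algorithm taken from the survey: the names `run_weighted_vote`, `make_stream`, `beta`, and `floor` are hypothetical, the two fixed "experts" stand in for trained models, and the stream is synthetic with the labelling rule inverted at a chosen position.

```python
def make_stream(n=100, drift_at=50):
    """Synthetic binary stream: y = x before the drift, y = 1 - x after it."""
    return [(i % 2, i % 2 if i < drift_at else 1 - i % 2) for i in range(n)]


def run_weighted_vote(stream, experts, beta=0.5, floor=0.01):
    """Weighted-majority-style vote (illustrative assumption, not the survey's method).

    Each expert that errs has its weight multiplied by beta. Weights are then
    rescaled so the largest is 1 and clipped at `floor`, so that a member that
    was wrong under the old concept can regain influence under the new one.
    """
    weights = [1.0] * len(experts)
    mistakes = []
    for x, y in stream:
        votes = [expert(x) for expert in experts]
        score_one = sum(w for w, v in zip(weights, votes) if v == 1)
        score_zero = sum(w for w, v in zip(weights, votes) if v == 0)
        prediction = 1 if score_one > score_zero else 0  # ties go to class 0
        mistakes.append(prediction != y)
        # Penalize every expert whose own vote was wrong on this example.
        weights = [w * beta if v != y else w for w, v in zip(weights, votes)]
        top = max(weights)
        weights = [max(w / top, floor) for w in weights]
    return mistakes


# Two fixed experts: one matches the pre-drift concept, one the post-drift concept.
experts = [lambda x: x, lambda x: 1 - x]
mistakes = run_weighted_vote(make_stream(), experts)
errors_before = sum(mistakes[:50])
errors_after = sum(mistakes[50:])
print("errors before drift:", errors_before)
print("errors after drift:", errors_after)
```

In this toy setting the ensemble makes a short burst of errors right after the drift and then follows the expert that fits the new concept. The weight floor is the design choice that matters here: without it, the weight of the second expert would keep shrinking during the first concept and recovery would take far longer.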
The paper discusses different types of data streams and learning frameworks, including supervised, semi-supervised, and unsupervised learning. It also covers drift detection methods, which are essential for identifying changes in data distribution. The paper reviews metrics for evaluating classifiers and regression models, including accuracy, mean square error, sensitivity, G-Mean, and the Kappa statistic.
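The metrics named above can be sketched for the binary case as follows. This is an illustration under stated assumptions rather than the survey's own definitions: it uses the common textbook forms (G-Mean as the geometric mean of sensitivity and specificity, Kappa as Cohen's kappa), the function names `binary_metrics` and `mean_square_error` are hypothetical, and the confusion-matrix counts are made up for the example. It also assumes both classes occur at least once, so no denominator is zero.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, G-Mean and Cohen's kappa from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    g_mean = math.sqrt(sensitivity * specificity)
    # Agreement expected by chance, from the marginal class frequencies.
    p_yes = ((tp + fn) / n) * ((tp + fp) / n)
    p_no = ((tn + fp) / n) * ((tn + fn) / n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "g_mean": g_mean,
        "kappa": kappa,
    }


def mean_square_error(y_true, y_pred):
    """Mean of squared differences, for regression outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up imbalanced example: 10 positives, 90 negatives.
metrics = binary_metrics(tp=5, fn=5, fp=10, tn=80)
mse = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
print(metrics)
print("MSE:", mse)
```

On these made-up counts accuracy is 0.85, while sensitivity is 0.5, G-Mean is about 0.67, and kappa is about 0.32, which shows why accuracy alone can look optimistic on imbalanced streams.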
The paper also discusses the evaluation of streaming algorithms, including memory consumption, update time, decision time, and recovery time. It highlights the importance of using both real-world data streams and data streams with artificially induced drifts for evaluating predictive models and concept drift detectors.
The paper concludes with a discussion of open research problems and future research directions in ensemble learning for data streams. It emphasizes the need for further research on handling concept drift, improving the efficiency of ensemble methods, and developing new algorithms for data stream analysis.