Mining High-Speed Data Streams

2000 | Pedro Domingos, Geoff Hulten
This paper introduces Hoeffding trees, a decision-tree learning method for high-volume data streams. Learning takes constant time per example, and the method guarantees that, with high probability, the resulting tree is asymptotically nearly identical to the one a conventional batch learner would produce. The authors propose VFDT (Very Fast Decision Tree learner), a system based on Hoeffding trees that can process tens of thousands of examples per second on off-the-shelf hardware. VFDT is designed for continuous data streams: it incorporates examples as they arrive without storing them in memory. The paper evaluates VFDT's properties through extensive experiments on synthetic data and applies it to mining Web access data from the University of Washington main campus. The results show that VFDT handles large datasets effectively and that its accuracy improves as more data arrives. The authors also discuss related work and future directions, including comparisons with other algorithms and applications to Web log data and other domains.
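The summary does not spell out how a Hoeffding tree decides when to split, so the following is only a minimal sketch of the standard Hoeffding bound that gives the method its name. It assumes information gain as the split measure and a value range of log2(number of classes); the function names `hoeffding_bound` and `should_split` are hypothetical and are not taken from VFDT itself.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean
    of a variable with the given range lies within epsilon of the mean
    observed over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 n_classes: int, delta: float, n: int) -> bool:
    """Split a leaf once the observed gap between the two best attributes
    exceeds the Hoeffding bound (illustrative rule, assumed here)."""
    # Assumption: the split measure is information gain, whose range
    # is log2(number of classes).
    value_range = math.log2(n_classes)
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > epsilon


if __name__ == "__main__":
    # The bound shrinks as more examples reach the leaf.
    for n in (100, 1_000, 10_000):
        print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))
```

Because the bound depends only on the number of examples seen at a leaf, not on the examples themselves, a learner built this way needs to keep only per-leaf counts, which is consistent with the constant per-example cost described above.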