Supervised and Unsupervised Discretization of Continuous Features

1995 | James Dougherty, Ron Kohavi, Mehran Sahami
This paper reviews and empirically evaluates methods for discretizing continuous features in supervised and unsupervised learning. The authors compare binning (an unsupervised method) with entropy-based and purity-based methods (supervised). They find that discretizing features with an entropy-based method significantly improves the performance of the Naive-Bayes algorithm, which then outperforms C4.5 on average across 16 datasets. They also show that discretizing features before applying C4.5 does not degrade its performance and can even improve it in some cases.

The paper examines three main discretization methods: equal width interval binning, Holte's 1R discretizer, and recursive minimal entropy partitioning (minimal sketches of each appear below). Equal width binning, the simplest method, divides the range of observed values into a fixed number of equal-sized bins. Holte's 1R discretizer aims to create bins with a pure class distribution, subject to a minimum bin size. Recursive minimal entropy partitioning selects cut points that minimize the class entropy of the resulting partitions, recursing until a stopping criterion halts the process.

The authors evaluate these methods on 16 datasets from the UCI repository and find that entropy-based discretization significantly improves the accuracy of both Naive-Bayes and C4.5. Because the entropy-based method is global, discretizing each feature once before induction, it does not suffer from the data fragmentation that local, per-node discretization can cause. The authors conclude that entropy-based discretization is the best of the methods tested, giving the strongest performance for both Naive-Bayes and C4.5. They also note that supervised methods generally outperform unsupervised ones, although even simple binning can significantly improve the performance of the Naive-Bayes classifier.

The paper highlights the importance of discretization in improving the accuracy of induction algorithms and suggests further research into dynamic discretization methods.
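Equal width binning is simple enough to state precisely. The sketch below is a minimal Python/NumPy illustration; the function name `equal_width_bins` and the default of ten bins are our choices (ten bins is one of the settings the paper evaluates), not code from the paper.

```python
import numpy as np

def equal_width_bins(values, k=10):
    """Unsupervised discretization: split [min, max] into k
    equal-width intervals and map each value to a bin index."""
    lo, hi = values.min(), values.max()
    if lo == hi:                       # degenerate feature: a single bin
        return np.zeros(len(values), dtype=int)
    width = (hi - lo) / k
    # clip so the maximum value falls in the last bin (index k - 1)
    return np.clip(((values - lo) / width).astype(int), 0, k - 1)

feature = np.array([4.9, 5.1, 6.3, 7.0, 5.8])
print(equal_width_bins(feature, k=3))  # -> [0 0 2 2 1]
```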
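Holte's 1R discretizer is described only at a high level above; the following is a simplified sketch of the idea, assuming a minimum bin size of six (the SMALL parameter commonly cited for 1R). It omits 1R's refinement of never placing a cut between identical feature values, and the helper name `one_r_cuts` is ours, not the paper's.

```python
from collections import Counter

def one_r_cuts(values, labels, min_size=6):
    """Greedy sweep over the sorted feature: grow a bin until it holds
    at least `min_size` examples of its majority class, then close the
    bin at the next class change.  Returns the cut points."""
    pairs = sorted(zip(values, labels))
    cuts, counts = [], Counter()
    for i, (v, y) in enumerate(pairs):
        counts[y] += 1
        majority, m = counts.most_common(1)[0]
        # close the bin just before an example of a different class,
        # but only once the majority class has reached min_size
        if m >= min_size and i + 1 < len(pairs) and pairs[i + 1][1] != majority:
            cuts.append((v + pairs[i + 1][0]) / 2)  # midpoint cut point
            counts = Counter()
    return cuts
```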
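Recursive minimal entropy partitioning can also be sketched compactly. The code below is our illustrative implementation, not the authors' code: it picks the boundary that minimizes the weighted class entropy of the two sides and recurses on each side, stopping when Fayyad and Irani's MDL criterion (the stopping rule the paper adopts) says a split does not pay for its description cost.

```python
import numpy as np

def entropy(y):
    """Class entropy in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mdl_cuts(x, y):
    """Recursively split on the boundary that minimizes the weighted
    class entropy of the two sides; stop when the MDL criterion says
    the split does not pay for its description cost."""
    x, y = np.asarray(x), np.asarray(y)
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = len(y)
    best = None
    for i in range(1, n):              # candidate boundaries between
        if x[i] == x[i - 1]:           # adjacent *distinct* values
            continue
        e = (i * entropy(y[:i]) + (n - i) * entropy(y[i:])) / n
        if best is None or e < best[0]:
            best = (e, i)
    if best is None:                   # constant feature: nothing to split
        return []
    e_split, i = best
    gain = entropy(y) - e_split
    k, k1, k2 = (len(np.unique(v)) for v in (y, y[:i], y[i:]))
    delta = np.log2(3.0**k - 2) - (k * entropy(y)
                                   - k1 * entropy(y[:i]) - k2 * entropy(y[i:]))
    if gain <= (np.log2(n - 1) + delta) / n:   # MDL stopping test
        return []
    cut = (x[i - 1] + x[i]) / 2
    return mdl_cuts(x[:i], y[:i]) + [cut] + mdl_cuts(x[i:], y[i:])

# e.g. mdl_cuts([1., 2., 3., 10., 11.], [0, 0, 0, 1, 1]) -> [6.5]
```

Because the cut points are computed once per feature over the whole training set, this is a global method in the paper's terminology, in contrast to the local, per-node discretization performed inside a decision-tree learner such as C4.5.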