This article reports an empirical investigation of the accuracy of rules that classify examples on the basis of a single attribute. On most datasets studied, the best of these very simple rules is as accurate as the rules induced by the majority of machine learning systems. The article explores the implications of this finding for machine learning research and applications.
The study compares the performance of 1R, a program that learns 1-rules from examples, with C4, a state-of-the-art learning algorithm, on 16 commonly used datasets. The results show that 1R's accuracy is only slightly lower than C4's on most datasets. On average, 1R's accuracy is 5.7 percentage points lower than C4's, but on 12 of the 16 datasets the difference is smaller than this average. On 14 of the 16 datasets, the gap averages only 3.1 percentage points, and on half the datasets 1R's accuracy is within 2.6 percentage points of C4's.
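The 1-rule idea summarized above can be sketched briefly: for each attribute, build a rule that predicts the majority class for each of that attribute's values, then keep the attribute whose rule makes the fewest errors on the training data. The sketch below is a minimal illustration of this idea, assuming categorical attributes; the function and variable names are illustrative, not taken from the original 1R program.

```python
from collections import Counter

def one_r(examples, classes):
    """Return (attribute index, rule) for the best single-attribute rule.

    examples: list of attribute-value tuples; classes: list of labels.
    """
    best_attr, best_rule, best_correct = None, None, -1
    for a in range(len(examples[0])):
        # For attribute a, count class frequencies per attribute value.
        counts = {}
        for x, c in zip(examples, classes):
            counts.setdefault(x[a], Counter())[c] += 1
        # The rule predicts the majority class for each attribute value.
        rule = {v: cnt.most_common(1)[0][0] for v, cnt in counts.items()}
        # Training-set hits: majority count summed over attribute values.
        correct = sum(cnt.most_common(1)[0][1] for cnt in counts.values())
        if correct > best_correct:
            best_attr, best_rule, best_correct = a, rule, correct
    return best_attr, best_rule

# Toy usage: the first attribute alone separates the two classes here.
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
attr, rule = one_r(X, y)
```

Note that model selection here is done on training accuracy alone, which is part of what makes 1-rules so cheap to learn.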
The study also examines an upper bound on the accuracy that could be achieved by improving 1R's selection criterion. The accuracy of this upper bound, called 1R*, is very close to that of C4's decision trees. The results suggest that 1R could be competitive with C4 if its selection criterion were improved, but that it would be unlikely ever to significantly outperform C4.
The study also explores the use of 1-rules to predict the accuracy of complex rules. The results show that 1Rw, the accuracy of 1-rules when the whole dataset is used for both training and testing, is a good predictor of C4's accuracy on most datasets. 1Rw is also highly correlated with the median accuracy achieved on each dataset, and it can be used to predict the accuracy of other machine learning systems.
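The 1Rw measure described above is simple to compute: learn the best 1-rule on the entire dataset and score it on that same data. The following is a self-contained sketch under the assumption of categorical attributes; names are illustrative rather than from the original study.

```python
from collections import Counter

def one_rw(examples, classes):
    """1Rw: accuracy of the best single-attribute rule, measured on the
    same data it was built from (whole dataset as both train and test)."""
    best_correct = 0
    for a in range(len(examples[0])):
        # Class frequencies per value of attribute a.
        counts = {}
        for x, c in zip(examples, classes):
            counts.setdefault(x[a], Counter())[c] += 1
        # A majority-class rule gets the majority count right per value.
        correct = sum(cnt.most_common(1)[0][1] for cnt in counts.values())
        best_correct = max(best_correct, correct)
    return best_correct / len(examples)

# Toy usage: the best attribute misclassifies one of four examples.
X = [("a",), ("a",), ("b",), ("b",)]
y = ["p", "q", "p", "p"]
score = one_rw(X, y)
```

Because no holdout set is needed, 1Rw can be computed in a single pass over the data, which is what makes it attractive as a cheap predictor of how harder-to-train systems will fare.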
The practical significance of these results is discussed, with the conclusion that very simple rules often perform well on most commonly used datasets. This suggests that simple-rule learning systems are often a viable alternative to systems that learn more complex rules. The study also highlights the importance of considering the practical significance of datasets in machine learning research.