The Power of Decision Tables

Ron Kohavi
This paper evaluates the effectiveness of decision tables as a hypothesis space for supervised learning. Decision tables are simple and easy to understand, and experiments show that IDTM, an algorithm that induces decision tables, can outperform state-of-the-art algorithms such as C4.5 on artificial and real-world datasets with discrete features. Surprisingly, IDTM also performs well on datasets with continuous features, suggesting that many real-world datasets either do not need those features or use only a few of their values.

The paper investigates the power of decision tables with a default rule mapping to the majority class (DTMs). A DTM consists of a schema (a set of features) and a body (a set of labelled instances projected onto the schema). IDTM searches for the feature subset that minimizes error with respect to the target function, using best-first search over subsets and cross-validation to estimate the accuracy of each candidate.

The paper also describes an incremental method for cross-validation applicable to incremental learning algorithms such as IDTM. It allows cross-validation to run in time linear in the number of instances, features, and label values, independent of the number of folds, so leave-one-out and ten-fold cross-validation take the same time. This incremental method makes IDTM practical, since candidate feature subsets can be evaluated efficiently.

Experimentally, IDTM achieves high accuracy in discrete domains, outperforming C4.5 in some cases; on datasets with continuous features it performs similarly to C4.5, indicating that those features may not be useful or may take few values. The paper also discusses related work on decision tables and feature subset selection, and concludes that decision tables can be a useful hypothesis space for induction algorithms.
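The schema/body structure of a DTM can be illustrated with a minimal sketch. This is my own illustration based on the description above, not code from the paper: instances are projected onto the schema's features, matching cells vote by majority, and unmatched instances fall through to the default rule (the overall majority class).

```python
from collections import Counter

class DTM:
    """A decision table with a default rule mapping to the majority class.

    Illustrative sketch; class and method names are my own, not the paper's.
    """

    def __init__(self, schema):
        self.schema = schema           # indices of the selected features
        self.body = {}                 # projected instance -> label counts
        self.class_counts = Counter()  # overall label counts (default rule)

    def _project(self, instance):
        return tuple(instance[i] for i in self.schema)

    def insert(self, instance, label):
        self.body.setdefault(self._project(instance), Counter())[label] += 1
        self.class_counts[label] += 1

    def delete(self, instance, label):
        key = self._project(instance)
        self.body[key][label] -= 1
        if not +self.body[key]:        # unary + drops non-positive counts
            del self.body[key]
        self.class_counts[label] -= 1

    def classify(self, instance):
        counts = self.body.get(self._project(instance))
        if counts:
            return counts.most_common(1)[0][0]
        # default rule: majority class over all training data
        return self.class_counts.most_common(1)[0][0]
```

Because `insert` and `delete` are cheap constant-time dictionary updates, a table like this is naturally incremental, which is what the paper's cross-validation method exploits.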
The paper also notes that while IDTM may not always find the best feature subset, it often performs well and can be used as a starting point for more complex algorithms.
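The incremental cross-validation idea can be sketched for the leave-one-out case: build the table once over all instances, then for each instance delete it, classify it, and re-insert it. Total cost is linear in the number of instances rather than rebuilding the table once per fold. This is a self-contained sketch under my own naming, not the paper's implementation:

```python
from collections import Counter

def loo_accuracy(instances, labels, schema):
    """Leave-one-out accuracy of a decision table via incremental updates.

    Illustrative sketch: the table is built once, and each held-out
    instance is temporarily deleted, classified, and re-inserted.
    """
    project = lambda x: tuple(x[i] for i in schema)

    # Build the full table once.
    body = {}
    class_counts = Counter()
    for x, y in zip(instances, labels):
        body.setdefault(project(x), Counter())[y] += 1
        class_counts[y] += 1

    correct = 0
    for x, y in zip(instances, labels):
        key = project(x)
        # Temporarily remove (x, y) from the table.
        body[key][y] -= 1
        class_counts[y] -= 1
        cell = +body[key]  # unary + drops zeroed-out counts
        if cell:
            pred = cell.most_common(1)[0][0]
        else:
            # Default rule: majority class of the remaining data.
            pred = class_counts.most_common(1)[0][0]
        correct += (pred == y)
        # Re-insert the instance.
        body[key][y] += 1
        class_counts[y] += 1
    return correct / len(instances)
```

The same delete/classify/re-insert loop works for k-fold cross-validation by holding out a fold at a time, which is why the running time is independent of the number of folds.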