Toward Optimal Feature Selection

Daphne Koller, Mehran Sahami
This paper presents a method for feature subset selection grounded in information theory. The authors propose a theoretical framework for optimal feature selection in which the goal is to eliminate features that provide little or no additional information beyond what the remaining features already capture; this covers both irrelevant and redundant features. They then present an efficient algorithm that approximates the optimal selection criterion. The algorithm scales to datasets with many features, tolerates inconsistencies in the training data, and is a filter method, so it avoids the expensive search through the space of feature subsets that wrapper methods require.

The theoretical framework rests on minimizing the cross-entropy between the true class distribution conditioned on all features and the distribution conditioned on the selected subset. The authors show that a feature that is conditionally independent of the class given the remaining features can be removed without increasing the distance from the true distribution. They also introduce the concept of a Markov blanket: a set of features that subsumes all the information a given feature carries about the class, so that the feature can be discarded once its blanket is retained.
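In the paper's formulation (notation lightly adapted here), a candidate subset G ⊆ F is scored by the expected KL divergence between the class posterior given all features and the posterior given only G:

```latex
% delta_G(f): divergence between the class distribution given the full
% feature assignment f and the one given only the projection f_G onto G.
\delta_G(\mathbf{f}) =
  D\bigl(\, P(C \mid \mathbf{F} = \mathbf{f}) \,\big\|\, P(C \mid \mathbf{G} = \mathbf{f}_G) \,\bigr)

% Delta_G: the selection criterion -- the divergence averaged over
% instances; the optimal subset of a given size minimizes it.
\Delta_G = \sum_{\mathbf{f}} P(\mathbf{f}) \, \delta_G(\mathbf{f})
```

Dropping a feature that is conditionally independent of C given the others leaves Δ_G unchanged, which is precisely what licenses the Markov-blanket deletions described next.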
The algorithm finds an approximate Markov blanket for each feature heuristically, using correlation measures to pick candidate blanket members, and repeatedly removes the feature whose information is best subsumed by its blanket. The authors compare this approach with other filter and wrapper methods and show it is more effective at reducing the feature space while maintaining classification accuracy. Experiments on artificial and real-world datasets show that the algorithm substantially reduces the number of features while often improving classification accuracy, and its computational efficiency makes it well suited to high-dimensional domains. The authors conclude that the method provides a theoretically justified approach to feature selection that is effective across many learning tasks.
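A minimal sketch of that backward-elimination loop, assuming discrete features; the names (expected_cross_entropy, select_features) and the simple count-based probability estimates are illustrative choices, not the authors' implementation:

```python
import numpy as np

def expected_cross_entropy(X, y, i, blanket, eps=1e-9):
    """delta(F_i | M_i): how much information about y feature i adds
    beyond its candidate Markov blanket, estimated from counts."""
    classes = np.unique(y)
    total = 0.0
    cols = X[:, list(blanket) + [i]]
    # Enumerate the observed joint assignments of (blanket, F_i).
    for row in np.unique(cols, axis=0):
        mask_mi = np.all(cols == row, axis=1)              # blanket and F_i fixed
        mask_m = np.all(cols[:, :-1] == row[:-1], axis=1)  # blanket fixed only
        p_joint = mask_mi.mean()
        p_c_given_mi = np.array([(y[mask_mi] == c).mean() for c in classes])
        p_c_given_m = np.array([(y[mask_m] == c).mean() for c in classes])
        kl = np.sum(p_c_given_mi * np.log((p_c_given_mi + eps) / (p_c_given_m + eps)))
        total += p_joint * kl
    return total

def select_features(X, y, n_keep, blanket_size=2):
    """Greedily drop the feature whose candidate blanket (its most
    correlated surviving features) best subsumes its information."""
    remaining = list(range(X.shape[1]))
    # Correlation-based heuristic for picking candidate blanket members.
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
    while len(remaining) > n_keep:
        scores = {}
        for i in remaining:
            others = [j for j in remaining if j != i]
            blanket = sorted(others, key=lambda j: -corr[i, j])[:blanket_size]
            scores[i] = expected_cross_entropy(X, y, i, blanket)
        remaining.remove(min(scores, key=scores.get))  # most redundant feature
    return remaining

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(400, 6))
    X[:, 3] = X[:, 0]                       # purely redundant copy of feature 0
    y = (X[:, 0] | X[:, 1]).astype(int)     # class depends on features 0 and 1
    print(select_features(X, y, n_keep=4))  # the copy should be dropped early
```

Note that with blanket_size = 0 the score degenerates to each feature's mutual information with the class; it is the nonempty blankets that let the method discard redundant, and not merely irrelevant, features.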