AVRIM L. BLUM (AVRIM@CS.CMU.EDU), PAT LANGLEY (LANGLEY@ISLE.ORG)
This survey reviews methods in machine learning for handling datasets with large amounts of irrelevant information, focusing on feature selection and example selection. It discusses advances in empirical and theoretical work, presenting a general framework for comparing methods. The paper addresses two key issues: selecting relevant features and selecting relevant examples. It concludes with challenges for future research.
The paper begins by discussing the problem of irrelevant features in concept learning, which requires deciding which features to use and how to combine them. It stresses that selecting relevant features can sharply reduce the number of training examples needed for good generalization. It then examines the limitations of simple methods such as nearest neighbor, whose sample complexity can grow rapidly with the number of irrelevant features, and contrasts them with methods that explicitly select relevant features.
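The sensitivity of nearest neighbor to irrelevant features can be seen in a minimal sketch (my own illustration, not an experiment from the survey): a single relevant feature separates two classes perfectly, but appending uniform-random irrelevant features makes 1-nearest-neighbor's distances noisy enough to cause frequent misclassification.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_label(query, train):
    """1-nearest-neighbor: return the label of the closest training point."""
    return min(train, key=lambda pair: euclidean(query, pair[0]))[1]

def error_rate(num_irrelevant, trials=300, seed=1):
    """Fraction of queries misclassified when `num_irrelevant` uniform-random
    features are appended to the single truly relevant one (feature 0)."""
    rng = random.Random(seed)
    pad = lambda x: [x] + [rng.random() for _ in range(num_irrelevant)]
    errors = 0
    for _ in range(trials):
        # The relevant feature alone separates the two classes perfectly.
        train = [(pad(0.0), "neg"), (pad(1.0), "pos")]
        if nearest_label(pad(1.0), train) != "pos":
            errors += 1
    return errors / trials

print(error_rate(0))    # 0.0: with no irrelevant features, 1-NN is perfect
print(error_rate(50))   # well above 0: noise now dominates the distances
```

With zero irrelevant features the query coincides with its true class's prototype, so the error rate is exactly zero; with fifty irrelevant features the random dimensions contribute most of the distance, and a substantial fraction of queries land nearer the wrong prototype.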
The paper defines several notions of relevance, including relevance to the target concept, relevance with respect to the sample or distribution, and relevance as a complexity measure. It also introduces incremental usefulness: a feature is incrementally useful if adding it to the learner's current feature set improves the accuracy of the hypothesis the algorithm produces.
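Incremental usefulness is relative to both the learner and the features already chosen. A minimal sketch (my own toy construction; the majority-vote lookup table is a hypothetical stand-in for an arbitrary induction algorithm) uses an XOR target, where neither relevant feature helps alone but each is incrementally useful given the other:

```python
from collections import defaultdict
from itertools import product

# Toy data: the label is x0 XOR x1; feature x2 is irrelevant.
data = [((x0, x1, x2), x0 ^ x1) for x0, x1, x2 in product((0, 1), repeat=3)]

def accuracy(feature_idxs):
    """Training accuracy of a majority-vote lookup table restricted to the
    chosen features (a stand-in for an arbitrary induction algorithm)."""
    votes = defaultdict(list)
    for x, y in data:
        votes[tuple(x[i] for i in feature_idxs)].append(y)
    correct = 0
    for x, y in data:
        ys = votes[tuple(x[i] for i in feature_idxs)]
        correct += (max(set(ys), key=ys.count) == y)  # majority vote
    return correct / len(data)

# Neither x0 nor x1 predicts the XOR label alone, yet each is incrementally
# useful given the other: adding it lifts accuracy from chance to perfect.
print(accuracy([0]))     # 0.5
print(accuracy([1]))     # 0.5
print(accuracy([0, 1]))  # 1.0
```

This also shows why "relevance to the target" and usefulness can diverge: a feature can look useless in isolation while being essential in combination.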
The paper then frames feature selection as heuristic search through the space of feature subsets, distinguishing embedded, filter, and wrapper approaches. Embedded methods fold feature selection into the induction algorithm itself; filter methods select features in a preprocessing step, independent of the induction algorithm; and wrapper methods search over feature subsets, using the induction algorithm's own estimated accuracy to evaluate each candidate subset.
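One common search strategy in this space is greedy forward selection. The sketch below is my own generic version, not code from the survey; `toy_evaluate` is a hypothetical scoring function standing in for what a wrapper method would actually use, namely the induction algorithm's cross-validated accuracy.

```python
def forward_selection(features, evaluate):
    """Greedy forward search over feature subsets: repeatedly add the single
    feature whose inclusion most improves `evaluate`, stopping when no
    addition helps. In a wrapper method, `evaluate` would be the estimated
    (e.g. cross-validated) accuracy of the induction algorithm itself."""
    selected, best = set(), evaluate(set())
    improved = True
    while improved:
        improved = False
        for f in sorted(features - selected):
            score = evaluate(selected | {f})
            if score > best:
                best, best_f, improved = score, f, True
        if improved:
            selected.add(best_f)
    return selected, best

# Hypothetical stand-in for cross-validated accuracy: rewards covering the
# truly relevant features {0, 2}, mildly penalizes extras (illustrative only).
RELEVANT = {0, 2}

def toy_evaluate(subset):
    hit = len(subset & RELEVANT) / len(RELEVANT)
    return 0.5 + 0.5 * hit - 0.01 * len(subset - RELEVANT)

print(forward_selection({0, 1, 2, 3}, toy_evaluate))  # ({0, 2}, 1.0)
```

Backward elimination runs the same loop in reverse, starting from the full feature set and greedily removing features; both are hill-climbing searches and can miss interacting features that only help in combination.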
The paper also discusses feature-weighting methods, which assign each feature a continuous weight rather than making a binary include/exclude decision. It presents Littlestone's Winnow algorithm, which updates weights multiplicatively whenever the current hypothesis makes a mistake, and notes that Winnow's mistake bound grows only logarithmically with the number of irrelevant features.
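The multiplicative update can be sketched as follows (a Littlestone-style promotion/demotion variant of my own writing, with threshold n/2 and update factor 2; the survey's exact presentation may differ). On a false negative, the weights of active features are doubled; on a false positive, they are halved; correct predictions leave the weights untouched.

```python
import random

def winnow(examples, n, alpha=2.0):
    """Winnow with promotion and demotion: one weight per Boolean feature,
    threshold n/2, and multiplicative updates made only on mistakes."""
    theta = n / 2
    w = [1.0] * n
    mistakes = 0
    for x, y in examples:
        pred = 1 if sum(w[i] for i in range(n) if x[i]) >= theta else 0
        if pred != y:
            mistakes += 1
            for i in range(n):
                if x[i]:  # update only the features active in this example
                    w[i] = w[i] * alpha if y == 1 else w[i] / alpha
    return w, mistakes

# Target concept: y = x0, so 19 of the 20 Boolean features are irrelevant.
rng = random.Random(0)
stream = []
for _ in range(200):
    x = [rng.randint(0, 1) for _ in range(20)]
    stream.append((x, x[0]))

w, mistakes = winnow(stream, 20)
print(mistakes)          # few mistakes despite the many irrelevant features
print(w[0] == max(w))    # True: the relevant feature's weight is never demoted
```

Note why the relevant weight ends up largest: demotions happen only on false positives, where the true label is 0 and hence feature 0 is inactive, so `w[0]` is only ever promoted.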
The paper concludes by discussing challenges and future directions in feature and example selection, emphasizing the need for further research on both the empirical and theoretical fronts. It underscores that selecting relevant features and relevant examples is central to improving the performance of learning algorithms.