This paper presents an analysis and extensions of the RELIEF algorithm for attribute estimation in machine learning. RELIEF is an efficient algorithm for estimating attribute quality, originally designed for two-class problems and capable of handling both discrete and continuous attributes. The paper extends RELIEF to handle noisy, incomplete, and multi-class data sets. The extensions are verified on various artificial and real-world data sets.
The original RELIEF algorithm estimates the quality of an attribute by how well its values distinguish between instances that are near each other. For each randomly sampled instance, it finds the nearest neighbor from the same class (the nearest hit) and the nearest neighbor from a different class (the nearest miss), and updates the attribute's quality estimate accordingly: the estimate approximates the difference between the probability that the attribute's value differs on the nearest miss and the probability that it differs on the nearest hit.
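The update described above can be sketched as follows (a minimal illustration, not the paper's code; the `diff` and `relief` names, the brute-force neighbor search, and the list-based data layout are our assumptions):

```python
import random

def diff(a, x1, x2, data_range):
    """Difference between two instances on attribute a:
    0/1 for discrete (string) values, normalized |v1 - v2| for continuous."""
    v1, v2 = x1[a], x2[a]
    if isinstance(v1, str):  # treat string values as discrete
        return 0.0 if v1 == v2 else 1.0
    return abs(v1 - v2) / data_range[a]

def relief(instances, labels, data_range, m, seed=0):
    """Estimate attribute weights with the two-class RELIEF update:
    reward separating the nearest miss, penalize separating the nearest hit."""
    rng = random.Random(seed)
    n_attrs = len(instances[0])
    w = [0.0] * n_attrs
    for _ in range(m):
        i = rng.randrange(len(instances))
        r, c = instances[i], labels[i]
        def dist(j):  # total difference over all attributes
            return sum(diff(a, r, instances[j], data_range)
                       for a in range(n_attrs))
        hits = [j for j in range(len(instances)) if j != i and labels[j] == c]
        misses = [j for j in range(len(instances)) if labels[j] != c]
        h = instances[min(hits, key=dist)]    # nearest hit
        mi = instances[min(misses, key=dist)]  # nearest miss
        for a in range(n_attrs):
            w[a] += (diff(a, r, mi, data_range) - diff(a, r, h, data_range)) / m
    return w
```

On a toy data set where attribute 0 determines the class and attribute 1 does not, the weight of attribute 0 comes out positive and that of attribute 1 negative, matching the intuition above.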
The paper extends RELIEF to use the k nearest hits and misses instead of a single pair, which yields more reliable probability estimates, particularly on noisy data. It also extends RELIEF to handle incomplete data by defining several ways to compute the difference between instances when attribute values are missing. These variants, RELIEF-B, RELIEF-C, and RELIEF-D, differ in how they treat the missing values, with RELIEF-D, which estimates the difference from the conditional probabilities of attribute values given the class, performing best on incomplete data.
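The conditional-probability treatment of missing values might be sketched like this (a hedged illustration of the RELIEF-D idea as we read it; the `diff_missing` name and the flat `cond_prob` lookup table of estimated P(value | class) are our assumptions):

```python
def diff_missing(a, x1, c1, x2, c2, cond_prob):
    """RELIEF-D style difference on a discrete attribute a when values
    may be missing (None). cond_prob[(a, value, cls)] = P(value | cls),
    estimated from the training data."""
    v1, v2 = x1[a], x2[a]
    if v1 is not None and v2 is not None:
        return 0.0 if v1 == v2 else 1.0     # ordinary discrete difference
    if v1 is None and v2 is None:
        # both missing: 1 - probability the two values would agree,
        # summed over all possible values of a
        values = {v for (aa, v, _) in cond_prob if aa == a}
        return 1.0 - sum(cond_prob.get((a, v, c1), 0.0) *
                         cond_prob.get((a, v, c2), 0.0) for v in values)
    if v1 is None:
        # one missing: 1 - P(the known value | class of the other instance)
        return 1.0 - cond_prob.get((a, v2, c1), 0.0)
    return 1.0 - cond_prob.get((a, v1, c2), 0.0)
```

The effect is that a missing value is treated probabilistically rather than as a maximal or default difference, which is why this variant degrades most gracefully as the fraction of missing values grows.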
The paper further extends RELIEF to multi-class problems with two variants, RELIEF-E and RELIEF-F. RELIEF-F proves the more effective of the two: rather than taking a single nearest miss from any other class, it finds nearest misses in each of the other classes and averages their contributions to the attribute estimate, weighting each class by its prior probability.
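The per-class miss term described above can be sketched as follows (an illustrative fragment, not the paper's formulation verbatim; the function name, the dictionary inputs, and the exact normalization of the class weights are our assumptions):

```python
def miss_contribution(r_class, priors, miss_diffs):
    """RELIEF-F style miss term for one sampled instance: the difference
    to the nearest miss of each other class, weighted by that class's
    prior, renormalized over the classes that can actually be misses.

    priors:     dict class -> P(class)
    miss_diffs: dict class -> attribute difference to the nearest miss
                of that class
    """
    total = 0.0
    for c, d in miss_diffs.items():
        if c != r_class:
            # weight by P(c), renormalized so the weights of the
            # "other" classes sum to 1
            total += priors[c] / (1.0 - priors[r_class]) * d
    return total
```

With priors {0: 0.5, 1: 0.25, 2: 0.25} and a sampled instance of class 0, a miss difference of 1.0 against class 1 and 0.0 against class 2 contributes 0.5, i.e. the two minority classes are weighted equally.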
The paper evaluates these extensions on a range of artificial and real-world data sets. The results show that RELIEF-F performs well on both noise-free and noisy data and is particularly effective on multi-class problems. The paper also discusses the bias of information gain and the Gini index toward multi-valued attributes and suggests normalization techniques to correct for it.
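One standard normalization of this kind is Quinlan's gain ratio, which divides information gain by the entropy of the attribute's own value distribution; a small sketch (function names are ours) shows how it penalizes an attribute with many distinct values:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of labels, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Information gain of splitting labels by attribute values."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        sub = [l for x, l in zip(values, labels) if x == v]
        remainder += len(sub) / n * entropy(sub)
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Gain normalized by the entropy of the attribute's value
    distribution, reducing the bias toward many-valued attributes."""
    split = entropy(values)
    return info_gain(values, labels) / split if split > 0 else 0.0
```

An attribute that assigns a unique value to every instance attains maximal information gain, yet its gain ratio is lower than that of an informative binary attribute, which is exactly the bias the normalization is meant to remove.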
The paper concludes that RELIEF is a promising heuristic for attribute estimation, especially in the presence of dependencies and noise. It can be used to guide the search in inductive learning algorithms, and its extensions make it suitable for handling multi-class problems. The paper also highlights the importance of using appropriate normalization techniques for multi-valued attributes.