This paper by Igor Kononenko focuses on estimating the quality of attributes in machine learning, particularly in the presence of strong dependencies among attributes. The author analyzes and extends the RELIEF algorithm, originally developed by Kira and Rendell, to handle noisy, incomplete, and multi-class data sets. The original RELIEF estimates attributes efficiently and can deal with both discrete and continuous attributes, but it is restricted to two-class problems and cannot handle incomplete data. The extensions aim to improve the algorithm's performance in these scenarios:
1. **Estimating Probabilities with RELIEF**: The paper describes the original RELIEF algorithm and introduces RELIEF-A, which uses the k nearest hits and misses, rather than a single nearest neighbour, to make the underlying probability estimates more reliable. Experiments show that increasing the number of nearest neighbours generally improves the estimates, especially on noisy data sets.
2. **Incomplete Data Sets**: Three extensions for handling incomplete data (RELIEF-B, -C, and -D) are compared. RELIEF-D, which estimates the probability of each attribute value conditioned on the class when a value is missing, performs best on noisy and incomplete data sets.
3. **Multi-Class Problems**: Two extensions of RELIEF-D (RELIEF-E and -F) are proposed for multi-class problems. RELIEF-F, which averages the contributions of the near misses from each of the other classes, weighted by their prior probabilities, outperforms the other versions on both noise-free and noisy data sets.
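The estimator summarized above can be sketched compactly. Below is a minimal NumPy sketch of a ReliefF-style weight update (the function name `relief_f`, the parameter defaults, and the use of L1 distance for neighbour search are illustrative choices, not taken from the paper; missing-value handling is omitted for brevity):

```python
import numpy as np

def relief_f(X, y, n_iter=100, k=10, rng=None):
    """Sketch of a ReliefF-style attribute estimator.

    X: (n_samples, n_features) numeric array; y: class labels.
    Returns one quality weight per attribute; higher means more relevant.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    # Normalise each attribute so per-attribute differences fall in [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    classes, counts = np.unique(y, return_counts=True)
    priors = dict(zip(classes, counts / n))
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        r, c = X[i], y[i]
        # k nearest hits: same class, excluding the sampled instance itself.
        same = np.flatnonzero((y == c) & (np.arange(n) != i))
        dist = np.abs(X[same] - r).sum(axis=1)
        hits = X[same[np.argsort(dist)[:k]]]
        w -= (np.abs(hits - r) / span).mean(axis=0) / n_iter
        # k nearest misses from EACH other class, weighted by its prior
        # probability P(C) / (1 - P(class(r))), then averaged in.
        for other in classes:
            if other == c:
                continue
            idx = np.flatnonzero(y == other)
            dist = np.abs(X[idx] - r).sum(axis=1)
            misses = X[idx[np.argsort(dist)[:k]]]
            weight = priors[other] / (1.0 - priors[c])
            w += weight * (np.abs(misses - r) / span).mean(axis=0) / n_iter
    return w
```

Setting `k = 1` and restricting `y` to two classes recovers the behaviour of the original two-class algorithm, which illustrates how the extensions nest inside one another.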
The paper concludes that RELIEF-F is a promising heuristic function for attribute estimation, capable of handling complex data sets and guiding the learning process more effectively. Experiments on both artificial and real-world data sets, including a medical data set, support these conclusions.