This paper presents an information-theoretic method for feature subset selection. The authors first define the theoretically optimal, but computationally intractable, selection criterion: eliminate exactly those features that provide little or no predictive information beyond what the remaining features already capture. They then introduce an efficient algorithm that approximates this criterion, targeting both irrelevant and redundant features. Because it is a filter method, the algorithm avoids the high computational cost of wrapper approaches, which must repeatedly train the target classifier. The theoretical foundation rests on cross-entropy, used to measure the loss of predictive information incurred when a feature is eliminated. Empirical results on several datasets show that the algorithm substantially reduces the feature space while maintaining or improving classification accuracy, particularly on high-dimensional data. The authors conclude that the method offers a robust and efficient solution to feature selection, suitable for large-scale learning tasks.
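The paper's exact approximation is not reproduced in this summary, but the general flavor of a cross-entropy-based filter can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes discrete features, approximates each candidate feature's "cover" by a single retained feature, and scores a feature by the expected KL divergence between the class distribution conditioned on both features and the distribution conditioned on the covering feature alone. All function names and the one-feature blanket are illustrative assumptions.

```python
import numpy as np

def delta(xi, xj, y, n_classes, eps=1e-9):
    """Expected information loss (KL divergence) from predicting the class
    with feature xj alone instead of the pair (xi, xj).
    Illustrative approximation, not the paper's exact criterion."""
    n = len(y)
    total = 0.0
    for vi in np.unique(xi):
        for vj in np.unique(xj):
            mask = (xi == vi) & (xj == vj)
            if not mask.any():
                continue
            weight = mask.sum() / n
            # P(Y | Xi = vi, Xj = vj), with additive smoothing
            p_full = np.bincount(y[mask], minlength=n_classes) + eps
            p_full = p_full / p_full.sum()
            # P(Y | Xj = vj) -- the distribution after dropping Xi
            mask_j = xj == vj
            p_blanket = np.bincount(y[mask_j], minlength=n_classes) + eps
            p_blanket = p_blanket / p_blanket.sum()
            total += weight * np.sum(p_full * np.log(p_full / p_blanket))
    return total

def select_features(X, y, n_keep):
    """Greedy backward elimination: repeatedly drop the feature whose
    removal loses the least information given some single retained feature."""
    n_classes = int(y.max()) + 1
    kept = list(range(X.shape[1]))
    while len(kept) > n_keep:
        scores = [
            (min(delta(X[:, i], X[:, j], y, n_classes)
                 for j in kept if j != i), i)
            for i in kept
        ]
        _, drop = min(scores)  # cheapest feature to eliminate
        kept.remove(drop)
    return kept
```

On a toy dataset where one feature duplicates another and a third is pure noise, the elimination order follows the intuition in the text: a redundant copy costs nothing to remove (its twin covers it exactly), and the noise feature is removed before the last informative one.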