A Review of Feature Selection Methods Based on Mutual Information

Jorge R. Vergara · Pablo A. Estévez
This paper reviews the state of the art of information-theoretic feature selection methods. It defines key concepts such as feature relevance, redundancy, complementarity (synergy), and the Markov blanket. The problem of optimal feature selection is introduced, and a unifying theoretical framework is presented to explain the approximations made by various methods. The paper discusses the advantages and drawbacks of different feature selection approaches, including wrapper, embedded, and filter methods. It highlights the importance of mutual information (MI) as a measure of statistical dependence, which can capture nonlinear relationships and is invariant under invertible transformations of the variables. The paper also presents several open problems in the field, including the need for a more unified framework, improving the efficiency of feature selection in high-dimensional spaces, and further investigating the relationship between MI and the Bayes classification error. Additionally, it explores the impact of finite samples on statistical criteria and MI estimation, and the need for new criteria of statistical dependence beyond correlation and MI. The paper concludes that modern feature selection methods must go beyond relevance and redundancy to include complementarity, and that a unifying theoretical framework can help retrofit successful heuristic criteria.
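To illustrate the point about MI capturing nonlinear dependence, the following minimal sketch (not part of the reviewed paper) compares Pearson correlation with a k-nearest-neighbor MI estimate on a synthetic feature whose relationship to the target is purely quadratic. It assumes scikit-learn's mutual_info_regression estimator and uses made-up data solely for illustration.

```python
# Illustrative sketch (assumed setup, not from the paper): MI detects a
# nonlinear dependence that Pearson correlation misses.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)        # relevant feature, nonlinearly related to y
noise = rng.uniform(-1.0, 1.0, size=2000)    # irrelevant feature
y = x ** 2 + 0.05 * rng.normal(size=2000)    # target depends on x only through x^2

X = np.column_stack([x, noise])

# Pearson correlation is near zero for both features, so a purely linear
# criterion would rank the truly relevant feature x as useless.
corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]

# The MI estimate is clearly larger for x than for the irrelevant feature.
mi = mutual_info_regression(X, y, random_state=0)

print("abs Pearson corr:", np.round(corr, 3))
print("estimated MI    :", np.round(mi, 3))
```

In a filter-style feature selection setting, ranking features by such MI estimates (rather than by correlation) is what allows relevant but nonlinearly related features to be retained.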