Permutation importance: a corrected feature importance measure

April 12, 2010 | André Altmann*, Laura Toloşi*, Oliver Sander† and Thomas Lengauer
This paper introduces a method called permutation importance (PIMP) to correct the bias in feature importance measures used in machine learning models, particularly in the life sciences. The method uses permutation tests to estimate the distribution of feature importance in a non-informative setting, allowing the calculation of P-values that indicate the significance of each feature.

The PIMP method is applied to both simulated and real-world datasets, demonstrating its effectiveness in correcting biased feature importance measures from random forest (RF) models and the mutual information (MI) criterion. By providing a corrected measure of feature importance, the method improves model interpretability and helps identify truly informative variables. PIMP is also used to improve the performance of RF models: retraining on only the features with significant PIMP scores leads to better prediction accuracy.

The paper applies PIMP to two real-world datasets: one for predicting C-to-U edited sites in plant mitochondrial RNA and another for determining HIV coreceptor usage. The results show that PIMP significantly improves the accuracy of feature importance measures and enhances model interpretability. The method is general and can be applied to any learning method that assesses feature relevance, providing a significance P-value for each predictor variable. It is computationally efficient and can be parallelized for scalability. The paper concludes that PIMP is a valuable tool for correcting feature importance measures in machine learning models, particularly in the life sciences, where interpretability is as important as prediction accuracy.
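The core procedure described above can be sketched in a few lines: repeatedly permute the response vector to break its association with the features, refit the model each time to build a null distribution of importances, and report the fraction of null importances at least as large as the observed one as a P-value. The sketch below is a minimal illustration of this idea using scikit-learn's random forest; the function name `pimp_p_values` and all parameter choices are ours, not the authors'.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pimp_p_values(X, y, n_permutations=50, random_state=0):
    """Illustrative sketch of the PIMP idea: permute the response,
    refit the model, and compare each feature's observed importance
    with its null distribution to obtain a permutation P-value."""
    rng = np.random.default_rng(random_state)
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)

    # Importance of each feature on the original (unpermuted) response.
    observed = model.fit(X, y).feature_importances_

    # Null distribution: importances after destroying the
    # feature-response association by permuting y.
    null_importances = np.empty((n_permutations, X.shape[1]))
    for i in range(n_permutations):
        y_perm = rng.permutation(y)
        null_importances[i] = model.fit(X, y_perm).feature_importances_

    # P-value per feature: fraction of null importances that are at
    # least as large as the observed one (with a +1 correction so
    # P-values are never exactly zero).
    p_values = (1 + (null_importances >= observed).sum(axis=0)) / (1 + n_permutations)
    return observed, p_values
```

The paper additionally fits parametric distributions (e.g. Gaussian, lognormal, gamma) to the null importances to obtain smoother P-values from fewer permutations; the empirical fraction above is the simplest non-parametric variant. Since each permutation is independent, the loop parallelizes trivially.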