11 July 2008 | Carolin Strobl, Anne-Laure Boulesteix, Thomas Kneib, Thomas Augustin, Achim Zeileis
The article by Strobl et al. examines variable importance measures in random forests, focusing on their bias in favor of correlated predictor variables. The authors identify two mechanisms behind this bias: correlated predictors are preferentially selected during tree building, and the unconditional permutation scheme used to compute variable importance gives these variables a further advantage. To address both issues, they propose a conditional permutation scheme that better reflects the true impact of each predictor variable. The scheme uses the partition of the feature space induced by the fitted model as a conditioning grid, so that each predictor is permuted only within groups of observations that are similar with respect to its correlated covariates.
The authors demonstrate the effectiveness of their proposed method through simulations and an application to peptide-binding data, showing that it provides a more reliable ranking of predictor variables, especially in the presence of correlated variables. The conditional permutation importance measure is made available in the party package for recursive partitioning in R.
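The difference between the two permutation schemes can be illustrated with a minimal sketch. It is not the party implementation, and it rests on several assumptions made only for illustration: the "fitted model" is a hypothetical fixed function that splits its weight between two correlated predictors, the conditioning grid is approximated by ten equal-width bins of the correlated predictor rather than by a forest's split points, and importance is measured as the increase in mean squared error on the same data rather than on out-of-bag observations. All function names are hypothetical.

```python
import random
from collections import defaultdict


def permute_unconditional(x, rng):
    """Shuffle x across all observations (the standard scheme)."""
    shuffled = list(x)
    rng.shuffle(shuffled)
    return shuffled


def permute_conditional(x, cells, rng):
    """Shuffle x only among observations that share a grid cell."""
    by_cell = defaultdict(list)
    for i, cell in enumerate(cells):
        by_cell[cell].append(i)
    shuffled = list(x)
    for idx in by_cell.values():
        vals = [x[i] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            shuffled[i] = v
    return shuffled


def importance(predict, X, y, j, permute):
    """Increase in mean squared error after permuting column j."""
    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    col = permute([r[j] for r in X])
    X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
    return mse(X_perm) - mse(X)


# Toy data (assumed): x1 drives the response, x2 is only a noisy copy of x1.
rng = random.Random(0)
n = 200
x1 = [rng.random() for _ in range(n)]
x2 = [a + rng.gauss(0, 0.05) for a in x1]
X = [[a, b] for a, b in zip(x1, x2)]
y = list(x1)

# Stand-in for the conditioning grid: ten equal-width bins of x1.
cells = [int(a * 10) for a in x1]

# Stand-in for a fitted model that leans on both correlated predictors.
predict = lambda row: 0.5 * row[0] + 0.5 * row[1]

uncond = importance(predict, X, y, 1,
                    lambda col: permute_unconditional(col, rng))
cond = importance(predict, X, y, 1,
                  lambda col: permute_conditional(col, cells, rng))
print(f"unconditional: {uncond:.4f}, conditional: {cond:.4f}")
```

In this toy setting the unconditional scheme breaks the association between x2 and x1 as well as that between x2 and the response, so x2 appears important even though it carries no information beyond x1. Permuting within grid cells keeps x2 close to its original values given x1, and the apparent importance shrinks accordingly.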