A Unified Approach to Interpreting Model Predictions

Date:2017-11-25
Author:Scott M. Lundberg, Su-In Lee
Pages:10
Summary:The paper introduces SHAP (SHapley Additive exPlanations), a unified framework for interpreting model predictions. The framework addresses the tension between model accuracy and interpretability, which is particularly acute for complex models such as ensembles and deep networks. SHAP assigns each feature an importance value for a specific prediction; it unifies six existing attribution methods and identifies a unique solution in this class that satisfies three desirable properties: local accuracy, missingness, and consistency. The paper also proposes new SHAP value estimation methods, including Kernel SHAP and Deep SHAP, which are more computationally efficient and better aligned with human intuition than previous approaches. Experimental results, including user studies and comparisons with methods such as LIME and DeepLIFT, show that SHAP values provide more accurate and consistent feature importance estimates. The framework's ability to unify and improve upon existing methods suggests a promising direction for future research in model interpretation.
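
Illustration (not from the paper): a minimal sketch of the idea behind SHAP values, computing exact Shapley attributions by brute force for a single prediction of a toy linear model, with "missing" features replaced by a background reference. The names (model, background, x) are hypothetical; the paper's Kernel SHAP and Deep SHAP are approximations that avoid this exponential enumeration.

```python
import itertools
import math
import numpy as np

def model(X):
    # Toy linear model f(x) = 2*x0 + 1*x1 - 3*x2 (illustrative only).
    return X @ np.array([2.0, 1.0, -3.0])

background = np.zeros(3)        # reference values used for "absent" features
x = np.array([1.0, 2.0, 0.5])   # the instance being explained
n = len(x)

def f_subset(S):
    """Model output when only the features in S take their values from x."""
    z = background.copy()
    z[list(S)] = x[list(S)]
    return model(z[None, :])[0]

# Classic Shapley value: weighted average of each feature's marginal
# contribution over all subsets of the remaining features.
phi = np.zeros(n)
for i in range(n):
    others = [j for j in range(n) if j != i]
    for k in range(n):
        for S in itertools.combinations(others, k):
            w = math.factorial(len(S)) * math.factorial(n - len(S) - 1) / math.factorial(n)
            phi[i] += w * (f_subset(S + (i,)) - f_subset(S))

# Local accuracy: attributions plus the base value recover the prediction.
base_value = model(background[None, :])[0]
print("phi =", phi)
print("base + sum(phi) =", base_value + phi.sum(), "prediction =", model(x[None, :])[0])
```

For the toy model above, the attributions reduce to coefficient times feature value ([2.0, 2.0, -1.5]), and their sum plus the base value equals the prediction, which is the local accuracy property the summary mentions.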