2020 January; 2(1): 56–67 | Scott M. Lundberg, Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, Su-In Lee
The paper "From Local Explanations to Global Understanding with Explainable AI for Trees" by Scott M. Lundberg et al. addresses the challenge of improving the interpretability of tree-based machine learning models, such as random forests, decision trees, and gradient boosted trees. The authors introduce TreeExplainer, a novel method that enables the computation of optimal local explanations for these models based on game theory. Key contributions include:
1. **Exact Computation of Shapley Values**: TreeExplainer provides an exact algorithm for computing Shapley values, which are the optimal measure of feature importance for tree-based models. The method comes with theoretical guarantees of local accuracy and consistency (see the sketch after this list).
2. **Local Interaction Effects**: TreeExplainer extends local explanations to capture interaction effects among features, providing a richer understanding of model behavior.
3. **Global Model Structure Understanding**: By combining many local explanations, TreeExplainer enables interpretation of global model structure while retaining local faithfulness to the original model.
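A minimal sketch of these three ideas using the authors' open-source `shap` package, which contains the TreeExplainer implementation. The dataset, model choice, and hyperparameters below are illustrative assumptions, not those used in the paper.

```python
# Illustrative example: exact local SHAP values, interaction values, and a
# simple global summary built from many local explanations.
import numpy as np
import shap
import xgboost
from sklearn.datasets import load_breast_cancer

# Assumed toy dataset and gradient boosted tree model (not from the paper).
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgboost.XGBClassifier(n_estimators=100, max_depth=4).fit(X, y)

explainer = shap.TreeExplainer(model)

# Local explanations: one exact Shapley value per feature per sample.
shap_values = explainer.shap_values(X)

# Local interaction effects: a features-by-features matrix per sample.
interaction_values = explainer.shap_interaction_values(X)

# Global understanding from many local explanations: mean |SHAP| per feature.
global_importance = np.abs(shap_values).mean(axis=0)
for name, imp in sorted(zip(X.columns, global_importance), key=lambda t: -t[1])[:5]:
    print(f"{name}: {imp:.4f}")
```

Averaging the absolute local attributions is one simple way to turn per-sample explanations into a global ranking; the paper also builds richer global views (e.g., summary and dependence plots) from the same local values.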
The paper demonstrates the effectiveness of TreeExplainer on three medical datasets: mortality risk, chronic kidney disease, and hospital procedure duration. It shows how TreeExplainer can identify high-magnitude but low-frequency non-linear mortality risk factors, reveal distinct population subgroups with shared risk characteristics, and monitor machine learning models deployed in hospitals by identifying which features are degrading the model's performance over time (sketched below).
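A hypothetical sketch of the monitoring idea: track how each feature's attribution drifts across deployment batches. The paper attributes the model's *loss* to features for this purpose; here, SHAP values of the model's output are used as a simpler stand-in, and the drift threshold and function names are illustrative.

```python
# Hypothetical monitoring helpers (not the paper's exact procedure).
import numpy as np
import shap

def per_feature_attribution(explainer, X_batch):
    """Mean |SHAP value| per feature for one batch of deployed-model inputs."""
    sv = explainer.shap_values(X_batch)
    return np.abs(sv).mean(axis=0)

def flag_drifting_features(explainer, baseline_batch, current_batch,
                           feature_names, ratio_threshold=2.0):
    """Flag features whose mean attribution shifted by more than ratio_threshold."""
    base = per_feature_attribution(explainer, baseline_batch)
    curr = per_feature_attribution(explainer, current_batch)
    ratio = (curr + 1e-12) / (base + 1e-12)  # guard against division by zero
    return [name for name, r in zip(feature_names, ratio)
            if r > ratio_threshold or r < 1.0 / ratio_threshold]
```

Comparing attributions rather than raw feature distributions ties the alert directly to what the model actually relies on, which is the motivation the paper gives for explanation-based monitoring.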
The authors also discuss the advantages of tree-based models, emphasizing their accuracy and interpretability, and provide a comprehensive evaluation of TreeExplainer using various metrics. The paper concludes by discussing the broader implications of improved interpretability in regulated domains such as healthcare, finance, and public services, and the potential for enhancing human-AI collaboration and model development, debugging, and monitoring.