Explainable AI: A Review of Machine Learning Interpretability Methods

25 December 2020 | Pantelis Linardatos, Vasilis Papastefanopoulos, Sotiris Kotsiantis
This paper reviews and categorizes methods for explaining and interpreting machine learning models, focusing on the field of Explainable Artificial Intelligence (XAI). The authors highlight the growing need for interpretable models in critical domains such as healthcare, driven by the increasing complexity of modern machine learning systems, which often operate as "black boxes." The paper provides a comprehensive taxonomy of interpretability methods, covering both model-specific and model-agnostic approaches, and discusses their applications and limitations. Key methods discussed include gradient-based techniques for deep learning models, such as plain Gradients, Integrated Gradients, DeepLIFT, Guided Backpropagation, Deconvolution, Class Activation Maps (CAM), Grad-CAM, Grad-CAM++, Layer-wise Relevance Propagation (LRP), SmoothGrad, RISE, Concept Activation Vectors (CAVs), and Deep Taylor Decomposition. For black-box models, methods such as Local Interpretable Model-Agnostic Explanations (LIME), SHapley Additive exPlanations (SHAP), Anchors, the Contrastive Explanation Method (CEM), counterfactual explanations, ProtoDash, Permutation Importance (PIMP), L2X, Partial Dependence Plots (PDPs), and Individual Conditional Expectation (ICE) plots are covered. The paper aims to serve as a reference for both researchers and practitioners by providing a detailed overview of existing methods and their programming implementations.
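
As an illustration of the gradient-based family surveyed in the paper, below is a minimal sketch of Integrated Gradients in NumPy. The logistic model, its weight values, and the zero baseline are illustrative assumptions made here for brevity, not details from the paper; in practice the method attributes a deep network's prediction to its input features along a path from a baseline to the input.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy differentiable model: logistic regression with fixed, illustrative weights.
w = np.array([1.5, -2.0, 0.5])
b = 0.1

def model(x):
    return sigmoid(x @ w + b)

def model_grad(x):
    # Analytic gradient of sigmoid(w.x + b) with respect to x.
    p = model(x)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, steps=50):
    # Riemann-sum approximation of the path integral of the gradient
    # along the straight line from the baseline to the input.
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.array([model_grad(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 0.5, -1.0])
baseline = np.zeros_like(x)
attributions = integrated_gradients(x, baseline)
print("attributions:", attributions)
# Completeness check: attributions should sum (approximately) to
# model(x) - model(baseline).
print("sum:", attributions.sum(), "vs", model(x) - model(baseline))
```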
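On the model-agnostic side, the sketch below illustrates the core shuffle-and-score idea behind permutation importance: permute one feature column, breaking its association with the target, and measure the resulting drop in predictive performance. This is the plain variant rather than the specific PIMP algorithm covered in the paper (which adds a permutation-based significance test); the synthetic dataset and random-forest model are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data and an opaque model whose feature importances we probe.
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

def permutation_importance(model, X, y, n_repeats=10):
    # Importance of feature j = mean drop in accuracy when column j is
    # shuffled, keeping its marginal distribution intact.
    baseline = model.score(X, y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - model.score(X_perm, y))
        importances[j] = np.mean(drops)
    return importances

print(permutation_importance(model, X_test, y_test))
```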