Understanding Black-box Predictions via Influence Functions

29 Dec 2020 | Pang Wei Koh, Percy Liang
This paper introduces influence functions as a way to understand the predictions of black-box machine learning models by tracing them back to the training data. Influence functions, a classic technique from robust statistics, estimate how a model's predictions would change if the training data were changed, without retraining the model from scratch. Concretely, the method asks how upweighting or perturbing a single training point would shift the model's parameters, and through them the loss at a given test point. The approach applies to a range of models, from linear models to convolutional neural networks, and the paper demonstrates several uses: understanding model behavior, debugging models, detecting dataset errors, and crafting adversarial training examples that flip a neural network's predictions.
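As a concrete illustration, the sketch below computes the influence of one training point on the loss at a test point for L2-regularized logistic regression, where the Hessian is small enough to form and invert explicitly. It implements the paper's closed-form influence score, -grad L(z_test)^T H^{-1} grad L(z); the function names and the ridge damping term are illustrative choices, not taken from the authors' code.

```python
# Minimal sketch: influence of one training point on a test loss for
# L2-regularized logistic regression, with the Hessian formed explicitly.
# Influence score: -grad L(z_test)^T  H^{-1}  grad L(z_train)
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loss(theta, x, y):
    """Gradient of the logistic loss at a single example (y in {0, 1})."""
    p = sigmoid(x @ theta)
    return (p - y) * x

def hessian(theta, X, Y, reg=1e-3):
    """Average Hessian of the training loss, plus a ridge term for invertibility."""
    P = sigmoid(X @ theta)
    W = P * (1 - P)                       # per-example curvature weights
    H = (X * W[:, None]).T @ X / len(Y)   # (1/n) * sum_i w_i x_i x_i^T
    return H + reg * np.eye(len(theta))

def influence(theta, X, Y, x_train, y_train, x_test, y_test):
    """Approximate change in the test loss from upweighting (x_train, y_train)."""
    H_inv = np.linalg.inv(hessian(theta, X, Y))
    g_test = grad_loss(theta, x_test, y_test)
    g_train = grad_loss(theta, x_train, y_train)
    return -g_test @ H_inv @ g_train
```

For larger models, forming and inverting the Hessian explicitly like this is exactly the cost the paper shows how to avoid, as described next.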
Although the theory behind influence functions assumes a convex, twice-differentiable loss, the paper shows empirically that they remain informative in non-convex and non-differentiable settings. The main computational obstacle is the Hessian of the training loss and its inverse, which cannot be formed explicitly for large models. The paper sidesteps this by computing inverse-Hessian-vector products implicitly, using conjugate gradient or stochastic second-order approximation, so that influence scores can be obtained even for models with millions of parameters.

Experiments on logistic regression and convolutional neural networks validate the approach: the influence estimates closely match leave-one-out retraining and remain accurate approximations of the effect of removing or perturbing training points even when the underlying assumptions are not strictly met. The authors use influence functions to identify the training points most responsible for a given prediction, to debug domain mismatch, and to flag and fix mislabeled examples, with case studies on medical data and email spam classification. Overall, the paper argues that understanding how a model is derived from its training data is central to interpreting, debugging, and hardening machine learning systems, and influence functions provide a practical and versatile tool for doing so.
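The sketch below illustrates that trick under the same logistic-regression assumptions as above: the Hessian is never formed, only Hessian-vector products, and the system H s = grad L(z_test) is solved with SciPy's conjugate gradient. For a neural network the Hessian-vector product would instead come from double backpropagation; the names, damping term, and iteration cap here are illustrative choices, not the authors' implementation.

```python
# Minimal sketch: the same influence score computed without forming the Hessian,
# using an implicit Hessian-vector product and conjugate gradient.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_hvp(theta, X, reg=1e-3):
    """Return v -> H v for the average logistic loss, in O(n*d) time per product."""
    P = sigmoid(X @ theta)
    W = P * (1 - P)
    def hvp(v):
        # H v = (1/n) X^T (W * (X v)) + reg * v, without materializing H
        return X.T @ (W * (X @ v)) / X.shape[0] + reg * v
    return hvp

def influence_cg(theta, X, g_train, g_test, reg=1e-3):
    """-g_test^T H^{-1} g_train, solving H s = g_test by conjugate gradient."""
    d = theta.shape[0]
    H_op = LinearOperator((d, d), matvec=make_hvp(theta, X, reg), dtype=np.float64)
    s_test, _ = cg(H_op, g_test, maxiter=100)   # s_test approximates H^{-1} g_test
    return -s_test @ g_train
```

The per-example gradients g_train and g_test can be computed as in the earlier sketch; since only matrix-vector products with the Hessian are needed, memory stays linear in the number of parameters.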