Interpretation of Neural Networks is Fragile

6 Nov 2018 | Amirata Ghorbani*, Abubakar Abid*, James Zou†
The paper "Interpretation of Neural Networks is Fragile" by Amirata Ghorbani explores the fragility of interpretations generated by neural networks, particularly in the context of feature importance maps and influence functions. The authors demonstrate that small, systematic perturbations to input data can lead to significantly different interpretations without changing the predicted label. They systematically evaluate the robustness of various interpretation methods (feature importance maps, integrated gradients, DeepLIFT, and influence functions) on the ImageNet and CIFAR-10 datasets. The results show that these methods are susceptible to adversarial attacks, where imperceptible perturbations can alter the interpretation while preserving the prediction. The paper also provides insights into the geometry of the Hessian matrix, explaining why current interpretation approaches struggle with robustness. The findings highlight the need for more robust interpretation methods to ensure trust and transparency in machine learning applications.The paper "Interpretation of Neural Networks is Fragile" by Amirata Ghorbani explores the fragility of interpretations generated by neural networks, particularly in the context of feature importance maps and influence functions. The authors demonstrate that small, systematic perturbations to input data can lead to significantly different interpretations without changing the predicted label. They systematically evaluate the robustness of various interpretation methods (feature importance maps, integrated gradients, DeepLIFT, and influence functions) on the ImageNet and CIFAR-10 datasets. The results show that these methods are susceptible to adversarial attacks, where imperceptible perturbations can alter the interpretation while preserving the prediction. The paper also provides insights into the geometry of the Hessian matrix, explaining why current interpretation approaches struggle with robustness. The findings highlight the need for more robust interpretation methods to ensure trust and transparency in machine learning applications.