Towards Evaluating the Robustness of Neural Networks


22 Mar 2017 | Nicholas Carlini, David Wagner
This paper evaluates the robustness of neural networks against adversarial examples: inputs that are slightly perturbed so that the network misclassifies them. Defensive distillation is a recently proposed defense that was reported to reduce the success rate of existing attacks from 95% to 0.5%. The authors show that defensive distillation does not significantly increase robustness by introducing three new attack algorithms that succeed on both distilled and undistilled networks with 100% probability. The attacks are tailored to the three distance metrics used in the literature: $L_0$, $L_2$, and $L_\infty$. The authors also propose using high-confidence adversarial examples in a simple transferability test to evaluate candidate defenses, and show that this test breaks defensive distillation. The attacks are evaluated on three standard datasets: MNIST, CIFAR-10, and ImageNet. The results demonstrate that defensive distillation provides little security benefit over undistilled networks, and the authors propose their attacks as a benchmark for future defenses that aim to create neural networks robust to adversarial examples.
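
To make the optimization formulation concrete, below is a minimal sketch of the paper's $L_2$ attack objective: minimize $\|x' - x\|_2^2 + c \cdot f(x')$, where $x' = \tfrac{1}{2}(\tanh(w) + 1)$ keeps the adversarial image in a valid pixel range and $f(x') = \max(\max_{i \neq t} Z(x')_i - Z(x')_t, -\kappa)$ penalizes the gap between the target logit and the largest other logit. The choice of PyTorch, the `model` interface (returning pre-softmax logits $Z(x)$ for a batch of one image), and the hyperparameter defaults are assumptions made for illustration, not details taken from the paper.

```python
# Sketch of the C&W L2 attack objective (assumptions: PyTorch, batch size 1,
# `model` returns pre-softmax logits, pixel values in [0, 1]).
import torch

def cw_l2_attack(model, x, target, c=1.0, kappa=0.0, steps=1000, lr=0.01):
    """Search for a small-L2 perturbation that makes `model` label `x` as `target`."""
    # Change of variables: x' = 0.5*(tanh(w)+1) stays in [0, 1] automatically,
    # so no box constraint is needed during optimization.
    w = torch.atanh((2 * x - 1).clamp(-1 + 1e-6, 1 - 1e-6)).detach().requires_grad_(True)
    optimizer = torch.optim.Adam([w], lr=lr)

    for _ in range(steps):
        x_adv = 0.5 * (torch.tanh(w) + 1)   # candidate adversarial image
        logits = model(x_adv)               # Z(x'), pre-softmax scores, shape (1, num_classes)

        # f(x') = max(max_{i != t} Z(x')_i - Z(x')_t, -kappa)
        target_logit = logits[0, target]
        other_logits = logits[0].clone()
        other_logits[target] = float('-inf')
        f = torch.clamp(other_logits.max() - target_logit, min=-kappa)

        # Minimize ||x' - x||_2^2 + c * f(x')
        loss = torch.sum((x_adv - x) ** 2) + c * f
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return (0.5 * (torch.tanh(w) + 1)).detach()
```

In the paper, the constant $c$ is chosen by a modified binary search for the smallest value at which the attack succeeds, and the confidence parameter $\kappa$ controls how strongly the example is misclassified; increasing $\kappa$ is what produces the high-confidence adversarial examples used in the transferability test.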