Towards Evaluating the Robustness of Neural Networks

22 Mar 2017 | Nicholas Carlini, David Wagner
This paper evaluates the robustness of neural networks by introducing three new attack algorithms that generate adversarial examples on both distilled and undistilled networks with a 100% success rate. Each attack is tailored to one of three distance metrics: $ L_0 $, $ L_2 $, and $ L_\infty $. The attacks are more effective than previous methods and break defensive distillation, a defense that was previously thought to significantly increase robustness.

The attacks are evaluated on three standard datasets: MNIST, CIFAR-10, and ImageNet. The results show that defensive distillation does not eliminate adversarial examples and that the proposed attacks find smaller perturbations than existing methods. The authors also propose a transferability test that uses high-confidence adversarial examples to evaluate defenses, and show that this test likewise breaks defensive distillation. The paper further examines the choice of objective function for finding adversarial examples, since this choice significantly affects an attack's efficacy. The authors conclude that their attacks provide a better baseline for evaluating candidate defenses and that defensive distillation does not provide strong security guarantees against adversarial examples.
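At the core of the $ L_2 $ attack is an unconstrained optimization problem: minimize $ \|\delta\|_2^2 + c \cdot f(x+\delta) $, where $ f $ is a margin-style objective that is non-positive exactly when the perturbed input is classified as the target class with confidence at least $ \kappa $. The sketch below illustrates this objective in PyTorch; the model interface, the constant $ c $, the confidence $ \kappa $, and the commented-out optimizer loop are illustrative assumptions, not the paper's exact implementation (the paper additionally uses a change of variables $ x+\delta = \tfrac{1}{2}(\tanh(w)+1) $ to enforce the pixel box constraint and a binary search over $ c $).

```python
import torch

def cw_l2_objective(model, x, delta, target, c=1.0, kappa=0.0):
    """Sketch of the C&W L2 objective: ||delta||_2^2 + c * f(x + delta).

    `model` is assumed to return pre-softmax logits Z(x'); `target` is the
    target class index; `kappa` is the confidence margin. Inputs are assumed
    to be a batch of images with shape (N, C, H, W) and values in [0, 1].
    """
    x_adv = torch.clamp(x + delta, 0.0, 1.0)   # simplified box constraint
    logits = model(x_adv)
    target_logit = logits[:, target]
    # Largest logit among all non-target classes.
    other = logits.clone()
    other[:, target] = float("-inf")
    other_max = other.max(dim=1).values
    # f(x') = max(max_{i != t} Z(x')_i - Z(x')_t, -kappa): non-positive only
    # when the adversarial example is classified as `target` with margin kappa.
    f = torch.clamp(other_max - target_logit, min=-kappa)
    return (delta ** 2).sum(dim=(1, 2, 3)) + c * f

# Hypothetical usage: gradient descent on delta with Adam, as in the paper.
# delta = torch.zeros_like(x, requires_grad=True)
# opt = torch.optim.Adam([delta], lr=0.01)
# for _ in range(1000):
#     loss = cw_l2_objective(model, x, delta, target).sum()
#     opt.zero_grad(); loss.backward(); opt.step()
```

Raising $ \kappa $ forces the optimizer to find adversarial examples that are misclassified with a larger logit margin; these high-confidence examples are the ones the authors use in the transferability test described above.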