Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

1 Nov 2017 | Nicholas Carlini, David Wagner
The paper "Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods" by Nicholas Carlini and David Wagner of the University of California, Berkeley, examines the vulnerability of neural networks to adversarial examples and evaluates ten recently proposed methods for detecting them. The authors show that each of these detectors can be defeated by constructing a new, detector-aware loss function, demonstrating that adversarial examples are significantly harder to detect than previously believed. They conclude that properties thought to be intrinsic to adversarial examples are not, and they propose guidelines for evaluating future defenses. The study considers three threat models (generic, white-box, and black-box attacks) when assessing each defense. The results show that many defenses are ineffective against white-box attacks, and that some defenses which work on a simple dataset like MNIST fail on the more complex CIFAR dataset. The authors also stress that choosing the right loss function is critical when attacking a defense, and they note that the transferability of adversarial examples lets attackers evade detection even without direct access to the defended model.
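To make "constructing new loss functions" concrete, the sketch below shows one way an attacker can fold a detector into the attack objective: treat "detected as adversarial" as an extra output class and apply a C&W-style margin loss to the combined model. It is written in PyTorch purely as an illustration; the function names (`combined_logits`, `detector_aware_loss`) and the `classifier`/`detector` interfaces are assumptions of this sketch, not code from the paper.

```python
import torch
import torch.nn.functional as F

def combined_logits(x, classifier, detector):
    """Fold a detector into an N-class classifier as an (N+1)-th 'adversarial'
    class. Assumptions of this sketch: classifier(x) returns logits of shape
    (batch, N), and detector(x) returns one logit per example that is positive
    when the input is flagged as adversarial."""
    z = classifier(x)                                   # (batch, N) class logits
    d = detector(x).view(-1, 1)                         # (batch, 1) detector logit
    # When the detector fires (d > 0), the extra logit exceeds the largest class
    # logit (assuming that logit is positive), so the combined model predicts
    # the "adversarial" class in that case.
    extra = (d + 1.0) * z.max(dim=1, keepdim=True).values
    return torch.cat([z, extra], dim=1)                 # (batch, N+1)

def detector_aware_loss(x_adv, target, classifier, detector, kappa=0.0):
    """C&W-style margin loss on the combined model: it reaches zero only when
    the target class logit beats every other logit, including the detector's
    extra class, so minimizing it seeks inputs that are both misclassified and
    undetected."""
    g = combined_logits(x_adv, classifier, detector)
    target_logit = g.gather(1, target.view(-1, 1)).squeeze(1)
    mask = F.one_hot(target, num_classes=g.shape[1]).bool()
    other_max = g.masked_fill(mask, float("-inf")).max(dim=1).values
    return torch.clamp(other_max - target_logit + kappa, min=0.0)
```

An attacker would minimize this loss, together with a distortion penalty, over a perturbation of the input, so that the optimizer simultaneously drives the example toward the target class and away from the detector's "adversarial" decision.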