Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples

2018 | Anish Athalye, Nicholas Carlini, David Wagner
The paper identifies "obfuscated gradients," a form of gradient masking, as a phenomenon that creates a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, the authors show that such defenses can be circumvented. They describe characteristic behaviors of defenses exhibiting this effect and develop attack techniques to overcome three types of obfuscated gradients: shattered gradients, stochastic gradients, and vanishing/exploding gradients. In a case study of the non-certified white-box-secure defenses accepted at ICLR 2018, they find that 7 of the 9 defenses rely on obfuscated gradients. Their new attacks circumvent 6 of these defenses completely and 1 partially under each paper's original threat model. The paper also analyzes the evaluations performed in those papers and highlights common evaluation pitfalls to help future defenses avoid similar vulnerabilities.
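For stochastic gradients, the paper's attack is Expectation over Transformation (EOT): rather than differentiating through a single sample of the defense's randomness, the attacker averages gradients over many sampled transformations, recovering a useful gradient of the expected loss. A minimal sketch of the idea, using a toy quadratic loss and a hypothetical additive-noise randomized defense (both stand-ins, not the paper's actual models):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable "model": loss(x) = sum(x^2), whose gradient is 2x.
def loss_grad(x):
    return 2.0 * x

# Hypothetical stochastic defense: add random noise to the input each query.
def random_transform(x):
    return x + rng.normal(scale=0.1, size=x.shape)

def eot_gradient(x, n_samples=1000):
    """Estimate the gradient of E_t[loss(t(x))] by averaging gradients
    over sampled transformations (Expectation over Transformation)."""
    grads = [loss_grad(random_transform(x)) for _ in range(n_samples)]
    return np.mean(grads, axis=0)

x = np.ones(3)
g = eot_gradient(x, n_samples=2000)
# With zero-mean noise, the averaged gradient approaches the clean gradient 2x,
# so an iterative attack can descend it despite the per-query randomness.
```

A single noisy gradient sample may point in an unhelpful direction, but the Monte Carlo average converges to the gradient of the expected loss, which is what an attacker actually needs to optimize against a randomized defense.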