2018 | Anish Athalye, Nicholas Carlini, David Wagner
Obfuscated gradients, a form of gradient masking, create a false sense of security in defenses against adversarial examples. Defenses that cause obfuscated gradients appear to resist iterative optimization-based attacks, but the paper shows they can be circumvented. It identifies three types of obfuscated gradients: shattered gradients (nonexistent or incorrect gradients caused by non-differentiable operations or numerical instability), stochastic gradients (randomized defenses that make the gradient itself random), and vanishing/exploding gradients (very deep computation, such as repeated iterations of a neural network, that makes the gradient unusable). The authors develop a matching attack technique for each type. Backward Pass Differentiable Approximation (BPDA) handles shattered gradients by replacing the non-differentiable component with a differentiable approximation on the backward pass only. Expectation Over Transformation (EOT) handles stochastic gradients by differentiating the expected loss over the defense's randomness. Reparameterization handles vanishing/exploding gradients by writing the input as x = h(z) for a differentiable h chosen so that the defense leaves h(z) (approximately) unchanged, and then optimizing over z.
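BPDA and EOT can be combined in a few lines. The following is a minimal sketch under stated assumptions, not the authors' code. It uses NumPy and a toy logistic model with hypothetical fixed weights `W` and `B`. A hypothetical quantization defense (`quantize`) produces shattered gradients, and a noisy variant of it (`random_defense`) produces stochastic gradients. BPDA is modeled by treating the defense as the identity on the backward pass, which is reasonable only because this defense keeps its output close to its input. EOT is modeled by averaging the gradient over random draws of the defense. Reparameterization is not shown.

```python
import numpy as np

# Toy logistic "classifier" with fixed, hypothetical weights (illustration only).
W = np.array([1.0, -2.0, 0.5])
B = 0.1
rng = np.random.default_rng(0)


def loss_and_grad(z, y):
    """Cross-entropy loss of the toy model on input z (label y in {0, 1})
    and its gradient with respect to z."""
    p = 1.0 / (1.0 + np.exp(-(W @ z + B)))
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loss, (p - y) * W


def quantize(x, levels=4):
    """Hypothetical non-differentiable defense: snap each feature in [0, 1]
    to one of `levels` values. Its true gradient is zero almost everywhere,
    so plain gradient ascent through it gets no signal (shattered gradients)."""
    return np.round(x * (levels - 1)) / (levels - 1)


def random_defense(x):
    """Hypothetical randomized defense: add small Gaussian noise, then quantize.
    Each call can return a different output (stochastic gradients)."""
    noisy = np.clip(x + rng.normal(0.0, 0.05, size=x.shape), 0.0, 1.0)
    return quantize(noisy)


def bpda_eot_grad(x, y, n_samples=32):
    """Estimate the gradient of E_t[loss(t(x))] with respect to x.
    EOT: average over many random draws of the defense.
    BPDA: run the real defense on the forward pass, but treat it as the
    identity on the backward pass, i.e. approximate d t(x) / dx by I."""
    grads = []
    for _ in range(n_samples):
        z = random_defense(x)            # forward pass through the actual defense
        _, grad_z = loss_and_grad(z, y)  # gradient at the defended point
        grads.append(grad_z)             # identity backward pass: grad_x ~= grad_z
    return np.mean(grads, axis=0)


def pgd_attack(x, y, eps=0.3, step=0.05, iters=40):
    """L-infinity projected gradient ascent on the loss using the BPDA+EOT gradient."""
    x_adv = x.copy()
    for _ in range(iters):
        x_adv = x_adv + step * np.sign(bpda_eot_grad(x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within the eps-ball around x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid input
    return x_adv


x = np.array([0.8, 0.2, 0.5])  # hypothetical clean input whose true label is 1
y = 1
x_adv = pgd_attack(x, y)
clean_loss, _ = loss_and_grad(quantize(x), y)
adv_loss, _ = loss_and_grad(quantize(x_adv), y)
print(f"loss after quantization: clean={clean_loss:.3f}, adversarial={adv_loss:.3f}")
```

Because the toy model is linear in its input, the gradient direction never changes and EOT only smooths its magnitude. With a real nonlinear network, averaging over draws is what keeps the gradient estimate stable. The identity approximation only makes sense when the defense approximately preserves its input. Otherwise a different differentiable surrogate would be needed.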
In a case study of the non-certified ICLR 2018 defenses that claim white-box security, seven of the nine defenses rely on obfuscated gradients. The authors' new attacks circumvent six of them completely and one partially, under the threat model each original paper considered. The paper also stresses the importance of defining realistic threat models, making specific and testable claims, and evaluating defenses against adaptive attacks. Its findings show that many defenses fail against attacks that bypass obfuscated gradients, underscoring the need for more robust and reliable defenses against adversarial examples.