Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods


1 Nov 2017 | Nicholas Carlini and David Wagner, University of California, Berkeley
Neural networks are vulnerable to adversarial examples: inputs that are close to natural inputs but are classified incorrectly. Recent work has increasingly focused on detecting adversarial examples rather than making classifiers robust to them. This paper surveys ten detection proposals drawn from seven papers, compares their effectiveness, and shows that all of them can be defeated by an adversary who constructs a new attacker-loss function targeting the specific defense. Adversarial examples are therefore harder to detect than previously thought, and properties believed to be intrinsic to them are not. The authors close with guidelines for evaluating future defenses.

Each defense is evaluated under three threat models: zero-knowledge (the attacker does not know a detector is in place), perfect-knowledge (the attacker knows the detector and its parameters), and limited-knowledge (the attacker knows the detection scheme but not its trained parameters). Under the zero-knowledge model, six of the ten defenses are less effective than originally believed. Under the perfect-knowledge model, five defenses provide no increase in robustness, three increase it only slightly, and two increase it only on simple datasets. On simple datasets the attacks slightly increase the distortion of the adversarial examples; on more complex datasets the adversarial examples remain indistinguishable from the original images. These results challenge the assumption that adversarial examples differ intrinsically from natural images.

The key technical tool is a special attacker-loss function that the adversary minimizes so that the resulting input is simultaneously misclassified by the model and judged natural by the detector; a sketch of this construction appears below. In the limited-knowledge setting, the transferability property lets adversarial examples crafted against the adversary's own independently trained model succeed even though the defense's model parameters are unknown.

The surveyed detection methods fall into several categories, including secondary classification (training a second network, or an extra class of the original classifier, to recognize adversarial inputs), principal component analysis (PCA) of inputs or hidden activations, and distributional detection based on statistical tests; a rough sketch of a PCA-style detector closes this summary. In each case the authors show how to construct adversarial examples that evade the defense. They conclude that adversarial examples are not easily detected, that current detection defenses are not robust, and that their attacks provide a baseline for evaluating future defenses, together with concrete recommendations for how such evaluations should be carried out.
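To make the attack concrete, here is a minimal sketch of a perfect-knowledge attack of the kind summarized above. It assumes a differentiable classifier clf(x) returning N class logits and a differentiable detector det(x) returning a scalar score where a positive value means "flagged as adversarial"; these names, the specific combined-logit construction, and all hyperparameters are illustrative assumptions rather than the paper's exact formulation. The detector is folded into the classifier as an artificial (N+1)-th class, and a C&W-style optimization then searches for a small perturbation whose target logit beats every other logit, including the synthetic "adversarial" one.

# Minimal sketch of a perfect-knowledge attack on a classifier + detector.
# Assumptions (illustrative, not taken verbatim from the paper): clf(x)
# returns a 1-D tensor of N class logits for a single input x, and det(x)
# returns a scalar score with det(x) > 0 meaning "flagged as adversarial".
import torch
import torch.nn.functional as F

def combined_logits(clf, det, x):
    """Treat the detector as an artificial (N+1)-th class of the classifier."""
    z = clf(x)                                    # shape (N,): class logits
    extra = (det(x) + 1.0) * z.max()              # large whenever det flags x
    return torch.cat([z, extra.reshape(1)])       # shape (N+1,)

def attack(clf, det, x0, target, steps=500, lr=0.01, c=1.0):
    """C&W-style search for a small perturbation delta such that x0 + delta
    is classified as `target` (one of the N real classes) and is not
    assigned to the synthetic 'adversarial' class."""
    delta = torch.zeros_like(x0, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x = (x0 + delta).clamp(0.0, 1.0)          # keep a valid image
        g = combined_logits(clf, det, x)
        other = torch.cat([g[:target], g[target + 1:]]).max()
        margin = F.relu(other - g[target])        # zero once the target logit wins
        loss = (delta ** 2).sum() + c * margin    # distortion + attack loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (x0 + delta).detach().clamp(0.0, 1.0)

Because the maximum in `other` ranges over the synthetic class as well as the real ones, driving the margin to zero simultaneously fools the classifier and evades the detector; in a C&W-style attack the constant c, which trades distortion against attack success, would typically be chosen by binary search.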
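For context on what is being evaded, the PCA-based detectors in the surveyed family roughly flag inputs that place unusually high energy on the trailing principal components of natural data. The sketch below illustrates the idea with NumPy; the component split, the percentile threshold, and the class name PCADetector are illustrative placeholders rather than any specific defense from the paper.

# Rough sketch of a PCA-style detector of the surveyed kind (illustrative;
# the component split and threshold are placeholders, not a specific defense).
import numpy as np

class PCADetector:
    def __init__(self, n_keep=25):
        self.n_keep = n_keep                        # leading components treated as "natural"

    def fit(self, x_natural):
        """x_natural: (num_samples, num_features) matrix of flattened natural images."""
        self.mean = x_natural.mean(axis=0)
        _, _, vt = np.linalg.svd(x_natural - self.mean, full_matrices=False)
        self.tail = vt[self.n_keep:]                # trailing principal directions
        scores = (((x_natural - self.mean) @ self.tail.T) ** 2).sum(axis=1)
        self.threshold = np.percentile(scores, 99)  # accept roughly 1% false positives
        return self

    def is_adversarial(self, x):
        """Flag x if its energy on the trailing components is unusually large."""
        score = (((x - self.mean) @ self.tail.T) ** 2).sum()
        return score > self.threshold

Attacks of the kind sketched earlier defeat such detectors whenever the detector score is differentiable (or can be approximated by a differentiable surrogate), since the optimizer can then keep the perturbation's trailing-component energy below the detection threshold.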