Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

NDSS 2018, 18–21 February 2018, San Diego, CA, USA | Weilin Xu, David Evans, Yanjun Qi
Feature squeezing is a method for detecting adversarial examples in deep neural networks (DNNs) by reducing the input space available to an adversary. It coalesces samples that correspond to many different feature vectors in the original space into a single sample, shrinking the search space for adversarial perturbations. Detection works by comparing a DNN's predictions on the original input and on its squeezed versions: a large disagreement flags the input as adversarial, yielding high detection accuracy with few false positives. Two squeezers are explored: reducing the color bit depth of each pixel and spatial smoothing. Both are inexpensive, complementary to other defenses, and can be combined in a joint detection framework that achieves high detection rates against state-of-the-art attacks.
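To make the comparison-based detection concrete, here is a minimal sketch of the two squeezers and the joint detector. It assumes a `model` object with a Keras-style `predict` method that returns softmax probability vectors, and images as H×W×C float arrays in [0, 1]; the threshold is illustrative, not one of the paper's tuned per-dataset settings.

```python
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(x, bits):
    """Color bit-depth squeezer: quantize pixels in [0, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def median_smooth(x, size=2):
    """Spatial smoothing squeezer: median-filter each channel of an HxWxC image."""
    return median_filter(x, size=(size, size, 1))

def squeezing_score(model, x, squeezers):
    """Maximum L1 distance between predictions on the original and squeezed inputs."""
    p_orig = model.predict(x[np.newaxis])[0]
    return max(
        np.abs(p_orig - model.predict(squeeze(x)[np.newaxis])[0]).sum()
        for squeeze in squeezers
    )

def is_adversarial(model, x, squeezers, threshold=1.0):
    """Joint detector: flag the input if any squeezer moves the prediction too far."""
    return squeezing_score(model, x, squeezers) > threshold
```

Taking the maximum distance over several squeezers is what the joint detection framework amounts to: each squeezer targets a different kind of perturbation, and an input only needs to be caught by one of them.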
The paper evaluates feature squeezing against eleven adversarial attacks and shows that it substantially improves robustness while preserving accuracy on legitimate inputs. It is particularly effective against $L_0$ attacks such as JSMA and $CW_0$, because it reduces pixel-level variations and eliminates the salt-and-pepper-like noise these attacks introduce; it also works against $L_\infty$ and $L_2$ attacks, especially when combined with spatial smoothing. Against static adversarial examples, the joint detector reaches a 98% detection rate on MNIST and 85% on CIFAR-10 and ImageNet, with low false positive rates.

Feature squeezing is not foolproof against adaptive adversaries, but it significantly complicates their task even when they have full knowledge of the model and the defense. The method is less expensive and more accurate than previous defenses and can be combined with other techniques, such as adversarial training, to further enhance robustness. The results demonstrate that feature squeezing is a promising approach for detecting adversarial examples in DNNs.
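As a closing illustration, a hypothetical invocation of the sketch above pairs a bit-depth squeezer with median smoothing, mirroring the combination described earlier; the bit depth, kernel size, and threshold here are placeholders rather than the paper's per-dataset choices.

```python
# Hypothetical configuration: 4-bit quantization plus 2x2 median smoothing.
squeezers = [lambda x: reduce_bit_depth(x, bits=4), median_smooth]

# `model` and `image` are assumed to exist; `image` is an HxWxC float array in [0, 1].
if is_adversarial(model, image, squeezers, threshold=1.0):
    print("input flagged as adversarial")
```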