18–21 February 2018, San Diego, CA, USA | Weilin Xu, David Evans, Yanjun Qi
The paper introduces a strategy called feature squeezing to detect adversarial examples in deep neural networks (DNNs). Adversarial examples are inputs formed by adding small but purposeful perturbations to natural inputs, causing a DNN to misclassify them. Previous defenses against adversarial examples have either shown limited success or required expensive computation. Feature squeezing reduces the search space available to an adversary by coalescing samples that correspond to many different feature vectors in the original space into a single sample. By comparing the DNN's prediction on the original input with its prediction on the squeezed input, feature squeezing can detect adversarial examples with high accuracy and few false positives. The paper explores two feature squeezing methods: reducing the color bit depth of each pixel and spatial smoothing. These methods are simple, inexpensive, and complementary to other defenses. The authors demonstrate that feature squeezing significantly enhances the robustness of DNN models by accurately detecting adversarial examples while preserving accuracy on legitimate inputs. The method is effective against state-of-the-art attacks and can be combined with other defenses, such as adversarial training, to achieve high detection rates.
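To make the detection idea concrete, here is a minimal sketch of the comparison between original and squeezed predictions, assuming a Keras-style `model` whose `predict` method returns softmax probabilities and images given as float arrays in [0, 1]. The function names, the 4-bit depth, the filter size, and the threshold value are illustrative assumptions, not the paper's released code or tuned parameters.

```python
import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(x, bits):
    """Squeeze color depth to `bits` bits per channel (input in [0, 1])."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def median_smooth(x, size=2):
    """Spatial smoothing with a size x size median filter per channel."""
    # Filter over the spatial dimensions (H, W) of an (H, W, C) image only.
    return median_filter(x, size=(size, size, 1))

def is_adversarial(model, x, threshold=1.0):
    """Flag x if any squeezed prediction differs too much from the original."""
    p_orig = model.predict(x[np.newaxis])[0]
    scores = []
    for squeeze in (lambda im: reduce_bit_depth(im, bits=4), median_smooth):
        p_squeezed = model.predict(squeeze(x)[np.newaxis])[0]
        # L1 distance between the two softmax output vectors.
        scores.append(np.abs(p_orig - p_squeezed).sum())
    return max(scores) > threshold
```

Taking the maximum score over several squeezers mirrors the joint-detection setup described in the paper, where the threshold is chosen on legitimate data to keep the false-positive rate low.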