25 Mar 2019 | Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Kaiming He
This paper proposes a feature denoising approach to improve the adversarial robustness of convolutional networks. Adversarial attacks on image classification systems introduce small perturbations that lead to incorrect predictions, despite being imperceptible to humans. The authors observe that these perturbations create significant noise in the feature maps of convolutional networks. To address this, they develop new network architectures with blocks that denoise features using non-local means or other filters. These networks are trained end-to-end with adversarial training, leading to substantial improvements in adversarial robustness.
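The core idea can be sketched in a few lines. Below is a minimal, illustrative version of a Gaussian (softmax) non-local means denoising block operating on a flattened feature map; the learned 1x1 convolution inside the paper's block is omitted here (treated as identity) to keep the sketch self-contained:

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax, numerically stabilized."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def denoising_block(x):
    """Sketch of a Gaussian (softmax) non-local means denoising block.

    x: flattened feature map of shape (N, C), where N = H * W positions.
    Each output position is a similarity-weighted mean over all positions,
    and a residual connection preserves the original signal. Assumption:
    the paper's learned 1x1 convolution is replaced by the identity.
    """
    weights = softmax_rows(x @ x.T)  # pairwise feature affinities, rows sum to 1
    denoised = weights @ x           # non-local weighted mean over positions
    return x + denoised              # residual connection keeps useful signal
```

The residual connection is the "proper feature combination" the authors stress: the block adds a denoised estimate to the features rather than replacing them outright.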
The proposed method achieves 55.7% accuracy on ImageNet under 10-iteration PGD attacks, surpassing prior art by a large margin. Even under extreme 2000-iteration PGD attacks, it retains 42.6% accuracy. The method also performs well in black-box settings, winning the CAAD 2018 defense competition with 50.6% accuracy against 48 unknown attackers. Among the denoising operations evaluated, blocks built on non-local means outperform variants based on bilateral filters, mean filters, and median filters.
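For context on the attack used in these evaluations, a minimal PGD sketch follows: iterated signed-gradient ascent on the loss, projected back into an L-infinity ball around the clean input. The `grad_fn` callback is an assumption of this sketch (a real attack backpropagates through the network to get the loss gradient with respect to pixels):

```python
import numpy as np

def pgd_attack(x, grad_fn, eps, alpha, steps):
    """Minimal PGD sketch for an L_inf threat model.

    x:       clean input with pixel values in [0, 1]
    grad_fn: callable returning dLoss/dx at a point (hypothetical stand-in
             for backprop through the attacked network)
    eps:     L_inf perturbation budget; alpha: step size; steps: iterations
    """
    x_adv = x + np.random.uniform(-eps, eps, x.shape)    # random start in ball
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))  # signed-gradient ascent
        x_adv = np.clip(x_adv, x - eps, x + eps)         # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                 # stay in valid pixel range
    return x_adv
```

Increasing `steps` (10 versus 2000 in the paper's evaluation) makes the attack stronger at the same budget `eps`, which is why accuracy under 2000-iteration PGD is the more demanding number.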
The study shows that feature denoising improves adversarial robustness by suppressing the noise that adversarial perturbations induce in feature maps, noise which can otherwise overwhelm the true signal and lead to incorrect predictions. Among the filters tested, non-local means, particularly the Gaussian (softmax) version, performs best. The denoising blocks also behave well on clean images, although they do not improve clean-image accuracy.
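The Gaussian and dot-product variants differ only in how the pairwise affinities are normalized, which can be made concrete in a short sketch (the paper's embedding transforms are omitted here as a simplifying assumption):

```python
import numpy as np

def nonlocal_weights(x, version="gaussian"):
    """Two affinity normalizations for non-local means over features x of shape (N, C).

    "gaussian": softmax over pairwise dot products, emphasizing the
                most-similar positions (the variant reported as best);
    "dot":      raw dot products scaled by 1/N, a near-uniform averaging.
    Minimal sketch; the embedded (learned) versions are not modeled.
    """
    sim = x @ x.T                                  # pairwise dot-product similarity
    if version == "gaussian":
        sim = sim - sim.max(axis=1, keepdims=True) # stabilize the exponentials
        e = np.exp(sim)
        return e / e.sum(axis=1, keepdims=True)    # each row sums to 1
    return sim / x.shape[0]                        # dot-product variant, 1/N scaling
```

The softmax sharpens the weighting toward genuinely similar feature positions, which is one plausible reading of why the Gaussian version denoises most effectively.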
The paper demonstrates that feature denoising is a promising approach for improving adversarial robustness in convolutional networks. The method is effective in both white-box and black-box attack settings, and the results suggest that feature denoising is a general approach that can be useful for adversarial robustness. The study also highlights the importance of proper feature combination in denoising blocks to ensure that useful signals are retained while noise is removed.