8 May 2018 | Fangzhou Liao*, Ming Liang*, Yinpeng Dong, Tianyu Pang, Xiaolin Hu†, Jun Zhu
This paper proposes a defense against adversarial attacks in image classification called the High-Level Representation Guided Denoiser (HGD). Adversarial examples, which are slightly perturbed versions of clean images, can mislead neural networks. Standard denoisers fail to suppress adversarial noise effectively because of the error amplification effect: the small residual perturbation that survives denoising is amplified in the target model's high-level features and still leads to incorrect classifications. HGD addresses this by training the denoiser with a loss defined as the difference between the target model's outputs for the clean image and for the denoised image, so the denoiser is optimized to preserve the high-level representation rather than the raw pixels. This improves robustness against both white-box and black-box attacks.
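As a rough sketch of this idea (not the authors' released code), the loss below compares the target model's response to the denoised image with its response to the clean image; the names `denoiser` and `target_model` and the L1-style distance are assumptions based on the description above.

```python
import torch

def hgd_loss(denoiser, target_model, x_adv, x_clean):
    """Sketch of a high-level guidance loss: push the target model's response to
    the denoised image toward its response to the clean image."""
    x_denoised = denoiser(x_adv)        # try to remove the adversarial perturbation
    with torch.no_grad():               # the clean reference provides no gradient
        ref = target_model(x_clean)
    out = target_model(x_denoised)      # gradients flow back through the fixed target model
    return (out - ref).abs().mean()     # L1-style difference of high-level outputs
```

In such a setup only the denoiser's parameters would be updated during training; the target model stays frozen and simply supplies the supervisory signal.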
Compared to ensemble adversarial training, HGD has three main advantages: it is more robust to adversarial perturbations, requires much less training data and time, and can be transferred to defend models other than the one it was trained with. An HGD-based defense won first place in the NIPS 2017 competition on defense against adversarial attacks, outperforming the other entries by a clear margin. Experiments also show that the defense transfers across target models and across image classes.
HGD trains a denoiser whose output is guided by the target model's high-level representations rather than by pixel-level reconstruction error. Two variants are discussed: the feature-guided denoiser (FGD) and the logits-guided denoiser (LGD). FGD defines the training loss on the difference between the target model's high-level feature maps for the clean and denoised images, while LGD defines it on the difference between the model's final-layer outputs (logits). Both variants are more robust to adversarial attacks than a standard pixel-guided denoiser.
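To make the distinction concrete, here is a minimal sketch of the two guidance signals; `feature_extractor` and `logits_fn` are hypothetical handles into the target model (a top-level feature map and the pre-softmax logits, respectively), not functions from the paper.

```python
def guidance_term(x_denoised, x_clean, feature_extractor, logits_fn, variant="LGD"):
    """Return the guidance loss for either variant (sketch, L1-style difference)."""
    if variant == "FGD":    # feature-guided: compare high-level feature maps
        a, b = feature_extractor(x_denoised), feature_extractor(x_clean)
    else:                   # logits-guided: compare final-layer logits
        a, b = logits_fn(x_denoised), logits_fn(x_clean)
    return (a - b).abs().mean()
```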
The paper also evaluates HGD across datasets and target models, showing that it defends against both white-box and black-box attacks. HGD is more efficient than adversarial training, needing less training data and time, and it suppresses adversarial noise effectively. Overall, the results indicate that HGD is a promising defense with good generalization and transferability.