ADVERSARIAL TRAINING ON PURIFICATION (AToP): ADVANCING BOTH ROBUSTNESS AND GENERALIZATION

23 Aug 2024 | Guang Lin, Chao Li, Jianhai Zhang, Toshihisa Tanaka, Qibin Zhao
The paper introduces a defense technique called Adversarial Training on Purification (AToP), which combines adversarial training (AT) and adversarial purification (AP) to improve both the robustness and the generalization of deep neural networks against adversarial attacks. AToP consists of two components: perturbation destruction by random transforms (RT) and fine-tuning of the purifier model with adversarial loss (FT). The random transforms, including binary masks, Gaussian noise, and repeated applications of these transformations, destroy adversarial perturbations in the input; fine-tuning then uses the classifier model's output to train the purifier so that both clean and purified adversarial examples are classified correctly.

The method is evaluated on CIFAR-10, CIFAR-100, and ImageNet, demonstrating superior robustness and generalization against a range of attacks, including FGSM, PGD, CW, AutoAttack, and StAdv. AToP outperforms existing methods in both standard accuracy and robust accuracy, achieving state-of-the-art results and strong resistance to unseen attacks. The code for AToP is available at https://github.com/glin2022/atop.
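To make the RT and FT components concrete, below is a minimal PyTorch sketch of one fine-tuning step. It is an illustration under assumptions, not the paper's implementation: the function names (`random_transform`, `atop_finetune_step`), the specific mask ratio and noise scale, and the use of a plain cross-entropy loss on both clean and adversarial inputs are all hypothetical choices for exposition; the paper's actual transform schedule and loss composition may differ.

```python
import torch
import torch.nn.functional as F

def random_transform(x, mask_ratio=0.5, noise_std=0.25):
    """Destruct perturbations with a random binary mask plus Gaussian noise.
    Hyperparameter values here are illustrative, not taken from the paper."""
    mask = (torch.rand_like(x) > mask_ratio).float()
    return mask * x + noise_std * torch.randn_like(x)

def atop_finetune_step(purifier, classifier, x_clean, x_adv, y, optimizer):
    """One FT step: purify randomly transformed clean and adversarial inputs,
    then update the purifier using the (frozen) classifier's loss so that
    both kinds of examples are classified correctly."""
    classifier.eval()  # only the purifier is trained in this sketch
    optimizer.zero_grad()
    loss = torch.zeros((), device=x_clean.device)
    for x in (x_clean, x_adv):
        purified = purifier(random_transform(x))   # RT followed by purification
        logits = classifier(purified)
        loss = loss + F.cross_entropy(logits, y)   # adversarial loss via classifier output
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch `purifier` and `classifier` stand in for any `nn.Module` pair (e.g., a generative purifier and a pretrained classifier), and `x_adv` would come from an attack such as PGD run against the full purify-then-classify pipeline.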