Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models


18 May 2018 | Pouya Samangouei*, Maya Kabkab*, and Rama Chellappa
Defense-GAN is a defense mechanism against adversarial attacks on classification tasks that leverages generative models to protect deep neural networks. A Generative Adversarial Network (GAN) is trained to model the distribution of unperturbed images. At inference time, an input image is projected onto the range of the GAN's generator, i.e., a latent code is sought whose generated output is close to the input; this reconstruction, which largely discards adversarial perturbations, is then fed to the classifier (see the sketch below). The defense is effective against both black-box and white-box attacks, requires no modification to the classifier's architecture or training procedure, can be used with any classification model, and does not depend on knowledge of the process used to generate the adversarial examples.

The paper reviews attack models, defense mechanisms, and GANs, and presents experimental results on benchmark image datasets. Empirical results show that Defense-GAN consistently improves upon existing defense strategies across different attack methods, reducing adversarial noise and improving classification accuracy. The reconstruction error can also serve as a metric for detecting adversarial examples, as illustrated after the first sketch. The method is implemented in TensorFlow and is publicly available at https://github.com/kabkabm/defensegan.
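The projection step amounts to minimizing the reconstruction error ||G(z) - x||^2 over the latent code z by gradient descent, typically restarted from several random initializations, and keeping the best reconstruction. Below is a minimal sketch in TensorFlow 2; the `generator` handle, the latent-dimension lookup, and all hyperparameters (step count, number of restarts, learning rate) are illustrative assumptions rather than the paper's exact settings.

```python
import tensorflow as tf

def defense_gan_project(generator, x, num_steps=200, num_restarts=10, lr=0.05):
    """Approximately project image x onto the range of a pretrained generator
    by minimizing ||G(z) - x||^2 over z with gradient descent.

    Assumptions: `generator` is a Keras model mapping a latent vector to an
    image with the same shape as x; hyperparameters are illustrative only.
    """
    latent_dim = generator.input_shape[-1]        # assumed Keras model input
    z = tf.Variable(tf.random.normal([num_restarts, latent_dim]))
    x_batch = tf.repeat(x[tf.newaxis, ...], num_restarts, axis=0)
    opt = tf.keras.optimizers.SGD(learning_rate=lr)

    for _ in range(num_steps):
        with tf.GradientTape() as tape:
            recon = generator(z, training=False)
            # Per-restart squared reconstruction error.
            loss = tf.reduce_sum(tf.square(recon - x_batch), axis=[1, 2, 3])
        grads = tape.gradient(tf.reduce_sum(loss), [z])
        opt.apply_gradients(zip(grads, [z]))

    # Keep the restart with the lowest final reconstruction error.
    recon = generator(z, training=False)
    errors = tf.reduce_sum(tf.square(recon - x_batch), axis=[1, 2, 3])
    best = tf.argmin(errors)
    x_hat = tf.gather(recon, best)[tf.newaxis, ...]   # shape [1, H, W, C]
    return x_hat, tf.gather(errors, best)
```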
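The same reconstruction error supports attack detection: adversarial images tend to lie farther from the generator's range than clean ones, so an input whose best reconstruction error exceeds a threshold can be flagged. In this usage example of the sketch above, the threshold `tau` and the `classifier` handle are hypothetical placeholders.

```python
x_hat, err = defense_gan_project(generator, x)
if err > tau:                   # tau: hypothetical detection threshold
    print("input flagged as likely adversarial")
else:
    logits = classifier(x_hat)  # classify the purified reconstruction
    label = tf.argmax(logits, axis=-1)
```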