Detecting Adversarial Samples from Artifacts


15 Nov 2017 | Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, Andrew B. Gardner
This paper presents a method for detecting adversarial samples in deep neural networks (DNNs) by leveraging two features: density estimates in the subspace of the last hidden layer and Bayesian uncertainty estimates from dropout neural networks. Adversarial samples are crafted to fool DNNs by making small, targeted changes to input data. The proposed method aims to distinguish adversarial samples from both normal and noisy samples without requiring knowledge of the attack algorithm.

The method uses density estimation in the feature space of the last hidden layer to detect points that lie far from the data manifold, and Bayesian uncertainty estimates from dropout networks to detect points in low-confidence regions of the input space. When both features are used as inputs to a logistic regression model, the method achieves high detection accuracy, with an ROC-AUC of 92.6% on the MNIST dataset. Evaluated on standard datasets including MNIST and CIFAR-10, the method generalizes well across different architectures and attacks, achieving ROC-AUC scores of 85-93% on various classification tasks with both normal and noisy samples as the negative class.

The paper also discusses the vulnerability of DNNs to adversarial attacks and the importance of detecting such samples for security and performance. It reviews several state-of-the-art attacks, including the Fast Gradient Sign Method (FGSM), the Basic Iterative Method (BIM), the Jacobian-based Saliency Map Attack (JSMA), and the Carlini & Wagner (C&W) attack, and shows that the proposed method is effective in detecting adversarial samples across all of them.

The paper concludes that the two features, density estimates and Bayesian uncertainty estimates, can be combined into an effective defense mechanism against adversarial samples. The resulting detector is robust to random noise and can detect adversarial samples from a wide range of attacks on various datasets. The authors also suggest that the approach can be extended to other neural network architectures, including recurrent neural networks (RNNs).
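To make the detection pipeline concrete, here is a minimal sketch of the two features and their combination, written against a hypothetical trained Keras classifier `model` with dropout layers and a dense penultimate layer. The variable names (`X_train`, `y_train`, `X_eval`, `is_adversarial`), the kernel bandwidth, and the number of Monte Carlo passes are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
import tensorflow as tf
from sklearn.neighbors import KernelDensity
from sklearn.linear_model import LogisticRegression

def mc_dropout_uncertainty(model, x, n_passes=50):
    """Bayesian uncertainty estimate: run the network several times with
    dropout left on at inference time and sum the per-class variances."""
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_passes)])
    return preds.var(axis=0).sum(axis=1)              # shape: (n_inputs,)

def last_hidden_features(model, x):
    """Activations of the last hidden (penultimate) layer."""
    feat_model = tf.keras.Model(model.input, model.layers[-2].output)
    return feat_model(x, training=False).numpy()

def fit_class_kdes(model, X_train, y_train, bandwidth=1.0):
    """One Gaussian kernel density estimate per class, fit on the
    last-hidden-layer features of the clean training data."""
    feats = last_hidden_features(model, X_train)
    return {c: KernelDensity(kernel="gaussian", bandwidth=bandwidth)
                .fit(feats[y_train == c])
            for c in np.unique(y_train)}

def density_scores(model, kdes, x):
    """Log-density of each input under the KDE of its predicted class;
    low values indicate points far from the data manifold."""
    feats = last_hidden_features(model, x)
    preds = np.argmax(model(x, training=False).numpy(), axis=1)
    return np.array([kdes[c].score_samples(f[None])[0]
                     for f, c in zip(feats, preds)])

# Combine the two features in a logistic regression detector.
# `is_adversarial` is a hypothetical 0/1 label array for the evaluation set.
kdes = fit_class_kdes(model, X_train, y_train)
features = np.column_stack([
    mc_dropout_uncertainty(model, X_eval),
    density_scores(model, kdes, X_eval),
])
detector = LogisticRegression().fit(features, is_adversarial)
```

At detection time, an input whose logistic-regression score exceeds a chosen threshold is flagged as adversarial; the ROC-AUC figures quoted above correspond to sweeping that threshold.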
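For context on the attacks being detected, the following is a minimal sketch of FGSM, the simplest of the reviewed attacks, which perturbs each input by a step of size epsilon in the direction of the sign of the loss gradient. It assumes a TensorFlow/Keras model that outputs class probabilities and inputs scaled to [0, 1]; it is not the paper's attack implementation, and the stronger BIM, JSMA, and C&W attacks require iterative or optimization-based procedures not shown here.

```python
import tensorflow as tf

def fgsm(model, x, y_true, epsilon=0.1):
    """Fast Gradient Sign Method: x_adv = x + epsilon * sign(grad_x loss)."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(
            y_true, model(x, training=False))
    grad = tape.gradient(loss, x)
    x_adv = x + epsilon * tf.sign(grad)
    return tf.clip_by_value(x_adv, 0.0, 1.0)   # keep pixels in the valid range
```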