Detecting Adversarial Samples from Artifacts


15 Nov 2017 | Reuben Feinman, Ryan R. Curtin, Saurabh Shintre, Andrew B. Gardner
This paper presents a method for detecting adversarial samples in deep neural networks (DNNs) by leveraging two features: density estimates in the subspace of the last hidden layer and Bayesian uncertainty estimates from dropout neural networks. Adversarial samples are crafted to fool DNNs by making small, targeted changes to input data. The proposed method aims to distinguish adversarial samples from both normal and noisy samples without requiring knowledge of the attack algorithm.

The method uses density estimation in the feature space of the last hidden layer to detect points that lie far from the data manifold, and Bayesian uncertainty estimates from dropout networks to detect points in low-confidence regions of the input space. When both features are used as inputs to a logistic regression model, the method achieves high detection accuracy, with an ROC-AUC of 92.6% on the MNIST dataset. Evaluated on standard datasets including MNIST and CIFAR-10, the method generalizes well across different architectures and attacks, achieving ROC-AUC scores of 85-93% on various classification tasks with both normal and noisy samples as the negative class.

The paper also discusses the vulnerability of DNNs to adversarial attacks and the importance of detecting such samples for security and performance. It reviews several state-of-the-art attacks, including the Fast Gradient Sign Method (FGSM), the Basic Iterative Method (BIM), the Jacobian-based Saliency Map Attack (JSMA), and the Carlini & Wagner (C&W) attack, and shows that the proposed method is effective in detecting adversarial samples across all of them.

The paper concludes that the two features, density estimates and Bayesian uncertainty estimates, can be combined into an effective defense mechanism against adversarial samples. The resulting detector is robust to random noise and can detect adversarial samples from a wide range of attacks on various datasets. The authors also suggest that the approach can be extended to other neural network architectures, including recurrent neural networks (RNNs).
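To make the detection pipeline concrete, here is a minimal sketch of the two features and their combination, written against a hypothetical trained Keras classifier `model` with dropout layers and a dense penultimate layer. The variable names (`X_train`, `y_train`, `X_eval`, `is_adversarial`), the kernel bandwidth, and the number of Monte Carlo passes are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
import tensorflow as tf
from sklearn.neighbors import KernelDensity
from sklearn.linear_model import LogisticRegression

def mc_dropout_uncertainty(model, x, n_passes=50):
    """Bayesian uncertainty estimate: run the network several times with
    dropout left on at inference time and sum the per-class variances."""
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_passes)])
    return preds.var(axis=0).sum(axis=1)              # shape: (n_inputs,)

def last_hidden_features(model, x):
    """Activations of the last hidden (penultimate) layer."""
    feat_model = tf.keras.Model(model.input, model.layers[-2].output)
    return feat_model(x, training=False).numpy()

def fit_class_kdes(model, X_train, y_train, bandwidth=1.0):
    """One Gaussian kernel density estimate per class, fit on the
    last-hidden-layer features of the clean training data."""
    feats = last_hidden_features(model, X_train)
    return {c: KernelDensity(kernel="gaussian", bandwidth=bandwidth)
                .fit(feats[y_train == c])
            for c in np.unique(y_train)}

def density_scores(model, kdes, x):
    """Log-density of each input under the KDE of its predicted class;
    low values indicate points far from the data manifold."""
    feats = last_hidden_features(model, x)
    preds = np.argmax(model(x, training=False).numpy(), axis=1)
    return np.array([kdes[c].score_samples(f[None])[0]
                     for f, c in zip(feats, preds)])

# Combine the two features in a logistic regression detector.
# `is_adversarial` is a hypothetical 0/1 label array for the evaluation set.
kdes = fit_class_kdes(model, X_train, y_train)
features = np.column_stack([
    mc_dropout_uncertainty(model, X_eval),
    density_scores(model, kdes, X_eval),
])
detector = LogisticRegression().fit(features, is_adversarial)
```

At detection time, an input whose logistic-regression score exceeds a chosen threshold is flagged as adversarial; the ROC-AUC figures quoted above correspond to sweeping that threshold.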
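For context on the attacks being detected, the following is a minimal sketch of FGSM, the simplest of the reviewed attacks, which perturbs each input by a step of size epsilon in the direction of the sign of the loss gradient. It assumes a TensorFlow/Keras model that outputs class probabilities and inputs scaled to [0, 1]; it is not the paper's attack implementation, and the stronger BIM, JSMA, and C&W attacks require iterative or optimization-based procedures not shown here.

```python
import tensorflow as tf

def fgsm(model, x, y_true, epsilon=0.1):
    """Fast Gradient Sign Method: x_adv = x + epsilon * sign(grad_x loss)."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(
            y_true, model(x, training=False))
    grad = tape.gradient(loss, x)
    x_adv = x + epsilon * tf.sign(grad)
    return tf.clip_by_value(x_adv, 0.0, 1.0)   # keep pixels in the valid range
```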