21 Feb 2017 | Jan Hendrik Metzen & Tim Genewein & Volker Fischer & Bastian Bischoff
This paper presents a method for detecting adversarial perturbations in deep neural networks. The approach augments the main classification network with a small "detector" subnetwork that is trained on a binary classification task: distinguishing genuine inputs from inputs containing adversarial perturbations. The method is orthogonal to previous approaches that focus on making the classification network itself more robust. The results show that adversarial perturbations can be detected surprisingly well even though they are quasi-imperceptible to humans, and that the detectors generalize to similar and weaker adversaries. The paper also proposes an adversarial attack that fools both the classifier and the detector, along with a novel training procedure for the detector that counteracts this attack.
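To make the setup concrete, here is a minimal sketch of a classifier with an attached detector branch. The PyTorch modules, layer sizes, and the point at which the detector taps into the classifier's intermediate features are illustrative assumptions, not the architecture used in the paper.

```python
# Minimal sketch: a small convolutional "detector" branch attached to an
# intermediate feature map of a CIFAR10-sized classifier. The detector outputs
# a single logit: adversarial vs. genuine. All sizes are illustrative.
import torch
import torch.nn as nn


class Classifier(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Feature extractor whose output also feeds the detector branch.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, num_classes),
        )

    def forward(self, x):
        feats = self.features(x)
        logits = self.head(feats)
        return logits, feats  # expose intermediate features for the detector


class Detector(nn.Module):
    """Binary classifier on intermediate features: adversarial (1) vs. genuine (0)."""

    def __init__(self, in_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, 1),  # single logit: "is this input adversarial?"
        )

    def forward(self, feats):
        return self.net(feats)


if __name__ == "__main__":
    classifier, detector = Classifier(), Detector()
    x = torch.randn(8, 3, 32, 32)         # batch of CIFAR10-sized images
    logits, feats = classifier(x)
    adv_logit = detector(feats)           # one detection score per image
    print(logits.shape, adv_logit.shape)  # torch.Size([8, 10]) torch.Size([8, 1])
```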
The paper first reviews adversarial examples: inputs crafted to fool a machine learning model while remaining imperceptible to humans. Such examples have been shown to transfer between different network architectures and to remain effective under real-world conditions. Several methods for generating them are considered, including the fast (gradient sign) method, the basic iterative method, and DeepFool. The detector network is then trained to classify inputs as regular or adversarial, using a balanced dataset of original data and adversarial examples generated from it. Finally, the paper addresses dynamic adversaries, which have access to both the classifier and the detector, and proposes dynamic adversary training, in which adversarial examples are generated against the current state of the detector during training, to make the detector resistant to them.
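As a rough illustration of how a balanced detector training set could be assembled, the following sketch generates adversarial counterparts with a one-step fast gradient sign attack. It reuses the hypothetical Classifier/Detector modules from the sketch above; the epsilon value and the training-step comment are assumptions, not the paper's exact hyperparameters.

```python
# Sketch of building a balanced detector training batch with the fast
# (gradient sign) method; `classifier` is assumed to return (logits, feats).
import torch
import torch.nn.functional as F


def fgsm(classifier, x, y, eps=8 / 255):
    """One-step fast method: perturb x in the sign of the classification-loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    logits, _ = classifier(x_adv)
    loss = F.cross_entropy(logits, y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()


def detector_batch(classifier, x, y):
    """Balanced batch: originals labeled 0 (genuine), FGSM examples labeled 1 (adversarial)."""
    x_adv = fgsm(classifier, x, y)
    inputs = torch.cat([x, x_adv], dim=0)
    labels = torch.cat([torch.zeros(len(x)), torch.ones(len(x_adv))])
    return inputs, labels


# A detector training step could then minimize binary cross-entropy on the
# detector's single logit (hypothetical usage, detector defined as above):
#   inputs, labels = detector_batch(classifier, x, y)
#   _, feats = classifier(inputs)
#   loss = F.binary_cross_entropy_with_logits(detector(feats).squeeze(1), labels)
```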
The paper presents experimental results on the CIFAR10 and ImageNet datasets, showing that the detector identifies adversarial examples with high accuracy even when the perturbations are small and quasi-imperceptible to humans. The detector also generalizes well to similar and weaker adversaries, and it is more robust to dynamic adversaries when trained with the dynamic adversary training procedure. The paper concludes that the proposed method for detecting adversarial perturbations is effective and can be used to improve the robustness of machine learning systems.
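For completeness, here is a hedged sketch of a dynamic adversary: a one-step attack whose gradient combines the classifier's cross-entropy loss with the cost of the detector's "adversarial" label, so the perturbation both misleads the classifier and evades the detector. The weighting sigma, epsilon, and the reuse of the hypothetical modules above are assumptions rather than the paper's exact formulation; dynamic adversary training would regenerate such examples against the current detector at each training step.

```python
# Sketch of a one-step dynamic attack against classifier and detector jointly.
import torch
import torch.nn.functional as F


def dynamic_fgsm(classifier, detector, x, y, eps=8 / 255, sigma=0.5):
    """Raise the classifier's loss while pushing the detector away from 'adversarial'."""
    x_adv = x.clone().detach().requires_grad_(True)
    logits, feats = classifier(x_adv)
    cls_loss = F.cross_entropy(logits, y)        # misleading the classifier
    det_logit = detector(feats).squeeze(1)
    # Cost of the detector's correct label "adversarial" (1); maximizing it
    # drives the detector toward predicting "genuine".
    det_cost = F.binary_cross_entropy_with_logits(
        det_logit, torch.ones_like(det_logit))
    combined = (1 - sigma) * cls_loss + sigma * det_cost
    grad, = torch.autograd.grad(combined, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()
```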