Adversarial Patch

17 May 2018 | Tom B. Brown, Dandelion Mané, Aurko Roy, Martin Abadi, Justin Gilmer
This paper presents a method for creating universal, robust, targeted adversarial image patches that can be used to attack any scene. The patches are universal because they work on any scene, robust because they keep working under a wide range of transformations, and targeted because they can cause a classifier to output any chosen target class. A patch can be printed, added to any scene, photographed, and presented to an image classifier; even when small, it causes the classifier to ignore the other items in the scene and report the chosen target class.

The attack does not attempt to subtly transform an existing item into another. Instead, it generates an image-independent patch that is extremely salient to a neural network. The patch can be placed anywhere within the classifier's field of view and causes the classifier to output the targeted class. Because the patch is scene-independent, an attacker can mount a physical-world attack without prior knowledge of the lighting conditions, camera angle, type of classifier being attacked, or even the other items within the scene. The attacker also does not need to know what image they are attacking when constructing the attack: once generated, an adversarial patch could be widely distributed across the Internet for other attackers to print out and use. Additionally, because the attack uses a large perturbation, existing defense techniques that focus on small perturbations may not be robust against it.

The approach trains a patch to maximize the expected probability of a target class, where the expectation is taken over random images, patch locations, and transformations. The patch is trained to work regardless of what is in the background, which makes it universal. The paper also considers camouflaged patches, which are constrained to look like a given starting image.

Experimental results show that the attack is effective in the real world: the patch fools classifiers even when other objects are present in the scene. The paper also discusses how well the attack transfers to the physical world and how effective the patch is in different scenarios.

The paper concludes that the attack exploits the way image classification tasks are constructed. While an image may contain several items, only one target label is considered true, so the network must learn to detect the most "salient" item in the frame. The adversarial patch exploits this by producing inputs far more salient than objects in the real world. Thus, when attacking object detection or image segmentation models, the authors expect a targeted toaster patch to be classified as a toaster, and not to affect other portions of the image.
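The training objective described above can be written as p̂ = argmax_p E_{x∼X, t∼T, l∼L}[log Pr(ŷ | A(p, x, l, t))], where the operator A places the transformed patch p at location l in image x. The following is a minimal PyTorch sketch of one way to optimize such an objective; it is not the authors' released code, and the choice of classifier, patch size, hyperparameters, and the simplified apply_patch helper are assumptions made for illustration.

```python
# Minimal sketch: optimize a patch to maximize E[log Pr(target | A(p, x, l, t))]
# over random images, locations, and transformations. Hyperparameters and the
# apply_patch() helper are illustrative assumptions, not the authors' code.
import random

import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms.functional as TF

device = "cuda" if torch.cuda.is_available() else "cpu"

# Frozen ImageNet classifier to attack (white-box setting).
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).to(device).eval()
for param in model.parameters():
    param.requires_grad_(False)

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]
TARGET_CLASS = 859      # ImageNet "toaster", the target class used in the paper
PATCH_SIZE = 64         # side length of the square patch in pixels (assumption)

patch = torch.rand(3, PATCH_SIZE, PATCH_SIZE, device=device, requires_grad=True)
optimizer = torch.optim.Adam([patch], lr=0.05)


def apply_patch(images, patch):
    """Paste a randomly rotated, scaled copy of the patch at a random location
    in each image -- a crude stand-in for the paper's patch operator A."""
    patched = images.clone()
    _, _, height, width = images.shape
    for i in range(images.size(0)):
        p = TF.rotate(patch, random.uniform(-45.0, 45.0))
        side = int(PATCH_SIZE * random.uniform(0.8, 1.2))
        p = F.interpolate(p.unsqueeze(0), size=(side, side),
                          mode="bilinear", align_corners=False).squeeze(0)
        y0 = random.randint(0, height - side)
        x0 = random.randint(0, width - side)
        patched[i, :, y0:y0 + side, x0:x0 + side] = p
    return patched


def train_step(images):
    """One gradient step on a minibatch of random images in [0, 1]."""
    optimizer.zero_grad()
    patched = apply_patch(images.to(device), patch)
    logits = model(TF.normalize(patched, IMAGENET_MEAN, IMAGENET_STD))
    targets = torch.full((images.size(0),), TARGET_CLASS,
                         dtype=torch.long, device=device)
    # Minimizing cross-entropy to the target class maximizes log Pr(target).
    loss = F.cross_entropy(logits, targets)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        patch.clamp_(0.0, 1.0)  # keep the patch a valid image
    return loss.item()
```

For the camouflaged variant mentioned above, one would additionally project the patch back onto a small L∞ ball around the chosen starting image after each update, so that the optimized patch remains visually close to it.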