2018 | Andrew Ilyas, Logan Engstrom, Anish Athalye, Jessy Lin
This paper introduces three new black-box threat models for adversarial attacks on neural network classifiers: the query-limited, partial-information, and label-only settings. These models reflect real-world constraints on access to classifier information and on query budgets. The authors propose new algorithms for generating adversarial examples under these more restrictive threat models, settings in which previous approaches, such as substitute-network transfer attacks or coordinate-wise finite-difference gradient estimation, are impractical or ineffective. The methods are demonstrated against an ImageNet classifier and a commercial classifier, the Google Cloud Vision API, showing their effectiveness in real-world scenarios.
The query-limited setting restricts the number of queries an attacker can make to the classifier. The authors use Natural Evolutionary Strategies (NES) to estimate the gradient from a small number of queries and then run projected gradient descent (PGD) on the estimate, enabling query-efficient generation of adversarial examples (a sketch follows this paragraph). The partial-information setting gives the attacker access only to the top-k class probabilities; here the attack starts from an instance of the target class and alternates between projecting onto progressively smaller $\ell_{\infty}$ boxes around the original image and perturbing the image to keep the target class probability high. The label-only setting provides only the top-k labels in order of predicted confidence, and the authors substitute a proxy score, based on how random perturbations change the target label's rank, for the missing probability.
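A minimal sketch of the query-limited attack, assuming a NumPy image with pixel values in $[0, 1]$ and a hypothetical `query_prob(x, target_class)` wrapper around the victim classifier; the constants (`n_samples`, `sigma`, `eps`, `lr`, `steps`) are illustrative defaults, not the paper's tuned hyperparameters.

```python
import numpy as np

def query_prob(x, target_class):
    """Hypothetical black-box interface: returns the classifier's probability
    for target_class on image x. In the query-limited setting this is the
    only access the attacker has to the model."""
    raise NotImplementedError("stand-in for the victim model's API")

def nes_gradient(x, target_class, n_samples=50, sigma=0.001):
    """NES estimate of the gradient of the target-class probability,
    using antithetic Gaussian perturbations (2 * n_samples queries)."""
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = np.random.randn(*x.shape)
        grad += u * query_prob(x + sigma * u, target_class)
        grad -= u * query_prob(x - sigma * u, target_class)
    return grad / (2 * n_samples * sigma)

def query_limited_attack(x_orig, target_class, eps=0.05, lr=0.01, steps=300):
    """Targeted PGD on the NES gradient estimate, constrained to an
    l_inf ball of radius eps around the original image."""
    x = x_orig.copy()
    for _ in range(steps):
        g = nes_gradient(x, target_class)
        x = x + lr * np.sign(g)                      # ascend target-class probability
        x = np.clip(x, x_orig - eps, x_orig + eps)   # project onto the l_inf ball
        x = np.clip(x, 0.0, 1.0)                     # keep a valid image
    return x
```

The appeal of NES here is that every query contributes to an estimate of the full input gradient, rather than of a single coordinate, which is what makes the attack feasible under a limited query budget.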
The authors evaluate their methods on ImageNet and demonstrate their effectiveness in generating targeted adversarial examples under the three threat models. They also show that their methods can successfully attack the Google Cloud Vision API, a commercial classifier that exposes only limited information. The results show that even with limited queries and information, machine learning systems remain vulnerable to adversarial attacks.