Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies

2024 | Brian R. Bartoldson, James Diffenderfer, Konstantinos Parasyris, Bhavya Kailkhura
This paper investigates the limits of adversarial robustness in image classifiers, focusing on CIFAR-10. Although state-of-the-art (SOTA) models reach 100% clean accuracy, their robustness to $\ell_{\infty}$-norm bounded perturbations remains around 70%. The authors develop scaling laws that describe how model size, dataset size, and synthetic data quality affect robustness. These laws expose inefficiencies in prior methods and yield actionable insights: SOTA methods spend excess compute for their level of robustness, and a compute-efficient configuration derived from the laws surpasses the prior SOTA, reaching 74% AutoAttack accuracy with 20% fewer training FLOPs.

However, the same laws predict that robustness grows slowly and plateaus near 90%, a level that would require impractical amounts of compute. A small-scale human evaluation on AutoAttack data shows that human performance also plateaus near 90%, which the authors attribute to invalid adversarial images that are no longer consistent with their original labels. They argue that current attack formulations generate invalid data that humans also misclassify, so closing the gap requires fixing the attack formulation to account for image validity. The paper also presents three approaches to deriving the scaling laws, showing that model size and dataset size should scale at similar rates when synthetic data quality is high. Because the scaling-law ceiling aligns with the human performance limit, the authors suggest that future research focus on attack formulations that account for image validity.
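To make the plateau prediction concrete, the sketch below fits a generic saturating power law of robust accuracy against training compute and reads off the fitted ceiling. This is an illustrative assumption only: the paper's exact scaling-law form, variables, and fitted constants are not reproduced in this summary, and all numbers (compute range, noise level, target accuracy) are made up for the example.

```python
import numpy as np
from scipy.optimize import curve_fit

# Generic saturating power law in training compute C (illustrative assumption;
# the paper's exact functional form is not given in this summary):
#   acc(C) = a_inf - b * C**(-alpha)
# where a_inf is the robustness ceiling the fit extrapolates to.
def robust_acc(C, a_inf, b, alpha):
    return a_inf - b * C ** (-alpha)

# Synthetic (compute, AutoAttack accuracy) pairs standing in for measured runs.
rng = np.random.default_rng(0)
C = np.logspace(18, 24, 30)                                # hypothetical training FLOPs
acc = robust_acc(C, 0.90, 20.0, 0.10) + rng.normal(0, 0.004, C.size)

# Fit the ceiling and exponent; a_inf plays the role of the predicted plateau.
popt, _ = curve_fit(robust_acc, C, acc, p0=[0.9, 10.0, 0.1], maxfev=20000)
a_inf, b, alpha = popt
print(f"fitted plateau a_inf ~= {a_inf:.3f}")

# Invert the law to estimate how much compute a target accuracy would demand.
target = 0.85
print(f"compute for {target:.0%} robust accuracy ~= "
      f"{(b / (a_inf - target)) ** (1 / alpha):.2e} FLOPs")
```

On this synthetic data the fit recovers the assumed 90% ceiling, and the inversion shows the compute cost growing by orders of magnitude for the last few points of robustness, which mirrors the paper's qualitative conclusion that pushing toward the plateau is impractical.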