Black-Box Access is Insufficient for Rigorous AI Audits

June 3–6, 2024 | Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell
The paper "Black-Box Access is Insufficient for Rigorous AI Audits" by Stephen Casper et al. examines the limitations of black-box audits and the advantages of white-box and outside-the-box audits in evaluating AI systems. Black-box audits, which allow auditors only to query the system and observe its outputs, are found to be insufficient for identifying and addressing many issues, such as anomalous failures and dataset biases. In contrast, white-box audits, which provide unrestricted access to the system's internal workings, enable more powerful attacks, better model interpretability, and fine-tuning to reveal hidden risks. Outside-the-box audits, which include access to training and deployment information, allow auditors to scrutinize the development process and design more targeted evaluations.

The authors conclude that transparency about the access and methods used by auditors is crucial for interpreting audit results, and that white-box and outside-the-box access offer significantly more scrutiny than black-box access alone. They also discuss technical, physical, and legal safeguards to minimize security risks when conducting these audits.
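The distinction between access levels can be illustrated with a minimal sketch. This toy example is not from the paper: the `ToyModel` class, its weights, and both audit functions are hypothetical, and real audits involve far richer techniques (adversarial attacks, interpretability tools, fine-tuning). It shows only the structural point that a black-box auditor sees input-output behavior, while a white-box auditor can inspect internals directly.

```python
# Toy illustration (hypothetical, not from the paper): a black-box
# auditor can only query the model, while a white-box auditor can
# inspect its parameters directly.

class ToyModel:
    """A simple linear scorer with an anomalous internal weight."""
    def __init__(self):
        # Feature 1 is heavily penalized -- a hidden quirk an auditor
        # might want to surface.
        self._weights = [1.0, -5.0, 1.0]

    def predict(self, x):
        return sum(w * xi for w, xi in zip(self._weights, x))

def black_box_audit(model, queries):
    # Black-box access: inputs in, outputs out. The auditor must guess
    # which queries will reveal the quirk.
    return [model.predict(q) for q in queries]

def white_box_audit(model):
    # White-box access: read the parameters and flag anomalous weights
    # without needing to find the right query.
    return [i for i, w in enumerate(model._weights) if abs(w) > 2.0]

model = ToyModel()
print(black_box_audit(model, [[1, 0, 0], [0, 1, 0]]))  # [1.0, -5.0]
print(white_box_audit(model))                          # [1]
```

The black-box auditor only learns about the anomalous weight if it happens to probe feature 1; the white-box auditor finds it by inspection. This mirrors the paper's argument that internal access enables scrutiny that querying alone cannot guarantee.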