Black-Box Access is Insufficient for Rigorous AI Audits


June 3–6, 2024 | Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, Benjamin Bucknall, Andreas Haupt, Kevin Wei, Jérémy Scheurer, Marius Hobbhahn, Lee Sharkey, Satyapriya Krishna, Marvin Von Hagen, Silas Alberti, Alan Chan, Qinyi Sun, Michael Gerovitch, David Bau, Max Tegmark, David Krueger, Dylan Hadfield-Menell
This paper argues that black-box access, which lets auditors only query a system and observe its outputs, limits their ability to identify risks, interpret model behavior, and assess fairness and safety. White-box access, which lets auditors examine a system's internal workings (e.g., weights, activations, and gradients), and outside-the-box access, which provides additional contextual information (e.g., methodology, code, documentation, data, and deployment details), enable more comprehensive evaluations.

White-box access supports stronger attacks, more thorough interpretation of models, and fine-tuning that can reveal risks from latent knowledge or post-deployment modifications. Outside-the-box access lets auditors scrutinize the development process and design more targeted evaluations. The paper also discusses technical, physical, and legal safeguards that can minimize security risks when audits are conducted with these higher levels of access.

The paper concludes that transparency about auditors' access and evaluation methods is necessary for interpreting audit results properly, and that white- and outside-the-box access allow substantially more scrutiny than black-box access alone. It emphasizes that such access, while necessary, is not sufficient for rigorous audits: careful institutional design is also needed to ensure that audits serve the public interest.
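To make the distinction between access levels concrete, the following is a minimal sketch (not taken from the paper) contrasting black-box and white-box access, using a small PyTorch network as a hypothetical audit target. The model architecture, layer sizes, and variable names are illustrative assumptions, not anything specified by the authors.

```python
# A minimal sketch (not from the paper) contrasting the access levels described above,
# using a small PyTorch network as a stand-in for the audited system.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
x = torch.randn(1, 8, requires_grad=True)

# Black-box access: the auditor can only submit queries and observe outputs.
with torch.no_grad():
    black_box_output = model(x)

# White-box access: the auditor can also inspect weights, activations, and gradients.
weights = {name: p.detach().clone() for name, p in model.named_parameters()}

activations = {}
model[0].register_forward_hook(
    lambda module, inputs, output: activations.update(first_linear=output.detach())
)

loss = model(x).sum()    # forward pass records the hooked activation
loss.backward()          # backward pass exposes gradients
input_gradient = x.grad  # e.g., usable for gradient-based attacks or attributions

# Outside-the-box access is not a property of the model object at all: it covers
# artifacts such as training code, documentation, data, and deployment details
# supplied by the developer alongside the system.
```

In the paper's framing, each level of access unlocks evaluation techniques the previous level cannot support, such as the gradient-based analyses, interpretability methods, and fine-tuning experiments summarized above.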