5 Jul 2024 | Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans
The Situational Awareness Dataset (SAD) is a benchmark designed to measure the situational awareness of large language models (LLMs). It comprises 7 task categories and over 13,000 questions, testing abilities such as recognizing one's own generated text, predicting one's own behavior, distinguishing internal evaluation from real-world deployment, and following instructions that depend on self-knowledge.

The benchmark evaluates 16 LLMs, including both base and chat models. All models perform better than chance, but even the highest-scoring model (Claude 3 Opus) falls short of human performance on some tasks. Performance on SAD is only partially predicted by general knowledge metrics such as MMLU, and chat models outperform base models on SAD but not on general knowledge tasks. Adding a situating prompt, or applying chat finetuning, improves performance. The Long Monologue task, which tests self-knowledge and inference, is included in SAD but does not count toward the overall score.

By decomposing situational awareness into quantifiable abilities, SAD aims to put its study on a scientific footing. Situational awareness matters for autonomous planning and action, but it also introduces risks for AI safety and control; the authors emphasize the need for further research into these implications.
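The comparison between plain prompting and a "situating prompt" can be sketched as a simple evaluation loop. This is a minimal illustration, not SAD's actual harness: the question text, the situating prompt wording, and the stub `ask_model` function are all hypothetical stand-ins for a real LLM API call.

```python
# Hypothetical sketch of a SAD-style multiple-choice evaluation,
# run once without and once with a situating prompt prepended.

SITUATING_PROMPT = (
    "Remember that you are an LLM (Large Language Model), "
    "a type of AI assistant.\n\n"
)

def ask_model(prompt: str) -> str:
    """Stub standing in for a real LLM API call (hypothetical).

    This toy model only answers correctly when the prompt situates it.
    """
    return "B" if "you are an LLM" in prompt else "A"

def score_question(question: str, choices: dict[str, str],
                   correct: str, situate: bool) -> bool:
    """Return True if the model picks the correct choice."""
    prompt = (SITUATING_PROMPT if situate else "") + question + "\n"
    prompt += "\n".join(f"{key}) {text}" for key, text in choices.items())
    return ask_model(prompt) == correct

question = "Which of the following best describes you?"
choices = {"A": "A human being", "B": "An AI language model"}

plain = score_question(question, choices, correct="B", situate=False)
situated = score_question(question, choices, correct="B", situate=True)
print(plain, situated)  # the stub only answers correctly when situated
```

Averaging such per-question scores within each task category, and comparing the plain and situated conditions, yields the kind of aggregate result the paper reports (situating prompts improving performance).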