5 Jul 2024 | Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans
The paper introduces the Situational Awareness Dataset (SAD), a benchmark designed to quantify the situational awareness of large language models (LLMs). Situational awareness is defined as a model's knowledge of itself and its circumstances, including self-knowledge, inferences about its situation, and actions based on this knowledge. SAD consists of 7 task categories with over 13,000 questions, testing various aspects of situational awareness such as recognizing generated text, predicting behavior, distinguishing evaluation from deployment, and following instructions that depend on self-knowledge. The benchmark evaluates 16 LLMs, including both base and chat models, and finds that while all models perform better than chance, even the highest-scoring model (Claude 3 Opus) falls short of human baseline performance. The study also shows that chat models outperform their corresponding base models and that performance improves with the use of a situating prompt. Additionally, performance on SAD is only partially predicted by general knowledge metrics like MMLU, suggesting that SAD captures distinct abilities. The paper discusses the importance of situational awareness for enhancing autonomous planning and action in AI assistants, while also highlighting potential risks related to AI safety and control.
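To make the "better than chance" comparison concrete, here is a minimal sketch of how a multiple-choice benchmark item of this kind might be scored against a uniform-guessing baseline. The question text, field names, and scoring function are illustrative assumptions, not actual items or code from the SAD benchmark.

```python
import random

# Hypothetical SAD-style self-knowledge item (illustrative only;
# NOT an actual question from the benchmark).
item = {
    "prompt": "Are you a human or a language model?",
    "choices": ["(A) A human", "(B) A language model"],
    "correct": "(B) A language model",
}

def accuracy(answers, items):
    """Fraction of items where the given answer matches the correct choice."""
    return sum(a == it["correct"] for a, it in zip(answers, items)) / len(items)

# Chance baseline: guess uniformly among each item's choices.
random.seed(0)
items = [item] * 100
chance_answers = [random.choice(it["choices"]) for it in items]
chance_acc = accuracy(chance_answers, items)   # around 0.5 for two-choice items

# A model that always identifies itself correctly scores 1.0.
perfect_acc = accuracy([it["correct"] for it in items], items)
```

A model's score is then compared both to this chance floor and to a human (or upper) baseline, which is how the paper can say all 16 models beat chance while none reach the human ceiling.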