Evaluating the World Model Implicit in a Generative Model

22 Jun 2024 | Keyon Vafa, Justin Y. Chen, Jon Kleinberg, Sendhil Mullainathan, Ashesh Rambachan
Large language models (LLMs) may implicitly learn world models, but assessing whether they do remains challenging. This paper evaluates whether generative models recover the true underlying structure of their domains, such as navigation, games, and logic puzzles. The authors propose new metrics inspired by the Myhill-Nerode theorem from formal language theory to assess world model recovery. These metrics test whether a model correctly compresses sequences that lead to the same state and distinguishes sequences that lead to different states.

The paper demonstrates that while LLMs perform well on existing diagnostics for world model recovery, they often fail to capture the true underlying structure. For example, models trained on a dataset of taxi rides predict valid turn-by-turn directions but fail to recover the true street map of New York City. This incoherence leads to fragility: the models break down when faced with slightly different tasks.

The authors test their metrics in three domains: navigation, Othello, and logic puzzles. In navigation, LLMs trained on taxi-ride data fail to recover the true street map and perform poorly once detours are introduced. In Othello, models trained on real games score poorly on the proposed metrics, while models trained on synthetic games score better. In logic puzzles, LLMs can solve tasks when they are fully specified but fail the compression and distinction metrics, indicating that they lack a coherent world model.

The paper argues that current diagnostics are insufficient and that the proposed metrics provide a more accurate assessment of world model recovery.
These metrics are model-agnostic and can be applied across domains, and the authors release their benchmark dataset and evaluation tools. The results underscore the importance of building generative models that meaningfully capture the underlying logic of their domains, and the proposed metrics offer a new way to assess how close a given model is to that goal.
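To make the two metrics concrete, the sketch below scores a (deliberately oracle-like) model on a toy four-intersection map. Everything here is an illustrative assumption rather than the authors' released tooling: the DFA, the `model_valid_next` stand-in, and the one-step lookahead used for the distinction test, which the paper generalizes to longer distinguishing suffixes.

```python
# A minimal sketch of the compression and distinction metrics on a toy
# navigation world model. The DFA, the prefix enumeration, and the
# model_valid_next stand-in are all illustrative assumptions, not the
# authors' released implementation.

from itertools import combinations

# Toy deterministic world model: four intersections on a 2x2 grid.
# DELTA maps (state, move) -> next state; missing pairs are invalid moves.
TOKENS = ["N", "S", "E", "W"]
DELTA = {
    (0, "E"): 1, (0, "N"): 2,
    (1, "W"): 0, (1, "N"): 3,
    (2, "S"): 0, (2, "E"): 3,
    (3, "S"): 1, (3, "W"): 2,
}

def true_state(prefix, start=0):
    """Run a move sequence through the true DFA; None if any move is invalid."""
    state = start
    for tok in prefix:
        state = DELTA.get((state, tok))
        if state is None:
            return None
    return state

def model_valid_next(prefix):
    """Stand-in for the generative model under evaluation: the set of next
    moves it deems valid after `prefix` (e.g., moves above a probability
    threshold). Here it reads the true DFA, so both metrics come out 1.0;
    a real evaluation would query the trained model instead."""
    state = true_state(prefix)
    if state is None:
        return frozenset()
    return frozenset(t for t in TOKENS if (state, t) in DELTA)

def compression_and_distinction(prefixes):
    """Compression: prefix pairs that reach the same true state should get
    identical valid-next sets from the model. Distinction: pairs reaching
    different states should get different ones (a one-step version of the
    distinguishing-suffix test; the paper also considers longer suffixes)."""
    same_ok = same_total = diff_ok = diff_total = 0
    for p, q in combinations(prefixes, 2):
        sp, sq = true_state(p), true_state(q)
        if sp is None or sq is None:
            continue  # only compare sequences that are valid in the true map
        agree = model_valid_next(p) == model_valid_next(q)
        if sp == sq:
            same_total += 1
            same_ok += agree
        else:
            diff_total += 1
            diff_ok += not agree
    return same_ok / max(same_total, 1), diff_ok / max(diff_total, 1)

# Enumerate all valid move sequences up to length 3 and score the "model".
frontier, prefixes = [()], [()]
for _ in range(3):
    frontier = [p + (t,) for p in frontier for t in TOKENS
                if true_state(p + (t,)) is not None]
    prefixes += frontier

compression, distinction = compression_and_distinction(prefixes)
print(f"compression: {compression:.2f}, distinction: {distinction:.2f}")
```

Because the stand-in reads the true map, both scores come out 1.0; replacing it with a trained sequence model's thresholded next-token predictions would approximate the evaluation setting the paper describes.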