Evaluating the World Model Implicit in a Generative Model

22 Jun 2024 | Keyon Vafa, Justin Y. Chen, Jon Kleinberg, Sendhil Mullainathan, Ashesh Rambachan
Large language models (LLMs) may implicitly learn world models, but assessing whether they do remains challenging. This paper evaluates whether generative models recover the true underlying structure of their domains, such as navigation, games, and logic puzzles. The authors propose new metrics inspired by the Myhill-Nerode theorem from formal language theory to assess world model recovery. These metrics test whether a model correctly compresses sequences that lead to the same state and distinguishes sequences that lead to different states.

The paper demonstrates that while LLMs perform well on existing diagnostics for world model recovery, they often fail to capture the true underlying structure. For example, models trained on a dataset of taxi rides predict valid turn-by-turn directions but fail to recover the true street map of New York City. This incoherence leads to fragility: the models break down when faced with slightly different tasks.

The authors test their metrics in three domains: navigation, Othello, and logic puzzles. In navigation, LLMs trained on taxi-ride data fail to recover the true street map and perform poorly once detours are introduced. In Othello, models trained on real games score poorly on the proposed metrics, while models trained on synthetic games score better. In logic puzzles, LLMs can solve tasks when they are fully specified but fail the compression and distinction metrics, indicating that they lack a coherent world model.

The paper argues that current diagnostics are insufficient and that the proposed metrics provide a more accurate assessment of world model recovery.
These metrics are model-agnostic and can be applied across domains, and the authors release their benchmark dataset and evaluation tools. The results underscore the importance of building generative models that meaningfully capture the underlying logic of their domains, and the proposed metrics offer a new way to assess how close a given model is to that goal.
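To make the two metrics concrete, the sketch below scores a (deliberately oracle-like) model on a toy four-intersection map. Everything here is an illustrative assumption rather than the authors' released tooling: the DFA, the `model_valid_next` stand-in, and the one-step lookahead used for the distinction test, which the paper generalizes to longer distinguishing suffixes.

```python
# A minimal sketch of the compression and distinction metrics on a toy
# navigation world model. The DFA, the prefix enumeration, and the
# model_valid_next stand-in are all illustrative assumptions, not the
# authors' released implementation.

from itertools import combinations

# Toy deterministic world model: four intersections on a 2x2 grid.
# DELTA maps (state, move) -> next state; missing pairs are invalid moves.
TOKENS = ["N", "S", "E", "W"]
DELTA = {
    (0, "E"): 1, (0, "N"): 2,
    (1, "W"): 0, (1, "N"): 3,
    (2, "S"): 0, (2, "E"): 3,
    (3, "S"): 1, (3, "W"): 2,
}

def true_state(prefix, start=0):
    """Run a move sequence through the true DFA; None if any move is invalid."""
    state = start
    for tok in prefix:
        state = DELTA.get((state, tok))
        if state is None:
            return None
    return state

def model_valid_next(prefix):
    """Stand-in for the generative model under evaluation: the set of next
    moves it deems valid after `prefix` (e.g., moves above a probability
    threshold). Here it reads the true DFA, so both metrics come out 1.0;
    a real evaluation would query the trained model instead."""
    state = true_state(prefix)
    if state is None:
        return frozenset()
    return frozenset(t for t in TOKENS if (state, t) in DELTA)

def compression_and_distinction(prefixes):
    """Compression: prefix pairs that reach the same true state should get
    identical valid-next sets from the model. Distinction: pairs reaching
    different states should get different ones (a one-step version of the
    distinguishing-suffix test; the paper also considers longer suffixes)."""
    same_ok = same_total = diff_ok = diff_total = 0
    for p, q in combinations(prefixes, 2):
        sp, sq = true_state(p), true_state(q)
        if sp is None or sq is None:
            continue  # only compare sequences that are valid in the true map
        agree = model_valid_next(p) == model_valid_next(q)
        if sp == sq:
            same_total += 1
            same_ok += agree
        else:
            diff_total += 1
            diff_ok += not agree
    return same_ok / max(same_total, 1), diff_ok / max(diff_total, 1)

# Enumerate all valid move sequences up to length 3 and score the "model".
frontier, prefixes = [()], [()]
for _ in range(3):
    frontier = [p + (t,) for p in frontier for t in TOKENS
                if true_state(p + (t,)) is not None]
    prefixes += frontier

compression, distinction = compression_and_distinction(prefixes)
print(f"compression: {compression:.2f}, distinction: {distinction:.2f}")
```

Because the stand-in reads the true map, both scores come out 1.0; replacing it with a trained sequence model's thresholded next-token predictions would approximate the evaluation setting the paper describes.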