This paper investigates whether language models can develop internal representations of complex systems such as chess, and whether they can estimate latent variables such as player skill. The authors train a language model on transcripts of real human games rather than synthetic ones, and show that it learns to play legal and strategic chess moves. Using linear probes on the model's activations, they recover its internal representation of the board state and its estimate of each player's skill. They validate these findings with activation interventions, showing that the model can be edited to change both its internal board state and its assumed skill level; intervening on the skill representation improves the model's win rate by up to 2.6 times. The results suggest that language models can develop world models of complex systems and learn to estimate latent variables in order to better predict the next character. The study also highlights potential applications of this kind of analysis, such as detecting and reducing hallucinations in natural language processing.
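
To make the probing setup concrete, below is a minimal sketch of how a linear probe over a frozen model's hidden activations might be trained to read off the contents of a single board square. The layer choice, hidden size, class count, and training details are illustrative assumptions, not values taken from the paper.

```python
# Minimal linear-probe sketch (assumed setup, not the authors' exact code).
# A linear classifier maps cached hidden activations from one layer of the
# frozen chess language model to the piece occupying one board square
# (13 classes: empty + 6 piece types x 2 colors). Values are illustrative.
import torch
import torch.nn as nn

D_MODEL = 512    # hidden size of the chess language model (assumption)
N_CLASSES = 13   # empty + {P, N, B, R, Q, K} x {white, black}

probe = nn.Linear(D_MODEL, N_CLASSES)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_probe(activations: torch.Tensor, square_labels: torch.Tensor,
                epochs: int = 5) -> nn.Linear:
    """Fit the probe on cached activations.

    activations: (n_positions, D_MODEL) hidden states from the frozen model.
    square_labels: (n_positions,) piece class for the chosen square.
    """
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = probe(activations)          # (n_positions, N_CLASSES)
        loss = loss_fn(logits, square_labels)
        loss.backward()
        optimizer.step()
    return probe
```

In the same spirit, the skill intervention described above can be thought of as adding a learned "high-skill" direction to the residual stream at inference time; the probe weights give a natural candidate for that direction, though the exact intervention procedure is the paper's, not the sketch's.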