Grandmaster-Level Chess Without Search

2024 | Anian Ruoss, Grégoire Delétang, Sourabh Medapati, Jordi Grau-Moya, Li Kevin Wenliang, Elliot Catt, John Reid and Tim Genewein
This paper presents a grandmaster-level chess policy trained purely with supervised learning, without explicit search or domain-specific heuristics. The model is trained on a dataset of 10 million chess games annotated with action-values from the Stockfish 16 engine, yielding approximately 15 billion data points. The resulting 270-million-parameter transformer achieves a Lichess blitz Elo of 2895 against human opponents and solves challenging chess puzzles without any explicit search algorithm. It outperforms AlphaZero's policy and value networks (used without Monte Carlo tree search) as well as GPT-3.5-turbo-instruct.

The results show that strong chess performance emerges only at sufficient scale: larger models and larger datasets consistently improve play. The network is trained to predict action-values (and, in ablations, state-values) for chess boards; at test time the policy simply selects the legal move with the highest predicted value. The paper also discusses limitations of the approach, including the model's inability to detect threefold repetition, since it sees only the current board and no game history, and its susceptibility to occasional tactical mistakes.

The study demonstrates that a complex algorithm like Stockfish can be approximated by a feed-forward neural network through supervised learning, suggesting a shift in how large transformers are viewed: not merely as statistical pattern recognizers, but as powerful general-purpose algorithm approximators.
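To make the inference procedure concrete, here is a minimal sketch of the search-free policy: the model assigns a win probability to every legal move from the current position, and the engine simply plays the highest-scoring one. The `predict_win_prob` stub below is a hypothetical placeholder for the trained transformer (its name and signature are assumptions, not the authors' API); board handling uses the python-chess library.

```python
# Minimal sketch of the search-free policy: score every legal move with a
# learned action-value predictor and play the argmax. No tree of future
# positions is ever expanded.

import random

import chess  # third-party: pip install python-chess


def predict_win_prob(fen: str, move_uci: str) -> float:
    """Hypothetical stand-in for the trained 270M-parameter transformer.

    In the paper, the model maps a (board state, move) pair to a
    discretized win-probability estimate distilled from Stockfish 16.
    Here we return a random value purely so the sketch executes.
    """
    return random.random()


def select_move(board: chess.Board) -> chess.Move:
    # One forward pass per legal move, then argmax; no search.
    return max(
        board.legal_moves,
        key=lambda move: predict_win_prob(board.fen(), move.uci()),
    )


board = chess.Board()
print(select_move(board))  # e.g. "g1h3" with the random stub
```

Everything the policy knows about chess is implicit in the learned value estimates, which is also why history-dependent rules such as threefold repetition are invisible to it: the input is the current board alone.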