Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

26 Apr 2024 | Lucas Lehnert, Sainbayar Sukhbaatar, DiJia Su, Qinqing Zheng, Paul McVay, Michael Rabbat, Yuandong Tian
This paper presents a novel approach to training Transformers to solve complex planning tasks by predicting the search dynamics of the A* algorithm. The authors introduce Searchformer, a Transformer model that solves Sokoban puzzles optimally 93.7% of the time while using 26.8% fewer search steps than the A* implementation used to generate its training data. The training process expresses planning tasks and their optimal solutions as token sequences and logs A*'s search dynamics into an execution trace. These traces are used to train a Transformer model to generate token sequences that encode the task, the search dynamics, and the optimal plan. The model is then fine-tuned to generate shorter token sequences while still producing optimal plans (the search dynamics bootstrapping of the title), yielding the Searchformer model. Experiments show that Searchformer outperforms baselines that predict optimal plans directly, while using a significantly smaller model and training dataset. The paper also examines scaling to larger and more complex tasks, reporting improved performance and shorter generated search traces. The work highlights the potential of Transformers for complex planning tasks and offers insight into improving their reasoning capabilities.
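To make the training setup concrete, below is a minimal sketch (not the paper's code, and not its exact token vocabulary) of how an A* run on a small grid world can be logged as an execution trace of "create"/"close" tokens and paired with the optimal plan. The grid representation, token format, and function names here are illustrative assumptions; the resulting task-trace-plan sequence is the kind of data a Searchformer-style model would be trained to generate.

```python
import heapq

def a_star_with_trace(grid, start, goal):
    """Run A* on a 2D grid (0 = free cell, 1 = wall) and log the search
    dynamics as a flat token list, in the spirit of Searchformer's
    execution traces. Token names and format are illustrative only."""
    def h(pos):
        # Manhattan-distance heuristic to the goal (admissible on a unit-cost grid)
        return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

    trace = ["create", str(start), "c0", f"h{h(start)}"]   # search-dynamics tokens
    frontier = [(h(start), 0, start, [start])]             # (f, g, position, path)
    closed = set()

    while frontier:
        f, g, pos, path = heapq.heappop(frontier)
        if pos in closed:
            continue
        closed.add(pos)
        trace += ["close", str(pos), f"c{g}", f"h{h(pos)}"]
        if pos == goal:
            plan = ["plan"] + [str(p) for p in path]       # optimal-plan tokens
            return trace, plan
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (pos[0] + dx, pos[1] + dy)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in closed):
                trace += ["create", str(nxt), f"c{g + 1}", f"h{h(nxt)}"]
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
    return trace, []                                       # unsolvable task


# A training example concatenates the task description, the logged
# search trace, and the optimal plan into one token sequence.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
trace, plan = a_star_with_trace(grid, start=(0, 0), goal=(2, 0))
training_tokens = ["task", str((0, 0)), str((2, 0))] + trace + plan
print(len(trace), plan)
```

A plan-only baseline would train on just the task and plan tokens; the paper's recipe instead trains on the full sequence including the trace, and then bootstraps toward shorter traces by fine-tuning on the model's own shorter, still-optimal generations.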