Stream of Search (SoS): Learning to Search in Language

1 Apr 2024 | Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah D. Goodman
This paper introduces a novel approach to teaching language models (LMs) to search by representing the search process as a "Stream of Search" (SoS): a flattened string that captures the sequence of actions and states explored during search. The authors demonstrate that by training LMs on search trajectories generated by heuristic solvers, they can improve the models' ability to solve complex problems, including problems that the original heuristic solvers cannot solve. They evaluate the approach on the Countdown game, where the goal is to combine input numbers with arithmetic operations to reach a target number.

The SoS approach outperforms models trained only on optimal solution paths, and improves further when combined with policy improvement techniques such as Advantage-Induced Policy Alignment (APA) and the Self-Taught Reasoner (STaR). The SoS models solve 36% of previously unsolved problems, including problems that none of the heuristic solvers can solve. The results show that language models can learn to solve problems through search, self-improve to flexibly use different search strategies, and potentially discover new ones.

The paper also discusses the importance of exposing models to the messy process of problem solving, including exploration and backtracking, rather than only the ideal solution steps. The SoS framework enables LMs to learn an internal 'world model' for search, allowing them to simulate state transitions themselves and to search without relying on external structures. The authors conclude that the SoS approach has the potential to address criticisms of language models for planning and problem solving, as it allows models to explore alternative paths, overcome failures in lookahead tasks, and learn more adaptable and generalizable search strategies.
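To make the SoS idea concrete, here is a minimal sketch (not taken from the paper's code) of a depth-first Countdown solver that logs every visited state, attempted operation, and backtrack as plain text. The function name countdown_stream and the exact string format are illustrative assumptions; the paper's actual serialization format and solver heuristics may differ.

```python
from itertools import combinations

def countdown_stream(nums, target, log=None):
    """Depth-first search over Countdown states that records every visited
    state, attempted operation, and backtrack as text: a 'stream of search'."""
    if log is None:
        log = []
    log.append(f"Current state: {sorted(nums)}, target: {target}")
    if len(nums) == 1:
        if nums[0] == target:
            log.append(f"Goal reached: {nums[0]} == {target}")
            return True, log
        log.append("Dead end, backtracking")
        return False, log
    for i, j in combinations(range(len(nums)), 2):
        a, b = nums[i], nums[j]
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        # Countdown allows +, *, non-negative -, and exact division only.
        candidates = {f"{a}+{b}": a + b,
                      f"{a}*{b}": a * b,
                      f"{max(a, b)}-{min(a, b)}": max(a, b) - min(a, b)}
        if b != 0 and a % b == 0:
            candidates[f"{a}/{b}"] = a // b
        if a != 0 and b % a == 0:
            candidates[f"{b}/{a}"] = b // a
        for op, result in candidates.items():
            log.append(f"Try {op} = {result}")
            solved, log = countdown_stream(rest + [result], target, log)
            if solved:
                return True, log
    return False, log

# Example: reach 13 from [2, 3, 5]; the printed trace (including dead ends)
# is the kind of flattened string an SoS model would be trained on.
solved, trace = countdown_stream([2, 3, 5], 13)
print("\n".join(trace))
```

The key point of the sketch is that the training text is the full trace, dead ends and backtracking included, rather than only the final arithmetic chain that reaches the target.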