Stream of Search (SoS): Learning to Search in Language

1 Apr 2024 | Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah D. Goodman
The paper "Stream of Search (SoS): Learning to Search in Language" by Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, and Noah D. Goodman explores how language models can be trained to perform search, particularly in the context of the game "Countdown." The authors propose a unified language for search, representing the search process as a flattened string, or "stream of search" (SoS). This approach allows the model to learn from both optimal and suboptimal search strategies, including exploration, backtracking, and pruning.

Key findings include:

1. **Performance Improvement**: The SoS model trained on diverse search trajectories outperforms models trained solely on optimal paths by 25%.
2. **Policy Improvement**: The model can be further improved using techniques like Advantage-Induced Policy Alignment (APA) and Self-Taught Reasoner (STaR), solving 36% of previously unsolved problems.
3. **Self-Improvement**: The SoS model can self-improve by optimizing for correctness, leading to more efficient and flexible search strategies.
4. **Discovery of New Strategies**: The model can potentially discover new search strategies during training.

The paper highlights the importance of exposing models to the messy process of problem-solving, including mistakes and backtracking, to enhance their ability to handle complex tasks. The SoS framework provides a structured way to represent and learn from different search strategies, potentially addressing limitations in current language models for planning and problem-solving.
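To make the "flattened string" idea concrete, here is a minimal sketch of how a Countdown search trajectory could be serialized as a flat text stream, including dead ends and backtracking. This is an illustrative reconstruction, not the paper's actual data-generation code: the `countdown_sos` function, its log format, and the depth-first strategy are assumptions chosen for clarity.

```python
from itertools import combinations

# Arithmetic moves available in Countdown; division is only
# allowed when it yields an exact non-negative integer.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a // b if b != 0 and a % b == 0 else None,
}

def countdown_sos(numbers, target, log):
    """Depth-first search over Countdown states, appending every step
    (including dead ends and backtracking) to a flat text log."""
    if target in numbers:
        log.append(f"Goal: {target} reached with {numbers}")
        return True
    if len(numbers) < 2:
        log.append(f"Dead end at {numbers}, backtracking")
        return False
    for i, j in combinations(range(len(numbers)), 2):
        a, b = numbers[i], numbers[j]
        rest = [n for k, n in enumerate(numbers) if k not in (i, j)]
        for op, fn in OPS.items():
            for x, y in ((a, b), (b, a)):
                r = fn(x, y)
                if r is None or r < 0:
                    continue
                log.append(f"Try {x} {op} {y} = {r}, remaining {rest + [r]}")
                if countdown_sos(rest + [r], target, log):
                    return True
    log.append(f"Exhausted options at {numbers}, backtracking")
    return False

log = []
solved = countdown_sos([3, 5, 7], 22, log)
# Joining the log gives one "stream of search" training string that
# records the messy process, not just the final solution path.
stream = "\n".join(log)
```

A model trained on strings like `stream` sees failed branches and backtracking steps explicitly, which is the kind of suboptimal-trajectory supervision the paper argues is essential.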