Recursive Introspection: Teaching Language Model Agents How to Self-Improve


26 Jul 2024 | Yuxiao Qu, Tianjun Zhang, Naman Garg, Aviral Kumar
The paper "Recursive Introspection: Teaching Language Model Agents How to Self-Improve" by Yuxiao Qu, Tianjun Zhang, Naman Garg, and Aviral Kumar introduces RISE (Recursive IntroSpEction), an approach to fine-tune large language models (LLMs) to enable them to self-improve over multiple turns. The authors argue that even the strongest LLMs do not exhibit the ability to sequentially improve their responses, even when explicitly told they are making mistakes. RISE is designed to teach models how to alter their responses after unsuccessful attempts to solve a problem, using iterative fine-tuning and optionally additional environment feedback. The approach is formulated as solving a multi-turn Markov decision process (MDP), where the initial state is the prompt. Inspired by online imitation learning and reinforcement learning, RISE proposes strategies for multi-turn data collection and training to enable LLMs to recursively detect and correct their mistakes. Experiments show that RISE enables LLaMa2, LLaMa3, and Mistral models to improve their performance on math reasoning tasks, outperforming single-turn strategies with equal inference-time computation. RISE also scales well and generalizes to out-of-distribution prompts, demonstrating its effectiveness in enhancing mathematical reasoning capabilities.The paper "Recursive Introspection: Teaching Language Model Agents How to Self-Improve" by Yuxiao Qu, Tianjun Zhang, Naman Garg, and Aviral Kumar introduces RISE (Recursive IntroSpEction), an approach to fine-tune large language models (LLMs) to enable them to self-improve over multiple turns. The authors argue that even the strongest LLMs do not exhibit the ability to sequentially improve their responses, even when explicitly told they are making mistakes. RISE is designed to teach models how to alter their responses after unsuccessful attempts to solve a problem, using iterative fine-tuning and optionally additional environment feedback. The approach is formulated as solving a multi-turn Markov decision process (MDP), where the initial state is the prompt. Inspired by online imitation learning and reinforcement learning, RISE proposes strategies for multi-turn data collection and training to enable LLMs to recursively detect and correct their mistakes. Experiments show that RISE enables LLaMa2, LLaMa3, and Mistral models to improve their performance on math reasoning tasks, outperforming single-turn strategies with equal inference-time computation. RISE also scales well and generalizes to out-of-distribution prompts, demonstrating its effectiveness in enhancing mathematical reasoning capabilities.