Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

18 Apr 2024 | Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu
This paper introduces ALPHALLM, a self-improvement framework for large language models (LLMs) that integrates Monte Carlo Tree Search (MCTS) to enhance reasoning and planning without requiring additional annotations. The framework has three components: an imagination component that synthesizes prompts, an efficient MCTS procedure tailored to language tasks, and a trio of critic models that provide precise feedback. Together they address three obstacles to self-improvement in language tasks: data scarcity, vast search spaces, and subjective feedback. MCTS explores the space of candidate responses to find better ones, and the critics' assessments are used to refine the LLM's outputs. Experiments on mathematical reasoning tasks show that ALPHALLM significantly improves LLM performance, achieving results comparable to GPT-4. By letting an LLM improve itself through a loop of imagination, searching, and criticizing, the framework is a promising approach for enhancing LLM capabilities in complex problem-solving tasks.
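The summary does not spell out how the search works, so the following is only a rough illustration of generic MCTS with UCT selection on a toy problem, not ALPHALLM's actual procedure. Everything named here is an assumption made for the example: the `Node` class, the `toy_critic` function (a stand-in for learned critic models), the integer "steps" that play the role of reasoning steps, and all parameter values.

```python
import math
import random


class Node:
    """One partial solution in the search tree (hypothetical structure)."""

    def __init__(self, state, parent=None):
        self.state = state        # tuple of steps chosen so far
        self.parent = parent
        self.children = {}        # action -> Node
        self.visits = 0
        self.value_sum = 0.0

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0


def uct_score(parent, child, c=1.4):
    """Standard UCT: mean value plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    bonus = c * math.sqrt(math.log(parent.visits) / child.visits)
    return child.mean_value() + bonus


def toy_critic(state, target):
    """Stand-in for critic feedback: closer to the target sum scores higher."""
    return 1.0 - abs(target - sum(state)) / target


def search(target, actions=(1, 2, 3), depth=3, simulations=200, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(simulations):
        node = root
        # Selection and expansion: descend by UCT until an untried action exists.
        while len(node.state) < depth:
            untried = [a for a in actions if a not in node.children]
            if untried:
                action = rng.choice(untried)
                child = Node(node.state + (action,), parent=node)
                node.children[action] = child
                node = child
                break
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: uct_score(parent, ch))
        # Rollout: complete the partial solution with random steps.
        state = node.state
        while len(state) < depth:
            state += (rng.choice(actions),)
        reward = toy_critic(state, target)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    # Read out the most-visited path as the preferred response.
    path, node = [], root
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        path.append(action)
    return path, root


if __name__ == "__main__":
    best_path, tree = search(target=9)
    print(best_path, tree.visits)
```

In a self-improvement setting of the kind described above, the paths favored by the search would presumably serve as improved responses for further training. That step is outside this sketch.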