Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

18 Apr 2024 | Ye Tian; Baolin Peng*, Linfeng Song*, Lifeng Jin, Dian Yu, Haitao Mi*, Dong Yu
This paper introduces ALPHALLM, a framework designed to enhance the self-improvement capabilities of Large Language Models (LLMs) through imagination, searching, and criticizing. ALPHALLM integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop, aiming to improve LLMs' performance without additional annotations. The framework consists of three key components: an imagination component that synthesizes prompts, an efficient MCTS approach tailored for language tasks, and a trio of critic models for precise feedback. The authors address the challenges of data scarcity, the vast search spaces in language tasks, and the subjective nature of feedback in natural language processing. Experimental results on mathematical reasoning tasks demonstrate that ALPHALLM significantly enhances LLMs' performance, showing potential for self-improvement in complex reasoning and planning tasks. The framework outperforms existing models, including GPT-4, on datasets like GSM8K and MATH, highlighting its effectiveness and broad applicability.
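As a rough illustration of the search component, the select/expand/evaluate/backpropagate cycle that MCTS-style methods run over candidate continuations can be sketched as below. This is a generic MCTS skeleton on a toy state space, not the paper's implementation: the names (`Node`, `ucb_score`, `rollout_value`) are illustrative, and `rollout_value` stands in for the critic models that would score partial LLM responses.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state          # partial solution (here: a list of ints)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value_sum = 0.0

    def ucb_score(self, c=1.4):
        # Upper Confidence Bound: balance average value with exploration bonus.
        if self.visits == 0:
            return float("inf")
        exploit = self.value_sum / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore

def rollout_value(state, target=10):
    # Stand-in for a learned critic: reward states whose sum is near `target`.
    return 1.0 / (1.0 + abs(sum(state) - target))

def mcts(root_state, n_iters=200, actions=(1, 2, 3), depth=4, seed=0):
    rng = random.Random(seed)
    root = Node(list(root_state))
    for _ in range(n_iters):
        # 1. Selection: descend the tree, picking the highest-UCB child.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb_score)
        # 2. Expansion: add one child per action unless at max depth.
        if len(node.state) < depth:
            for a in actions:
                node.children.append(Node(node.state + [a], parent=node))
            node = rng.choice(node.children)
        # 3. Evaluation: score the (possibly partial) state with the critic.
        value = rollout_value(node.state)
        # 4. Backpropagation: update visit counts and value sums to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Commit to the most-visited first action, as is standard in MCTS.
    best = max(root.children, key=lambda n: n.visits)
    return best.state
```

In the paper's setting, the states would be partial LLM generations, the actions would be sampled continuations, and the evaluation step would come from the trio of critic models; the trajectories found by search then supply training signal for the self-improving loop.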