MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time

26 Jun 2024 | Jikun Kang*, Derek Li*, Xi Chen, Amirreza Kazemi, Qianyi Sun, Boxing Chen, Dong Li, Xu He, Quan He, Feng Wen, Jianye Hao, Jun Yao
MindStar (M*) is a novel framework designed to enhance the reasoning capabilities of pre-trained Large Language Models (LLMs) at inference time. Motivated by the observation that LLMs can often produce correct answers but struggle to select the optimal reasoning path, M* formulates reasoning tasks as search problems and proposes two search algorithms, beam search and Levin Tree Search (LevinTS), to identify the best reasoning paths. The framework uses a process-supervised reward model (PRM) to evaluate the correctness of intermediate steps, guiding the LLM toward the most faithful reasoning paths. Evaluations on the GSM8K and MATH datasets show that M* significantly improves the reasoning performance of open-source models such as LLaMA-2-13B, achieving results comparable to closed-source models such as GPT-3.5 and Grok-1 while reducing computational costs. The framework demonstrates the potential to enhance LLMs' reasoning abilities without extensive fine-tuning, making these models more accessible and efficient in real-world applications.
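To make the search formulation concrete, below is a minimal Python sketch of the beam-search variant: a tree of partial reasoning paths is expanded step by step, with each candidate step scored by a PRM. The callables `generate_next_steps` (sampling candidate next steps from the base LLM) and `prm_score` (scoring an intermediate step with the reward model), as well as the cumulative-score aggregation, are hypothetical stand-ins for illustration, not the authors' implementation.

```python
import heapq

def prm_guided_beam_search(question, generate_next_steps, prm_score,
                           beam_width=3, expand_k=6, max_depth=8):
    """Sketch of PRM-guided beam search over reasoning paths.

    `generate_next_steps(question, path, k)` is assumed to return up to `k`
    candidate next reasoning steps (strings) sampled from the LLM, and
    `prm_score(question, path)` is assumed to return a score for the
    latest step in `path`, as judged by a process-supervised reward model.
    """
    def is_final(path):
        # Assumed convention: a path is complete once it emits an answer step.
        return bool(path) and path[-1].startswith("Answer:")

    beam = [((), 0.0)]  # (partial reasoning path, cumulative PRM score)
    for _ in range(max_depth):
        candidates = []
        for path, score in beam:
            if is_final(path):
                candidates.append((path, score))  # keep finished paths as-is
                continue
            for step in generate_next_steps(question, path, k=expand_k):
                new_path = path + (step,)
                # The PRM evaluates the correctness of the intermediate step.
                candidates.append((new_path, score + prm_score(question, new_path)))
        # Retain only the top-`beam_width` paths by cumulative reward.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
        if all(is_final(p) for p, _ in beam):
            break
    return max(beam, key=lambda b: b[1])
```

The LevinTS variant described in the paper would instead order node expansions by a cost that also accounts for the likelihood of the path, giving it a bound on total expansions; the sketch above covers only the beam-search case.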