MindStar: Enhancing Math Reasoning in Pre-trained LLMs at Inference Time

2024-06-26 | Jikun Kang*, Derek Li*, Xi Chen, Amirreza Kazemi, Qianyi Sun, Boxing Chen, Dong Li, Xu He, Quan He, Feng Wen, Jianye Hao, Jun Yao
MindStar is a framework that enhances the reasoning capabilities of large language models (LLMs) at inference time. It tackles complex reasoning tasks, such as mathematical problem solving, by formulating reasoning as a search problem over a tree of intermediate steps. A process-supervised reward model (PRM) evaluates the likelihood that each reasoning step is correct, allowing the search to focus on the most promising paths. MindStar employs two search algorithms, beam search and Levin tree search, to navigate the reasoning tree efficiently and find high-quality solutions.

The framework is evaluated on the GSM8K and MATH datasets, where it delivers substantial gains. With LLaMA-2-13B as the base model, MindStar raises accuracy on the MATH dataset from 8% to 33%, achieving results comparable to GPT-3.5 while using substantially fewer computational resources. Performance also scales with both the size of the search tree and the size of the base model, and the method outperforms existing approaches in accuracy and efficiency.

Unlike traditional methods that rely on extensive training data or fine-tuning, MindStar leverages inference-time search guided by reward-model feedback, improving reasoning without any additional training. These results highlight the benefit of shifting computational resources from training to inference as a more efficient and effective way to strengthen LLM reasoning.
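The PRM-guided beam search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: generate_next_steps (sampling candidate next reasoning steps from the base LLM) and prm_score (the PRM's estimated correctness score for a step) are hypothetical stand-ins for the paper's components.

```python
import heapq
from typing import Callable, List, Tuple

def beam_search_reasoning(
    question: str,
    generate_next_steps: Callable[[str, List[str], int], List[str]],  # hypothetical LLM sampler
    prm_score: Callable[[str, List[str], str], float],  # hypothetical PRM step scorer
    is_complete: Callable[[List[str]], bool],  # detects a finished solution
    beam_width: int = 4,
    branch_factor: int = 8,
    max_depth: int = 16,
) -> List[str]:
    """Select a reasoning path by expanding the highest-scoring partial paths.

    Each node in the reasoning tree is a partial chain of steps; the PRM
    scores every candidate step, and only the top `beam_width` paths
    survive each level of the tree.
    """
    # Beam entries are (cumulative_score, path_of_steps).
    beam: List[Tuple[float, List[str]]] = [(0.0, [])]

    for _ in range(max_depth):
        candidates: List[Tuple[float, List[str]]] = []
        for score, path in beam:
            if is_complete(path):
                # Keep finished paths as-is so they can still win selection.
                candidates.append((score, path))
                continue
            # Sample several candidate next steps from the base LLM.
            for step in generate_next_steps(question, path, branch_factor):
                # The PRM estimates how likely this step is to be correct;
                # scores accumulate along the path.
                candidates.append((score + prm_score(question, path, step),
                                   path + [step]))
        # Retain only the most promising paths for the next level.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(is_complete(path) for _, path in beam):
            break

    return max(beam, key=lambda c: c[0])[1]
```

Levin tree search, the second algorithm the paper mentions, orders node expansions differently: rather than keeping a fixed-width beam, it expands nodes by a cost that combines the path's probability with its depth, which provides guarantees on the number of expansions needed to reach a solution.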