AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition
18 Feb 2024 | Zhaorun Chen, Zhuokai Zhao, Zhihong Zhu, Ruiqi Zhang, Xiang Li, Bhiksha Raj, Huaxiu Yao
AutoPRM is a self-supervised framework that automates procedural supervision for multi-step reasoning by decomposing complex problems into manageable subquestions and using reinforcement learning to iteratively improve the subquestion solver. The framework employs a unified model for question decomposition (QD) and question answering (QA), reducing the need for extensive human annotations. AutoPRM first breaks a problem down into subquestions with the trained QD model, then solves each subquestion with an RL-optimized QA model, using context-guided decoding to keep the subquestion solutions aligned with the overall problem. In extensive experiments on arithmetic and commonsense reasoning tasks, AutoPRM outperforms state-of-the-art methods, improving accuracy on benchmarks such as GSM8K and MATH. AutoPRM can be easily integrated with other reasoning pipelines and shows promise for enhancing the reasoning capabilities of smaller models. The framework addresses key challenges in multi-step reasoning by automating question decomposition and providing precise, unbiased feedback that improves reasoning accuracy.
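
To make the pipeline concrete, the sketch below shows the decompose-then-solve loop in a simplified form. It is a minimal illustration, not the paper's implementation: the `decompose` and `answer` callables are hypothetical stand-ins for the fine-tuned QD and QA models, context-guided decoding is approximated by plain context accumulation, and the reinforcement-learning loop that refines the QA solver is omitted entirely.

```python
# Minimal sketch of a decompose-then-solve pipeline in the spirit of AutoPRM.
# The QD/QA callables are hypothetical placeholders, not the paper's models.
from typing import Callable, List

def solve_with_decomposition(
    question: str,
    decompose: Callable[[str], List[str]],    # QD model: question -> subquestions
    answer: Callable[[str, List[str]], str],  # QA model: subquestion + prior context -> answer
) -> List[str]:
    """Decompose a complex question, then solve each subquestion in order,
    feeding earlier sub-answers back in (a stand-in for context-guided decoding)."""
    subquestions = decompose(question)
    context: List[str] = []
    for sq in subquestions:
        ans = answer(sq, context)
        context.append(f"Q: {sq}\nA: {ans}")
    return context

# Toy stand-ins so the sketch runs without a real LLM backend.
def toy_decompose(question: str) -> List[str]:
    return ["How many apples does Tom start with?",
            "How many apples remain after giving 3 away?"]

def toy_answer(subquestion: str, context: List[str]) -> str:
    return "5" if not context else "2"

if __name__ == "__main__":
    trace = solve_with_decomposition(
        "Tom has 5 apples and gives 3 away. How many are left?",
        toy_decompose, toy_answer)
    print("\n".join(trace))
```

In the full framework, the QA solver would additionally be optimized with reinforcement learning using feedback on each subquestion's outcome, which is what removes the need for manually annotated step-level supervision.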