AutoPRM is a novel self-supervised framework for enhancing the fine-tuning of large language models (LLMs) on complex reasoning tasks. It addresses the extensive manual labeling that procedural feedback normally requires by automating question decomposition and using reinforcement learning to iteratively improve the subquestion solver. Specifically, AutoPRM first decomposes complex problems into manageable subquestions with controllable granularity, then applies reinforcement learning to optimize the subquestion solver. It also introduces context-guided decoding to avoid reward tampering and to keep the subquestion solver directed toward the holistic problem. Extensive experiments on arithmetic and commonsense reasoning datasets show that AutoPRM significantly outperforms state-of-the-art (SOTA) methods while remaining more efficient and scalable. The framework integrates easily with other reasoning pipelines, making it a versatile tool for enhancing LLMs' reasoning capabilities.
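As a rough illustration of the decompose-then-solve loop described above, the Python sketch below shows one way such a pipeline could be wired together. All names here (`solve_with_decomposition`, `decompose`, `solve_subquestion`, `granularity`) are hypothetical placeholders rather than the paper's actual interfaces, and the sequential accumulation of earlier answers is only a simplified stand-in for context-guided decoding.

```python
from typing import Callable, List


def solve_with_decomposition(
    question: str,
    decompose: Callable[[str, int], List[str]],
    solve_subquestion: Callable[[str, str], str],
    granularity: int = 3,
) -> List[str]:
    """Answer subquestions in sequence, feeding earlier answers back as context.

    `decompose` and `solve_subquestion` stand in for a fine-tuned decomposer
    and subquestion solver; here they are just toy callables.
    """
    subquestions = decompose(question, granularity)
    context, answers = question, []
    for sub in subquestions:
        # Each step is conditioned on the original question plus all answers
        # produced so far, keeping it anchored to the holistic problem.
        answer = solve_subquestion(context, sub)
        answers.append(answer)
        context += f"\n{sub} -> {answer}"
    return answers


if __name__ == "__main__":
    # Toy stand-ins: split on question marks and echo a placeholder answer.
    toy_decompose = lambda q, k: [s.strip() + "?" for s in q.split("?") if s.strip()][:k]
    toy_solver = lambda ctx, sub: f"<answer to: {sub}>"
    print(solve_with_decomposition(
        "How many apples are left? How many does each friend get?",
        toy_decompose,
        toy_solver,
    ))
```

In a real system the toy callables would be replaced by model-backed components, and the subquestion-level answers would supply the procedural signal used for reinforcement learning.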