5 Feb 2024 | Shihan Dou, Yan Liu, Haoxiang Jia, Limao Xiong, Enyu Zhou, Wei Shen, Junjie Shan, Caishuang Huang, Xiao Wang, Xiaoran Fan, Zhiheng Xi, Yuhao Zhou, Tao Ji, Rui Zheng, Qi Zhang, Xuanjing Huang, Tao Gui
The paper "StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback" introduces StepCoder, a reinforcement learning (RL) framework designed to improve the quality of generated code. The framework has two main components: Curriculum of Code Completion Subtasks (CCCS) and Fine-Grained Optimization (FGO). CCCS breaks a complex code generation task into a curriculum of simpler code completion subtasks, which makes exploration more manageable. FGO masks unexecuted code segments during optimization, so that only code that was actually executed contributes to training. The authors also built APPS+, a high-quality dataset curated specifically for code generation, which contains 7,456 instances.
Experimental results show that StepCoder significantly improves the exploration efficiency and effectiveness of code generation, outperforming other RL-based methods on benchmarks such as MBPP and HumanEval. The paper highlights the effectiveness of using compiler feedback to guide RL in code generation, demonstrating that StepCoder can effectively navigate the output space and improve the quality of generated code.
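
The two components described above can be illustrated with a small sketch. This is not the authors' implementation: the function names `cccs_prompts` and `masked_mean_loss`, the line-level split of the reference solution, the evenly spaced curriculum stages, and the boolean `executed` mask are all assumptions made for illustration. The first function builds a curriculum in which each stage gives the model a shorter prefix of a reference solution to complete. The second averages per-token losses over executed tokens only, which is one plausible reading of "masking unexecuted code segments".

```python
from typing import List


def cccs_prompts(problem: str, solution_lines: List[str], stages: int) -> List[str]:
    """Build a hypothetical completion curriculum.

    Early stages reveal most of the reference solution as a prefix, so the
    model only has to complete a short tail. The final stage reveals nothing,
    which is the full generation task.
    """
    if stages < 1:
        raise ValueError("stages must be at least 1")
    prompts = []
    total = len(solution_lines)
    for stage in range(1, stages + 1):
        # Number of reference lines kept as a prefix shrinks with each stage.
        keep = total * (stages - stage) // stages
        prefix = "\n".join(solution_lines[:keep])
        prompts.append(problem + "\n" + prefix if prefix else problem)
    return prompts


def masked_mean_loss(token_losses: List[float], executed: List[bool]) -> float:
    """Average per-token losses over executed tokens only.

    `executed[i]` is assumed to say whether token i belongs to code that ran
    during the unit tests. Tokens marked False are masked out of the average.
    """
    if len(token_losses) != len(executed):
        raise ValueError("token_losses and executed must have the same length")
    kept = [loss for loss, ran in zip(token_losses, executed) if ran]
    if not kept:
        # Nothing was executed, so there is no training signal to average.
        return 0.0
    return sum(kept) / len(kept)


if __name__ == "__main__":
    for prompt in cccs_prompts("# Task: sum a list", ["def f(xs):", "    return sum(xs)"], 2):
        print(repr(prompt))
    print(masked_mean_loss([1.0, 2.0, 3.0], [True, False, True]))
```

In a real training setup the per-token losses would come from the RL objective and the execution mask from running the generated program against its unit tests; both are reduced to plain lists here to keep the sketch self-contained.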