CYCLE: Learning to Self-Refine the Code Generation


April 2024 | Yangruibo Ding, Marcus J. Min, Gail Kaiser, Baishakhi Ray
Pre-trained code language models (code LMs) have achieved promising performance in code generation and improved the programming efficiency of human developers. However, their self-refinement capability is typically overlooked by existing evaluations, which focus only on the accuracy of one-time predictions. When a code LM fails to implement the correct program, developers find the faulty prediction hard to debug and fix, since they did not write it themselves. Our study reveals that code LMs cannot efficiently self-refine their faulty generations.

In this paper, we propose CYCLE, a framework that teaches code LMs to self-refine their code generation based on available feedback, such as execution results from test suites. We evaluate CYCLE on three popular code generation benchmarks: HumanEval, MBPP, and APPS. Results show that CYCLE maintains or improves the quality of one-time code generation while significantly enhancing the self-refinement capability of code LMs. We implement four variants of CYCLE with parameter sizes of 350M, 1B, 2.7B, and 3B, and experiments show that CYCLE consistently boosts code generation performance, by up to 63.5%, across benchmarks and model sizes. CYCLE also outperforms code LMs with 3× more parameters in self-refinement.

CYCLE's approach involves three phases: data preparation for self-refinement, learning to refine faulty code, and iterative self-refinement with execution feedback. In the first phase, we collect data by prompting pre-trained code LMs to generate code and then refining their faulty predictions based on execution feedback. In the second phase, we train the model to refine code by jointly attending to problem descriptions, faulty code, and execution feedback. In the third phase, we implement an iterative self-refinement workflow that mimics human developers' iterative programming practice. The sketches below illustrate each phase in turn.
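To make the first phase concrete, data collection can be viewed as a harvesting loop: prompt the pre-trained code LM, execute its output against the test suite, and keep the failures together with their execution feedback. The following is a minimal sketch, not the paper's implementation; model.generate, the shape of the problems records, and the use of a benchmark's canonical solution as the refinement target are all assumptions made for illustration.

```python
import os
import subprocess
import tempfile

def run_tests(code: str, tests: str, timeout: int = 10) -> tuple[bool, str]:
    """Run a candidate program plus its test suite in a subprocess and
    return (passed, feedback), where feedback is the captured stdout/stderr."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests)
        path = f.name
    try:
        result = subprocess.run(
            ["python", path], capture_output=True, text=True, timeout=timeout
        )
        return result.returncode == 0, (result.stdout + result.stderr).strip()
    except subprocess.TimeoutExpired:
        return False, "Execution timed out."
    finally:
        os.remove(path)

def collect_refinement_triples(model, problems) -> list[dict]:
    """Keep only the failed generations, paired with their execution feedback
    and a known-correct solution, as supervision for refinement training."""
    triples = []
    for p in problems:
        candidate = model.generate(p.description)    # assumed decoding API
        passed, feedback = run_tests(candidate, p.tests)
        if not passed:
            triples.append({
                "problem": p.description,
                "faulty_code": candidate,
                "feedback": feedback,
                "target": p.canonical_solution,      # assumed ground-truth fix
            })
    return triples
```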
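For the second phase, jointly attending to the three signals implies serializing the problem description, the faulty code, and the execution feedback into a single training (and inference) context. The delimiter strings below are assumptions for illustration, not CYCLE's actual prompt format:

```python
def build_refinement_context(problem: str, faulty_code: str, feedback: str) -> str:
    """Serialize the three signals the refiner jointly attends to.
    The delimiters are hypothetical, not CYCLE's actual format."""
    return (
        f"# Problem description:\n{problem}\n\n"
        f"# Faulty code:\n{faulty_code}\n\n"
        f"# Execution feedback:\n{feedback}\n\n"
        "# Refined code:\n"
    )
```

During training, the target continuation after the final delimiter would be a correct solution; at inference time, the model completes it with a refined attempt.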
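The third phase chains these pieces into a loop that mirrors a developer's edit-run-debug cycle, reusing run_tests and build_refinement_context from the sketches above. The stopping policy and the iteration budget max_iters are assumptions, not the paper's settings:

```python
def self_refine(model, problem: str, tests: str, max_iters: int = 3) -> str:
    """Generate once, then iterate: run the tests, and on failure condition
    the next attempt on the faulty code and its execution feedback."""
    code = model.generate(problem)                   # one-time generation
    for _ in range(max_iters):
        passed, feedback = run_tests(code, tests)    # harness from the phase-1 sketch
        if passed:
            break                                    # stop once the test suite passes
        prompt = build_refinement_context(problem, code, feedback)
        code = model.generate(prompt)                # self-refinement step
    return code
```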
Our results show that CYCLE significantly improves code generation performance compared to existing code LMs: CYCLE-350M outperforms StarCoder-1B across all three benchmarks, and CYCLE-1B matches StarCoder-3B. CYCLE is effective at capturing execution feedback and has great potential to assist human developers with iterative programming.

We make the following contributions: (1) We shed light on the weaknesses of code LMs in self-refinement, revealing that these models are not effective at understanding execution feedback and correcting their own mistakes. (2) We propose CYCLE, a framework that enhances code LMs' generation performance by learning to refine their own code. (3) We conduct extensive experiments on three popular code generation benchmarks and show that CYCLE consistently increases code generation performance by up to 63.5%. (4) We perform in-depth analysis to discuss CYCLE's design and performance, providing insights to motivate further research in improving code LMs' self-refinement capability.