Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback


11 Jun 2024 | Zhangqian Bi, Yao Wan, Zheng Wang, Hongyu Zhang, Batu Guan, Fangxin Lu, Zili Zhang, Yulei Sui, Hai Jin, Xuanhua Shi
This paper introduces CoCoGEN, a code generation approach that uses compiler feedback to improve the quality of code generated by large language models (LLMs). It addresses a common failure mode: LLM-generated code contains errors because the model lacks project-specific context, which is often too large to fit into the model's prompt.

CoCoGEN first performs static analysis to identify mismatches between the generated code and the project's context. It then iteratively aligns and fixes these errors using information extracted from the code repository. By combining established compiler techniques with emerging generative methods, the approach lets developers leverage LLMs without being overwhelmed by frequent compilation and semantic errors.

CoCoGEN is evaluated on the CoderEval benchmark, which includes tasks that require project-specific context. The results show that CoCoGEN significantly improves LLM performance on code that depends on project context, outperforming existing retrieval-based baselines by over 80% in pass rates, while also reducing compilation errors and improving code quality.

The paper additionally presents an empirical study of error distribution in code generation, highlighting the importance of precise, grounded program context for generating code at the project level. The results demonstrate that CoCoGEN effectively addresses compilation errors, which constitute a significant portion of errors in project-level code generation, and that it is effective across levels of context dependency, from function-level to project-level tasks. The authors also discuss limitations of the approach, notably its inability to address execution errors that occur despite successful compilation.
Overall, CoCoGEN provides a promising solution for improving the accuracy and reliability of code generated by LLMs in real-world software projects.
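The generate-analyze-repair cycle described above can be sketched as a simple loop. This is a minimal illustration of the idea, not the paper's actual implementation: the function names (`generate`, `compile_check`, `retrieve_context`) and the toy model/compiler below are hypothetical stand-ins.

```python
def iterative_refine(task, repo, generate, compile_check, retrieve_context,
                     max_rounds=3):
    """Generate code, then repeatedly repair it using compiler diagnostics
    plus project context retrieved for the identifiers the compiler flags."""
    prompt = task
    code = generate(prompt)
    for _ in range(max_rounds):
        errors = compile_check(code, repo)  # static analysis / compilation
        if not errors:
            return code                     # compiles cleanly: done
        # Pull project-level context (e.g. definitions, signatures) for the
        # identifiers mentioned in the diagnostics.
        context = retrieve_context(repo, errors)
        prompt = (task + "\n# Project context:\n" + context +
                  "\n# Fix these errors:\n" + "\n".join(errors))
        code = generate(prompt)
    return code  # best effort after max_rounds


# Toy demo: a fake "model" that only names the helper correctly once its
# signature appears in the prompt, and a fake "compiler" that flags the typo.
def toy_generate(prompt):
    return "result = helper(x)" if "def helper" in prompt else "result = helpr(x)"

def toy_compile(code, repo):
    return [] if "helper(" in code else ["error: name 'helpr' is not defined"]

def toy_retrieve(repo, errors):
    return "def helper(x): ..."  # signature found in the repository
```

In this toy run, `iterative_refine("compute result", {}, toy_generate, toy_compile, toy_retrieve)` produces the typo'd `helpr` call first, the fake compiler flags it, the repository "retrieval" supplies the `helper` signature, and the second generation round returns `"result = helper(x)"`, which compiles cleanly.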