CODES: Natural Language to Code Repository via Multi-Layer Sketch

25 Mar 2024 | Daoguang Zan, Ailun Yu, Wei Liu, Dong Chen, Bo Shen, Wei Li, Yafen Yao, Yongshun Gong, Xiaolin Chen, Bei Guan, Zhiguang Yang, Yongji Wang, Qianxiang Wang, Lizhen Cui
This paper introduces a new software engineering task, Natural Language to Code Repository (NL2Repo), which aims to generate an entire code repository from natural language requirements. To address this task, the authors propose CODES, a multi-layer sketch-based framework that decomposes NL2Repo into three phases: RepoSketcher, FileSketcher, and SketchFiller. RepoSketcher generates the repository's directory structure, FileSketcher creates a file sketch with empty function bodies, and SketchFiller fills in the function details. The framework is implemented using prompt engineering and supervised fine-tuning.
To evaluate CODES, the authors create a benchmark called SketchEval, which includes 19 real-world repositories and a metric, SketchBLEU, for assessing repository similarity. The results show that CODES significantly outperforms baselines in generating code repositories, particularly in complex tasks. The authors also develop a VSCode plugin for CODES and conduct empirical studies with 30 participants, demonstrating the practicality of the framework. The study highlights the effectiveness of multi-layer sketching in improving code generation and the importance of instruction fine-tuning for better performance. Overall, the results indicate that CODES can effectively generate code repositories, making it a promising approach for automated software development.
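
The three-phase decomposition described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the prompt wording, and the `fake_generate` stand-in for a language model are all assumptions made here for illustration, and a real system would call an actual model in place of the stub.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a language model call: prompt in, text out.
Generate = Callable[[str], str]


def repo_sketcher(requirements: str, generate: Generate) -> List[str]:
    """Phase 1: propose the repository's file paths, one per line."""
    tree = generate(f"List the files for a repository that meets:\n{requirements}")
    return [line.strip() for line in tree.splitlines() if line.strip()]


def file_sketcher(requirements: str, paths: List[str], generate: Generate) -> Dict[str, str]:
    """Phase 2: for each file, produce a sketch with empty function bodies."""
    return {
        path: generate(f"Sketch {path} (signatures only) for:\n{requirements}")
        for path in paths
    }


def sketch_filler(sketches: Dict[str, str], generate: Generate) -> Dict[str, str]:
    """Phase 3: fill in the function bodies of every file sketch."""
    return {
        path: generate(f"Fill in the function bodies of {path}:\n{sketch}")
        for path, sketch in sketches.items()
    }


def nl2repo(requirements: str, generate: Generate) -> Dict[str, str]:
    """Chain the three phases: directory structure, file sketches, filled files."""
    paths = repo_sketcher(requirements, generate)
    sketches = file_sketcher(requirements, paths, generate)
    return sketch_filler(sketches, generate)


def fake_generate(prompt: str) -> str:
    """Toy deterministic 'model' used only to make this sketch runnable."""
    if prompt.startswith("List the files"):
        return "src/main.py\nsrc/utils.py\n"
    if prompt.startswith("Sketch"):
        return "def main():\n    pass\n"
    # Fill phase: take the sketch after the first line and replace the empty body.
    return prompt.split("\n", 1)[1].replace("pass", "print('hello')")


if __name__ == "__main__":
    for path, source in nl2repo("A hello-world CLI", fake_generate).items():
        print(f"--- {path} ---\n{source}")
```

Each phase only sees the output of the previous one, which mirrors the idea of narrowing the problem layer by layer: first the directory structure, then the file sketches, then the function bodies.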