CODES: Natural Language to Code Repository via Multi-Layer Sketch

25 Mar 2024 | Daoguang Zan, Ailun Yu, Wei Liu, Dong Chen, Bo Shen, Wei Li, Yafen Yao, Yongshun Gong, Xiaolin Chen, Bei Guan, Zhiguang Yang, Yongji Wang, Qianxiang Wang, Lizhen Cui
The paper introduces a new software engineering task called *Natural Language to Code Repository* (NL2Repo), which aims to generate an entire code repository from natural language requirements. To address this complex task, the authors propose CODES, a framework that decomposes NL2Repo into multiple sub-tasks via a multi-layer sketch approach. CODES consists of three modules: RepoSketcher, FileSketcher, and SketchFiller, responsible for generating the repository's directory structure, per-file sketches, and function bodies, respectively.

The framework is implemented through prompt engineering and supervised fine-tuning, leveraging off-the-shelf code LLMs such as CodeLlama, DeepSeekCoder, and GPT-3.5. To evaluate CODES, the authors build a benchmark called SketchEval, comprising 19 real-world GitHub repositories of varying complexity, and introduce SketchBLEU, a metric that measures the similarity between generated and reference repositories. Extensive experiments and empirical studies, including a VSCode plugin for CODES, demonstrate the framework's effectiveness and practicality in generating complex code repositories. CODES outperforms the baselines in both benchmark and empirical evaluations, highlighting its potential for fully automated software development.
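The three-module decomposition can be pictured as a simple pipeline: each stage prompts a code LLM and passes its output to the next. The sketch below is an illustrative assumption about the data flow only, not the authors' implementation; the `generate` callable stands in for a code LLM (CodeLlama, DeepSeekCoder, GPT-3.5, ...), and the prompt wording and `toy_llm` stub are hypothetical.

```python
# Hedged sketch of the CODES multi-layer pipeline: repository sketch,
# then per-file sketches, then filled function bodies. Prompts and data
# shapes here are illustrative assumptions, not the paper's actual code.
from typing import Callable, Dict


def nl2repo(requirement: str, generate: Callable[[str], str]) -> Dict[str, str]:
    # 1. RepoSketcher: propose a directory structure (one path per line).
    repo_sketch = generate(f"Requirement:\n{requirement}\nList the repository files:")
    paths = [p.strip() for p in repo_sketch.splitlines() if p.strip()]

    repo: Dict[str, str] = {}
    for path in paths:
        # 2. FileSketcher: signatures/docstrings for this file, bodies elided.
        file_sketch = generate(f"Requirement:\n{requirement}\nSketch file {path}:")
        # 3. SketchFiller: fill in the function bodies of the sketch.
        repo[path] = generate(f"Fill in the bodies of this sketch:\n{file_sketch}")
    return repo


# Toy stand-in "LLM" so the pipeline runs end to end without a model.
def toy_llm(prompt: str) -> str:
    if "List the repository files" in prompt:
        return "calculator/core.py\n"
    if "Sketch file" in prompt:
        return "def add(a, b):\n    ...\n"
    return "def add(a, b):\n    return a + b\n"


repo = nl2repo("A tiny calculator library", toy_llm)
print(sorted(repo))  # → ['calculator/core.py']
```

The key design point the paper highlights is that each stage conditions on the previous stage's output, so the hard NL2Repo problem is reduced to three narrower generation tasks that off-the-shelf code LLMs can handle.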