27 Feb 2023 | Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong
CODEGEN is a family of open-source large language models for code, trained on natural language and programming language data, with up to 16.1B parameters. It is released together with JAXFORMER, the open-source training library used to train it. On the HumanEval benchmark, CODEGEN performs competitively with previous state-of-the-art models. The models are trained sequentially on three datasets: THEPILE (natural language), BIGQUERY (code in multiple programming languages), and BIGPYTHON (Python only), and performance improves with both model size and data size.

The paper also introduces the Multi-Turn Programming Benchmark (MTPB), consisting of 115 diverse problem sets that are factorized into multi-turn prompts, in which a user's specification is given step by step rather than all at once. Analysis on MTPB shows that multi-turn prompts yield significantly higher program synthesis accuracy than the same specifications given as single-turn prompts, that this multi-turn synthesis capacity emerges as model and data scale grow, and that factorizing a specification into turns helps the model capture user intent. The paper additionally surveys related work in program synthesis and large language models and discusses the broader impact and ethical considerations of the release; the models are open-sourced to enable further research and practical applications.
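To make the multi-turn idea concrete, here is a minimal sketch of turn-by-turn synthesis using one of the publicly released CODEGEN checkpoints through the Hugging Face transformers library. The checkpoint name, the prompt wording, the comment-based turn format, and the generation settings below are illustrative assumptions for this sketch, not the actual MTPB harness, which handles execution and truncation of completions more carefully.

```python
# Minimal sketch: multi-turn program synthesis with a released CODEGEN
# checkpoint via Hugging Face transformers. The example turns and the
# comment-based prompt format are illustrative, not taken from MTPB.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "Salesforce/codegen-350M-mono"  # small Python-only variant
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# One problem factorized into multiple turns: each turn's prompt is a
# comment, and the model's completion is appended before the next turn.
turns = [
    "# Define a function that returns the squares of a list of numbers.",
    "# Apply the function to the list [1, 2, 3, 4] and store the result.",
    "# Print the result.",
]

context = ""
for turn in turns:
    context += turn + "\n"
    inputs = tokenizer(context, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens as this turn's completion.
    completion = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:],
        skip_special_tokens=True,
    )
    context += completion + "\n"

print(context)
```

The key point the benchmark probes is visible in the loop: each turn adds a small piece of the specification, and the growing context (prompts interleaved with the model's own completions) is what the paper argues makes user intent easier to capture than a single monolithic prompt.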