A Survey on Large Language Models for Code Generation

September 2024 | Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim
This survey of large language models (LLMs) for code generation aims to fill the gap in comprehensive literature reviews on the topic. It systematically reviews recent developments and organizes them under a taxonomy covering data curation, advanced topics, evaluation methods, and practical applications. The survey also traces the historical evolution of LLMs for code generation, provides an empirical comparison on the HumanEval and MBPP benchmarks to highlight the progress in LLM capabilities, and identifies critical challenges and promising opportunities for bridging the gap between academia and practical development. A dedicated resource website is maintained to continuously document and disseminate the most recent advances in the field.

The survey discusses the architectures of code LLMs, covering encoder-only, decoder-only, and encoder-decoder models, and examines the key modules of their Transformer layers: multi-head self-attention, position-wise feed-forward networks, residual connections with normalization, and positional encoding.

It then addresses the code generation task itself, which is to generate source code from a natural language description, discusses the use of in-context learning to improve code generation performance, and contrasts decoding strategies, both deterministic (such as greedy search and beam search) and sampling-based (such as temperature, top-k, and top-p sampling).
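To make the contrast between decoding strategies concrete, the sketch below compares greedy (deterministic) selection with temperature-scaled top-p (nucleus) sampling over a single next-token distribution. The toy vocabulary size, logit values, and function names are illustrative assumptions, not taken from the survey.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution, rescaled by temperature."""
    z = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    z -= z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def greedy_next_token(logits):
    """Deterministic decoding: always pick the highest-probability token."""
    return int(np.argmax(logits))

def top_p_next_token(logits, temperature=0.8, top_p=0.95, rng=None):
    """Sampling-based decoding: sample from the smallest set of tokens whose
    cumulative probability exceeds top_p (nucleus sampling)."""
    rng = rng or np.random.default_rng()
    probs = softmax(logits, temperature)
    order = np.argsort(probs)[::-1]            # token ids sorted by probability
    cumulative = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cumulative, top_p)) + 1
    nucleus = order[:cutoff]                   # keep only the nucleus
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))

# Hypothetical next-token logits over a toy five-token vocabulary.
logits = [2.0, 1.5, 0.3, -1.0, -2.5]
print(greedy_next_token(logits))  # always the same token
print(top_p_next_token(logits))   # varies run to run within the nucleus
```

Deterministic decoding is reproducible, while sampling-based decoding trades reproducibility for diversity, which is why code benchmarks often draw many candidate programs per problem.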
The survey presents a taxonomy of LLMs for code generation, categorizing models by application and performance, and discusses how these models are developed, including the use of synthetic data to address data scarcity and privacy concerns. It highlights the importance of synthetic data in improving code generation models and the potential of LLMs in practical applications.

For evaluation, the survey reviews benchmark datasets for code generation, including HumanEval, MBPP, CoNaLa, Spider, CONCODE, ODEX, CoderEval, ReCode, StudentEval, APPS, CodeContests, DSP, DS-1000, ExeDS, MBXP, Multilingual HumanEval, HumanEval-X, MultiPL-E, xCodeEval, MathQA-X, MathQA-Python, GSM8K, and GSM-HARD, as well as repository-level code completion benchmarks such as RepoEval, Stack-Repo, RepoBench, EvoCodeBench, SWE-bench, CrossCodeEval, and SketchEval.
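Most of these benchmarks score functional correctness with the pass@k metric. As a point of reference, here is a minimal sketch of the standard unbiased pass@k estimator popularized by the HumanEval evaluation: n candidate programs are generated per problem, c of them pass the unit tests, and the score is the probability that at least one of k randomly drawn candidates passes. The sample counts used below are hypothetical.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k),
    computed as a numerically stable product."""
    if n - c < k:
        return 1.0  # fewer than k failing candidates, so a correct one is always drawn
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Hypothetical per-problem results: 200 generated samples, varying numbers correct.
correct_counts = [0, 3, 17, 200]
per_problem = [pass_at_k(n=200, c=c, k=10) for c in correct_counts]
print([round(s, 3) for s in per_problem])             # per-problem pass@10
print(round(sum(per_problem) / len(per_problem), 3))  # benchmark score: mean over problems
```

The benchmark score is the mean of this quantity over all problems; pass@1 is the most commonly reported setting.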