On Evaluating the Efficiency of Source Code Generated by LLMs
April 14, 2024, Lisbon, Portugal | Changan Niu, Ting Zhang, Chuanyi Li, Bin Luo, Vincent Ng
This paper evaluates the efficiency of code generated by large language models (LLMs) and proposes methods to improve it. Whereas prior work focuses on the correctness of LLM-generated code, this study emphasizes efficiency, which directly affects a program's execution speed and resource usage. The authors measure the efficiency of LLM-generated code on two established benchmarks, HumanEval and MBPP, and conduct a more extensive evaluation on LeetCode problems. They find that the efficiency of generated code is not positively correlated with either a model's correctness performance or its size; instead, training strategy and data distribution play the crucial role. The study also explores prompts that guide LLMs toward generating more efficient code, finding that step-by-step prompts are particularly effective for complex problems. The paper's contributions are a benchmark for jointly comparing code correctness and efficiency, prompting methods that improve the efficiency of LLM-generated code, and insights into the factors that influence it.
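
To make the efficiency comparison concrete, the sketch below shows one minimal way such an evaluation can work: time two functionally correct candidate solutions on a shared test input and compare their wall-clock cost. This is an illustrative harness only, not the paper's actual benchmark code; the `time_solution` helper, the `two_sum` candidates, and the test case are all hypothetical.

```python
import time
import statistics
from typing import Callable

def time_solution(solution: Callable, inputs: list, repeats: int = 5) -> float:
    """Median wall-clock time (seconds) for one pass over all test inputs."""
    runs = []
    for _ in range(repeats):
        start = time.perf_counter()
        for args in inputs:
            solution(*args)
        runs.append(time.perf_counter() - start)
    return statistics.median(runs)

# Two hypothetical, functionally correct candidates for a "two sum" problem:
# both return indices i < j with nums[i] + nums[j] == target.
def two_sum_quadratic(nums, target):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]

def two_sum_linear(nums, target):
    seen = {}  # value -> index of the value seen so far
    for i, x in enumerate(nums):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i

if __name__ == "__main__":
    # Near-worst case: the answer sits at the end of the array, so the
    # quadratic candidate must scan almost all pairs before returning.
    tests = [(list(range(2000)), 3997)]
    for name, fn in [("quadratic", two_sum_quadratic), ("linear", two_sum_linear)]:
        print(f"{name}: {time_solution(fn, tests):.4f}s")
```

The median of repeated runs is used rather than a single measurement to damp scheduler noise; a real benchmark in the paper's setting would additionally verify correctness against the benchmark's test suite before timing and enforce per-problem timeouts.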