EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories


31 Mar 2024 | Jia Li, Ge Li, Xuanming Zhang, Yihong Dong, Zhi Jin
The paper introduces EvoCodeBench, a new benchmark for evaluating Large Language Models (LLMs) on code generation. EvoCodeBench addresses the limitations of existing benchmarks by aligning with real-world code repositories in multiple dimensions, offering comprehensive annotations, and providing robust evaluation metrics. The benchmark evolves over time to avoid data leakage and contains 275 samples drawn from 25 real-world repositories. The authors propose repository-level code generation, which simulates the coding process inside a working repository, and evaluate 10 popular LLMs using the Pass@k and Recall@k metrics. The results reveal the coding abilities of these LLMs in real-world repositories and highlight the importance of context and domain knowledge. The paper also discusses empirical lessons, limitations, and future work, emphasizing the need for multilingual support, improved auto-generated requirements, and further exploration of context utilization.
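
To make the two evaluation metrics concrete, below is a minimal Python sketch. The Pass@k estimator follows the standard unbiased formula of Chen et al. (2021), which the benchmark adopts; the Recall@k helper is an illustrative assumption: it treats each generated program as a set of statically parsed dependency names and takes the best recall of the reference dependencies over k samples, mirroring the paper's description rather than reproducing the authors' exact implementation.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k (Chen et al., 2021): the probability that at
    least one of k programs sampled from n generations (c of which
    pass all tests) is functionally correct."""
    if n - c < k:  # every size-k sample must contain a correct program
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

def recall_at_k(reference_deps: set[str],
                sampled_deps: list[set[str]], k: int) -> float:
    """Sketch of Recall@k: the best recall of the reference
    dependencies achieved by any of the first k generated programs.
    `sampled_deps` is assumed to hold the dependency names parsed
    from each generated program (a hypothetical representation)."""
    if not reference_deps:
        return 1.0
    return max(len(reference_deps & deps) / len(reference_deps)
               for deps in sampled_deps[:k])

# Example: 10 generations per task, 3 of which pass the tests.
print(round(pass_at_k(n=10, c=3, k=1), 3))  # 0.3
print(round(pass_at_k(n=10, c=3, k=5), 3))  # 0.917
```

Together, the two metrics separate functional correctness (Pass@k) from whether the model actually invokes the repository's own functions and classes (Recall@k), which is what distinguishes repository-level generation from standalone-function benchmarks.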