17 Jun 2024 | Egor Bogomolov, Aleksandra Eliseeva, Timur Galimzyanov, Evgeniy Glukhov, Anton Shapkin, Maria Tigina, Yaroslav Golubev, Alexander Kovrigin, Arie van Deursen, Maliheh Izadi, Timofey Bryksin
The paper introduces Long Code Arena, a suite of six benchmarks for evaluating machine learning models on code processing tasks that require project-wide context: library-based code generation, CI build repair, project-level code completion, commit message generation, bug localization, and module summarization. Each task comes with a manually verified dataset, an evaluation suite, and open-source baseline solutions built on popular large language models (LLMs). The benchmarks address a key limitation of existing datasets, which typically offer short contexts and limited practical relevance. All datasets are collected from open-source GitHub repositories to ensure diversity and quality. For each task, the paper details the data collection methodology, the evaluation setup, and the baselines, providing a comprehensive resource for researchers in Machine Learning for Software Engineering (ML4SE). Long Code Arena is published on HuggingFace Spaces with a leaderboard, links to the datasets, and a GitHub repository containing the baselines.
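Since the datasets are hosted on HuggingFace, a natural first step is loading one with the `datasets` library. Below is a minimal sketch; the repository ID is an assumption based on the authors' organization name, and the exact IDs, configurations, and split names should be taken from the Long Code Arena HuggingFace Space.

```python
from datasets import get_dataset_config_names, load_dataset

# Assumed dataset ID for the commit message generation task; the exact name
# is listed on the Long Code Arena HuggingFace Space and may differ.
DATASET_ID = "JetBrains-Research/lca-commit-message-generation"

# Some of the benchmark datasets expose multiple configurations,
# so we discover them first instead of hard-coding one.
configs = get_dataset_config_names(DATASET_ID)
print("Available configs:", configs)

# Load the first configuration's test split and inspect one sample.
ds = load_dataset(DATASET_ID, configs[0], split="test")
print("Fields:", ds.column_names)
print(ds[0])
```

The same pattern applies to the other five tasks: each is a separate dataset under the same organization, and the baseline repository on GitHub shows how the samples are fed to the LLM baselines.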