11 Jun 2024 | Linzheng Chai, Shukai Liu, Jian Yang, Yuwei Yin, Ke Jin, Jiaheng Liu, Tao Sun, Ge Zhang, Changyu Ren, Hongcheng Guo, Zekun Wang, Boyang Wang, Xianjie Wu, Bing Wang, Tongliang Li, Liguang Yang, Sufeng Duan, Zhoujun Li
McEVAL: Massively Multilingual Code Evaluation
McEVAL is a new benchmark for evaluating code large language models (LLMs) in multilingual scenarios, covering 40 programming languages with 16,000 test samples across three main tasks: code generation, code completion, and code explanation. It is accompanied by a multilingual instruction corpus, MCEVAL-INSTRUCT, spanning the same 40 languages, which is used to train a multilingual coder, MCoDETR, to support multilingual programming language generation, and by a leaderboard for comparing models across all 40 languages.
McEVAL was developed to address the limitations of existing code benchmarks, which focus primarily on Python and offer little diversity in programming languages; it instead provides a comprehensive evaluation of code LLMs in multilingual scenarios across the generation, completion, and explanation tasks.
Extensive experiments on McEVAL show that a significant gap remains between open-source models and closed-source LLMs (e.g., GPT-series models) across numerous languages. The instruction corpus, evaluation benchmark, and leaderboard are available at https://mceval.github.io/.
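The summary above does not spell out how this gap is quantified; execution-based code benchmarks of this kind are typically scored with the pass@k metric, the estimated probability that at least one of k sampled solutions passes all tests. The snippet below is a minimal sketch of the standard unbiased pass@k estimator as generic background, not code from the McEVAL release.

import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: given n sampled solutions for a problem,
    of which c pass all tests, estimate the probability that at least one
    of k randomly drawn samples passes."""
    if n - c < k:
        return 1.0
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Example: 3 of 10 samples pass; the estimated pass@1 is simply c/n = 0.3.
print(round(pass_at_k(10, 3, 1), 3))  # 0.3

Per-language scores are then averaged over all problems in that language, which is how a leaderboard like McEVAL's can compare models language by language.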
The three tasks are defined as follows. Code generation requires producing a complete program from a problem description and its test cases. Code completion requires filling in the missing code between a given prefix and suffix snippet. Code explanation requires describing a program in natural language and then restoring the original code from that explanation.
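To make these formats concrete, here is a minimal sketch, in Python, of how such prompts might be assembled and how a generated solution could be verified by execution. The field names (description, prefix, suffix, tests, code) and prompt wording are hypothetical placeholders, not the benchmark's actual schema or harness; only a Python-language runner is shown, and the other 39 languages would each need their own compile/run commands.

import subprocess
import sys
import tempfile
from pathlib import Path

def generation_prompt(sample: dict) -> str:
    """Code generation: problem description plus the test cases the solution must pass."""
    return (
        f"Write a {sample['language']} solution for the task below.\n\n"
        f"{sample['description']}\n\nYour code must pass these tests:\n{sample['tests']}\n"
    )

def completion_prompt(sample: dict) -> str:
    """Code completion: the model fills in the span between a prefix and a suffix."""
    return (
        "Complete the missing code between <PREFIX> and <SUFFIX>.\n"
        f"<PREFIX>\n{sample['prefix']}\n<SUFFIX>\n{sample['suffix']}\n"
    )

def explanation_prompt(sample: dict) -> str:
    """Code explanation: explain the program; a second turn would ask the model to
    restore the original code from its own explanation."""
    return f"Explain what the following {sample['language']} program does:\n\n{sample['code']}\n"

def passes_tests(solution: str, tests: str, timeout: float = 10.0) -> bool:
    """Execution-based check for a Python sample: run the candidate solution together
    with its test cases in a subprocess and treat a zero exit code as a pass."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(solution + "\n\n" + tests, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)], capture_output=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0

if __name__ == "__main__":
    # Toy sample in the hypothetical schema above.
    sample = {
        "language": "Python",
        "description": "Return the sum of two integers.",
        "tests": "assert add(1, 2) == 3\nassert add(-1, 1) == 0\n",
    }
    print(generation_prompt(sample))
    candidate = "def add(a, b):\n    return a + b\n"
    print("passes:", passes_tests(candidate, sample["tests"]))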