11 Jun 2024 | Linzhen Chai, Shukai Liu, Jian Yang, Yuwei Yin, Ke Jin, Jiaheng Liu, Tao Sun, Ge Zhang, Changyu Ren, Hongcheng Guo, Zekun Wang, Boyang Wang, Xianjie Wu, Bing Wang, Tongliang Li, Liquan Yang, Sufeng Duan, Zhoujun Li
The paper introduces McEval, a massively multilingual code evaluation benchmark covering 40 programming languages with 16K test samples. McEval aims to address the limitations of existing benchmarks, which focus primarily on Python and offer little diversity in other languages. The benchmark comprises three key tasks: code generation, code explanation, and code completion. McEval was built through a comprehensive human annotation process to ensure accuracy and consistency. It is accompanied by McEval-INSTRUCT, a multilingual instruction corpus, and mCODER, a multilingual coder trained on McEval-INSTRUCT. Extensive experiments on McEval reveal significant performance gaps between open-source and closed-source models, underscoring the need for more robust multilingual code evaluation frameworks. The paper also discusses the challenges and future directions in multilingual code evaluation, emphasizing that comprehensive and realistic evaluations are essential to advancing code LLMs.