Exploring Multi-Lingual Bias of Large Code Models in Code Generation

30 Apr 2024 | Chaozheng Wang†, Zongjie Li‡, Cuiyun Gao†*, Wenxuan Wang†, Ting Peng§, Hailiang Huang§, Yuetang Deng§, Shuai Wang†, Michael R. Lyu†
This paper investigates multi-lingual bias in large code models (LCMs) for code generation. The authors construct the first multi-lingual evaluation benchmark, X-HumanEval-X, to systematically assess this bias along two dimensions: the natural language of the instruction and the target programming language (PL). They find that LCMs exhibit substantial bias, performing markedly worse on instructions written in Chinese than on the same instructions in English, with a drop of at least 13% in the Pass@1 metric. LCMs also perform unevenly across PLs, with a Pass@1 gap of up to 20.9% between Python and C++. The paper then explores methods to mitigate this bias: translation strategies and instruction tuning. Prompting-based translation, in both one-step and multi-step variants, effectively reduces the multi-lingual bias, while instruction tuning on a multi-lingual dataset (MEIC) further improves LCMs' performance and narrows the gap. The study closes with insights for researchers and developers on improving the multi-lingual capabilities of LCMs.
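For reference, Pass@1 is the standard pass@k functional-correctness metric from Chen et al. (2021) evaluated at k = 1: the expected probability that a single sampled solution passes all unit tests. A minimal sketch of the unbiased estimator (this helper is illustrative, not code from the paper):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021): the probability
    that at least one of k samples, drawn without replacement from n
    generations of which c are correct, passes the unit tests."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill all k slots
    # 1 - C(n-c, k) / C(n, k), computed as a numerically stable product
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))

# Pass@1 reduces to the fraction of correct generations per problem:
print(pass_at_k(n=10, c=3, k=1))  # 0.3
```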
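The translation strategies mentioned above prompt the model to move the instruction into English before generating code. A minimal sketch of both variants, assuming a hypothetical `generate(prompt)` wrapper around an LCM completion API (the exact prompts used in the paper may differ):

```python
def generate(prompt: str) -> str:
    """Hypothetical wrapper around an LCM completion API;
    substitute your model client here."""
    raise NotImplementedError

def one_step(instruction_zh: str) -> str:
    # One-step: translation and code generation in a single prompt.
    return generate(
        "First translate the following task into English, then write a "
        "Python function that solves it:\n" + instruction_zh
    )

def multi_step(instruction_zh: str) -> str:
    # Multi-step: translate first, then generate code from the
    # translated (English) instruction in a second call.
    instruction_en = generate(
        "Translate the following programming task into English:\n"
        + instruction_zh
    )
    return generate(
        "Write a Python function that solves the following task:\n"
        + instruction_en
    )
```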