The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts


23 Jan 2024 | Lingfeng Shen, Weiting Tan, Sihao Chen, Yunmo Chen, Jingyu Zhang, Haoran Xu, Boyuan Zheng, Philipp Koehn, Daniel Khashabi
This paper investigates the safety challenges of large language models (LLMs) in multilingual contexts, focusing on how safety issues differ across languages. The study compares how state-of-the-art LLMs respond to malicious prompts in high- and low-resource languages, revealing that LLMs are more likely to generate harmful responses when prompted with low-resource-language instructions. They also tend to produce less relevant responses to such prompts.

The study frames these observations as two safety-related curses that arise in low-resource languages: the harmfulness curse, the increased likelihood of harmful responses, and the relevance curse, the reduced ability of LLMs to follow instructions. It then evaluates how effectively common alignment techniques, such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), address these curses. Training with high-resource languages improves alignment, while training with low-resource languages yields minimal improvement, indicating that these curses are hard to resolve through alignment alone and suggesting that the bottleneck of cross-lingual alignment lies in the pretraining stage.

Exploring the origin of these curses, the study attributes them to the limited amount of low-resource-language data available during pretraining. The findings highlight the difficulty of ensuring the safety and alignment of LLMs in low-resource languages. The paper concludes that multilingual pretraining can help alleviate these issues and that further research is needed to address the challenges of cross-lingual LLM safety.
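To make the comparison described above concrete, the sketch below shows one way such a multilingual safety evaluation could be organized: translate a shared set of malicious prompts into high- and low-resource languages, query the model under test, and score each response for harmfulness (the harmfulness curse) and instruction-following (the relevance curse). This is a minimal illustration, not the authors' pipeline; the language lists and the translate, query_model, and judge functions are hypothetical placeholders.

```python
# Minimal sketch of a multilingual safety evaluation, assuming placeholder
# components; it is NOT the paper's actual implementation.
from dataclasses import dataclass
from statistics import mean

HIGH_RESOURCE = ["en", "zh", "fr"]  # illustrative language choices
LOW_RESOURCE = ["sw", "am", "ht"]   # illustrative language choices

@dataclass
class Result:
    language: str
    harmfulness: float  # fraction of responses judged harmful
    relevance: float    # fraction of responses that follow the instruction

def translate(prompt: str, lang: str) -> str:
    """Placeholder: translate an English malicious prompt into `lang`."""
    return prompt  # a real pipeline would call an MT system here

def query_model(prompt: str) -> str:
    """Placeholder: send the prompt to the LLM under evaluation."""
    return "..."

def judge_harmful(response: str) -> bool:
    """Placeholder: safety classifier or human judgment of harmfulness."""
    return False

def judge_relevant(prompt: str, response: str) -> bool:
    """Placeholder: does the response actually address the instruction?"""
    return True

def evaluate(prompts: list[str], lang: str) -> Result:
    """Translate prompts, query the model, and aggregate both safety metrics."""
    responses = [query_model(translate(p, lang)) for p in prompts]
    return Result(
        language=lang,
        harmfulness=mean(judge_harmful(r) for r in responses),
        relevance=mean(judge_relevant(p, r) for p, r in zip(prompts, responses)),
    )

if __name__ == "__main__":
    malicious_prompts = ["<malicious instruction 1>", "<malicious instruction 2>"]
    for lang in HIGH_RESOURCE + LOW_RESOURCE:
        r = evaluate(malicious_prompts, lang)
        print(f"{r.language}: harmfulness={r.harmfulness:.2f} relevance={r.relevance:.2f}")
```

Comparing the per-language scores from a loop like this is what would surface the paper's reported pattern: higher harmfulness and lower relevance for low-resource languages than for high-resource ones.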