The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts

23 Jan 2024 | Lingfeng Shen, Weiting Tan, Sihao Chen, Yunmo Chen, Jingyu Zhang, Haoran Xu, Boyuan Zheng, Philipp Koehn, Daniel Khashabi
This paper examines the safety challenges that large language models (LLMs) face in multilingual contexts, focusing on how response quality differs when models are prompted with malicious content in high- versus low-resource languages. The authors observe that LLMs are more likely to generate harmful, and less relevant, responses when malicious prompts are written in lower-resource languages. They investigate whether common alignment techniques, such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), mitigate these issues. The results show that while SFT and RLHF improve safety in high-resource languages, they have minimal impact on low-resource ones, indicating that the pre-training stage is a critical bottleneck for cross-lingual alignment. The study highlights the need for dedicated resources and pre-training data to address the limitations of LLMs in low-resource languages, underscoring the importance of inclusive and safe LLMs for global communities.
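For readers who want a concrete picture of the evaluation the paper describes, the sketch below shows one way to probe a model with the same malicious prompt rendered in high- and low-resource languages and tally per-language harmfulness and irrelevance rates. This is a hypothetical outline, not the authors' code: `query_model`, `is_harmful`, and `is_relevant` are placeholders for whatever model endpoint and safety/relevance classifiers one has available, and the language lists are illustrative.

```python
# Hypothetical sketch of a multilingual safety evaluation in the spirit of
# the paper: pose the same malicious prompts in high- and low-resource
# languages, then measure how often responses are harmful or off-topic.
# All three helper functions are placeholders, not the paper's pipeline.
from collections import defaultdict

HIGH_RESOURCE = ["en", "zh", "fr"]   # example high-resource languages
LOW_RESOURCE = ["zu", "gd", "haw"]   # example low-resource languages


def query_model(prompt: str, lang: str) -> str:
    """Send `prompt` (already translated into `lang`) to the LLM under test."""
    raise NotImplementedError("plug in your model API here")


def is_harmful(response: str) -> bool:
    """Safety judge: does the response comply with the malicious request?"""
    raise NotImplementedError("plug in a safety classifier here")


def is_relevant(response: str, prompt: str) -> bool:
    """Relevance judge: does the response actually address the prompt?"""
    raise NotImplementedError("plug in a relevance classifier here")


def evaluate(prompts_by_lang: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Return per-language harmful-rate and irrelevant-rate over a prompt set."""
    stats = defaultdict(lambda: {"harmful": 0, "irrelevant": 0, "total": 0})
    for lang, prompts in prompts_by_lang.items():
        for prompt in prompts:
            response = query_model(prompt, lang)
            stats[lang]["total"] += 1
            stats[lang]["harmful"] += is_harmful(response)
            stats[lang]["irrelevant"] += not is_relevant(response, prompt)
    return {
        lang: {
            "harmful_rate": s["harmful"] / s["total"],
            "irrelevant_rate": s["irrelevant"] / s["total"],
        }
        for lang, s in stats.items()
    }
```

Comparing the resulting rates across the `HIGH_RESOURCE` and `LOW_RESOURCE` groups would surface the gap the paper reports: higher harmful-rates and irrelevant-rates for lower-resource languages.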