How do Large Language Models Handle Multilingualism?

24 May 2024 | Yiran Zhao, Wenxuan Zhang, Guizhen Chen, Kenji Kawaguchi, Lidong Bing
This study explores how large language models (LLMs) handle multilingualism. The research proposes a multilingual workflow (MWork) suggesting that LLMs first understand queries by converting multilingual inputs into English, think in English in intermediate layers while incorporating multilingual knowledge, and generate responses aligned with the original language in the final layers.

To validate MWork, the authors introduce Parallel Language-specific Neuron Detection (PLND), a method that identifies the neurons activated by a given language without requiring labeled data. Using PLND, they verify MWork through extensive experiments that deactivate language-specific neurons across various layers and structures. Deactivating these neurons significantly degrades multilingual performance, confirming their role. Conversely, fine-tuning only the language-specific neurons with a small dataset of just 400 documents enhances multilingual abilities in a specific language without compromising others, yielding average improvements of 3.6% for high-resource languages and 2.3% for low-resource languages across all tasks.

The findings reveal that LLMs rely on English for thinking while extracting multilingual knowledge to support query processing. Deactivating neurons in the first several layers disables LLMs on NLU tasks in non-English languages, confirming that these layers are responsible for understanding. Deactivating language-specific neurons in the task-solving layers causes performance to drop across all languages, showing that LLMs rely on English there. Likewise, disabling the self-attention structure in the task-solving layers compromises the ability to solve tasks in all languages, verifying the respective functions of the self-attention and feed-forward structures.
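The detect-then-deactivate procedure described above can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the activation criterion, the threshold, and the function names (`activated_neurons`, `language_specific_neurons`, `deactivation_mask`) are assumptions, and the real PLND method operates on the hidden states of a transformer rather than on a small matrix.

```python
import numpy as np

def activated_neurons(acts, threshold=0.5):
    """Indices of neurons whose mean absolute activation over a corpus
    exceeds `threshold` (a simplified stand-in for PLND's criterion)."""
    return set(np.where(np.abs(acts).mean(axis=0) > threshold)[0])

def language_specific_neurons(per_lang_acts, lang, threshold=0.5):
    """Neurons activated for `lang` but for no other language."""
    own = activated_neurons(per_lang_acts[lang], threshold)
    others = set()
    for other, a in per_lang_acts.items():
        if other != lang:
            others |= activated_neurons(a, threshold)
    return own - others

def deactivation_mask(hidden_dim, neuron_ids):
    """Binary mask that zeroes the given hidden neurons when multiplied
    with a layer's hidden activations."""
    mask = np.ones(hidden_dim)
    mask[list(neuron_ids)] = 0.0
    return mask

# Toy data: 4 hidden neurons, 3 corpus sentences per language.
acts = {
    "fr": np.array([[0.9, 0.1, 0.8, 0.0],
                    [0.8, 0.0, 0.9, 0.1],
                    [0.7, 0.2, 0.6, 0.0]]),
    "en": np.array([[0.1, 0.9, 0.7, 0.0],
                    [0.0, 0.8, 0.8, 0.1],
                    [0.2, 0.7, 0.9, 0.0]]),
}
fr_neurons = language_specific_neurons(acts, "fr")
print(fr_neurons)  # {0}: only neuron 0 fires for French but not for English
h = np.array([0.8, 0.0, 0.6, 0.3])           # hidden activations for a token
print(h * deactivation_mask(4, fr_neurons))  # neuron 0 is silenced
```

Deactivating the French-specific neuron and re-evaluating the model on French tasks is the kind of ablation the paper uses to localize where understanding, task solving, and generation happen.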
The results further confirm the effectiveness of MWork in interpreting how each structure contributes to an LLM's multilingual query handling, offering precise and independent methods for multilingual enhancement. The study concludes that MWork provides a framework for enhancing multilingual abilities in LLMs, with significant improvements in performance across various tasks.
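The targeted enhancement step, updating only the language-specific neurons while freezing everything else, can be sketched as a masked gradient update. This is a schematic under assumed names (`masked_row_update` is hypothetical); in practice the update would be applied to the corresponding weight rows of the attention and feed-forward modules during fine-tuning on the small language-specific corpus.

```python
import numpy as np

def masked_row_update(W, grad, neuron_ids, lr=0.1):
    """Gradient step applied only to the weight rows belonging to the
    selected hidden neurons; all other parameters stay frozen."""
    W_new = W.copy()
    for i in neuron_ids:
        W_new[i] -= lr * grad[i]
    return W_new

W = np.zeros((4, 2))     # toy weight matrix: 4 hidden neurons, 2 outputs
grad = np.ones((4, 2))   # pretend gradient from a small fine-tuning set
W_tuned = masked_row_update(W, grad, {1, 3})
print(W_tuned[1])  # updated row:  [-0.1 -0.1]
print(W_tuned[0])  # frozen row:   [0. 0.]
```

Because only the selected rows move, performance in the target language can improve while the parameters serving other languages are untouched, which is why the paper reports gains in one language without degradation in others.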