Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers

7 Apr 2024 | Libo Qin, Qiguang Chen, Yuhang Zhou, Zhi Chen, Yinghui Li, Lizi Liao, Min Li, Wanxiang Che, Philip S. Yu
This paper presents a comprehensive survey of multilingual large language models (MLLMs), focusing on their resources, taxonomy, and future directions. The authors review recent advances in MLLMs and organize them around two main alignment strategies: parameter-tuning alignment and parameter-frozen alignment. Parameter-tuning alignment adjusts model parameters during pre-training, supervised fine-tuning, reinforcement learning from human feedback (RLHF), and downstream fine-tuning to improve cross-lingual performance. Parameter-frozen alignment, by contrast, relies on prompting strategies to achieve cross-lingual alignment without any parameter updates.

The survey discusses the data resources used in pre-training, supervised fine-tuning, and RLHF, including multilingual corpora, benchmark datasets, and instruction-based data, and introduces a new taxonomy that offers a unified perspective on the field. It also highlights emerging trends and challenges in MLLMs, including hallucination, knowledge editing, safety, fairness, language extension, and multi-modality extension, and emphasizes the need for further research in these areas to improve the performance and applicability of MLLMs. Finally, the authors provide a curated list of open-source resources, including papers, data corpora, and leaderboards, to support the research community.
The contributions of this work include the first survey of MLLMs organized by multilingual alignment, a new taxonomy for MLLMs, the identification of emerging frontiers and challenges, and a collection of abundant open-source resources. The paper aims to serve as a valuable resource for researchers and to inspire further breakthroughs in the field of MLLMs.
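To make the parameter-frozen alignment idea concrete, the Python sketch below shows one common prompting-based approach: keeping the model's weights fixed and asking it to pivot through a high-resource language before answering. The generate function, the prompt wording, and the example question are illustrative assumptions for this sketch, not an API or template defined in the survey.

def generate(prompt: str) -> str:
    """Hypothetical stand-in for any LLM inference call; replace with a real model."""
    raise NotImplementedError

def cross_lingual_prompt(question: str, source_lang: str, pivot_lang: str = "English") -> str:
    # Parameter-frozen alignment: no weights are updated; the cross-lingual
    # behavior is encoded entirely in the prompt by restating the task in a
    # high-resource pivot language, reasoning there, then answering in the
    # original language.
    return (
        f"You are given a question written in {source_lang}.\n"
        f"1. Translate the question into {pivot_lang}.\n"
        f"2. Solve it step by step in {pivot_lang}.\n"
        f"3. State the final answer in {source_lang}.\n\n"
        f"Question: {question}"
    )

# Example usage (the Spanish question is illustrative):
# answer = generate(cross_lingual_prompt("¿Cuántos días tiene marzo?", "Spanish"))

Parameter-tuning alignment would instead update the model's weights on multilingual pre-training, instruction, or preference data rather than relying on the prompt alone.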