A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

17 May 2024 | Kaiyu Huang, Fengran Mo, Hongliang Li, You Li, Yuanchi Zhang, Weijian Yi, Yulong Mao, Jinchen Liu, Yuzhuang Xu, Jianan Xu, Jian-Yun Nie, Yang Liu
This survey provides a comprehensive overview of large language models (LLMs) with multilingual capabilities, focusing on recent advances, challenges, and future directions. The rapid development of LLMs such as GPT-3.5, GPT-4, and LLaMA has revolutionized natural language processing (NLP) by enabling advanced tasks such as machine translation, text summarization, and sentiment analysis. However, the multilingual capabilities of these models remain underexplored, particularly for low-resource languages.

The survey begins by rethinking the transition between previous and current research on pre-trained language models (PLMs), highlighting the shift from the "Pre-train, Fine-tune" paradigm to "Pre-train, Prompt, Predict" (illustrated by the sketch below). It then examines the multilingual capabilities of LLMs, covering training and inference methods, model security, multi-domain considerations, and dataset usage. Key challenges include knowledge transfer, knowledge accumulation, and domain adaptation, each of which is addressed with dedicated strategies.
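To make the paradigm shift concrete, here is a minimal, hypothetical sketch in Python. The `generate` stub stands in for any pretrained LLM's decoding API (it is not a real library call), and the prompt template, rather than fine-tuned weights, is what encodes the task:

```python
# A minimal sketch of the "Pre-train, Prompt, Predict" paradigm: rather than
# fine-tuning task-specific parameters, the task is reformulated as text
# completion for a frozen pretrained model. `generate` is a hypothetical
# stand-in for a pretrained LLM's decoding API.

def generate(prompt: str) -> str:
    # Placeholder: a real system would query a pretrained LLM here.
    # The constant return value only keeps the sketch runnable end to end.
    return "positive"

def predict_sentiment_via_prompt(review: str, language: str) -> str:
    # The template casts classification as completion, so the same frozen
    # model can serve many tasks and languages without gradient updates.
    prompt = (
        f"Review ({language}): {review}\n"
        "Sentiment (positive or negative):"
    )
    return generate(prompt).strip()

print(predict_sentiment_via_prompt("Das Essen war ausgezeichnet!", "German"))
```

Under the older "Pre-train, Fine-tune" paradigm, the same task would require labeled data and gradient updates for every task and language pair; prompting reuses one frozen model for all of them.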
The survey also discusses the limitations of current multilingual models, including the need for large-scale data, the impact of catastrophic forgetting, and the lack of cultural and domain-specific knowledge. Future research directions are proposed to enhance multilingual capabilities, including improved training strategies, architectural modifications, and sustainable learning paradigms.

In the section on multilingual inference strategies, the survey explores direct inference, pre-translation inference, Chain-of-Thought (CoT) prompting, and Retrieval-Augmented Generation (RAG). The performance and limitations of each method are evaluated, with a focus on preserving linguistic authenticity and efficiency; the pre-translation strategy is sketched below.
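The following is a minimal sketch of pre-translation inference under stated assumptions: `translate` and `llm_answer` are hypothetical stand-ins, not calls to any real API.

```python
# A minimal sketch of pre-translation inference, one of the surveyed
# multilingual inference strategies: the query is translated into a pivot
# language (typically English) where the LLM is strongest, the model answers
# in the pivot language, and the answer is translated back.

def translate(text: str, src: str, tgt: str) -> str:
    # Placeholder for any MT system (a dedicated NMT model, or the LLM itself).
    return text  # identity stub keeps the sketch runnable

def llm_answer(question: str) -> str:
    # Placeholder for querying a pretrained LLM in the pivot language.
    return "Paris"

def pre_translation_inference(question: str, source_lang: str, pivot: str = "en") -> str:
    pivot_question = translate(question, src=source_lang, tgt=pivot)
    pivot_answer = llm_answer(pivot_question)  # reasoning happens in the pivot language
    return translate(pivot_answer, src=pivot, tgt=source_lang)

print(pre_translation_inference("Quelle est la capitale de la France ?", source_lang="fr"))
```

The trade-off the survey evaluates: pivoting through English often improves accuracy for low-resource languages, but the two extra translation steps add latency and can erode linguistic authenticity (idioms, honorifics, cultural nuance), which direct inference preserves.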
The section on security addresses growing concerns about deploying LLMs, particularly in multilingual scenarios. It examines attack methods such as Greedy Coordinate Gradient (GCG) optimization, prompt-based jailbreaks, and multilingual jailbreaks, and discusses corresponding defense mechanisms. The survey highlights the vulnerability of low-resource languages and the need for more robust security measures; a simplified sketch of the GCG search loop closes this overview.

Overall, the survey aims to provide a structured taxonomy, comprehensive reviews, and future directions for enhancing the multilingual capabilities of LLMs, addressing key challenges and offering practical recommendations for real-world applications.
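To ground the security discussion, below is a highly simplified, hypothetical sketch of one GCG search step (after Zou et al., 2023). The inputs `top_k_subs` (per-position candidate tokens, ranked by the gradient of the attack loss with respect to one-hot token indicators) and `attack_loss` are assumptions standing in for model-specific computations:

```python
# A highly simplified sketch of one Greedy Coordinate Gradient (GCG) step:
# gradients rank candidate token substitutions per suffix position; random
# single-token swaps are drawn from those top-k lists, and the candidate
# with the lowest attack loss is kept greedily. All inputs are stand-ins.
import random

def gcg_step(suffix, top_k_subs, attack_loss, n_candidates=64):
    """One greedy step: propose single-token swaps, keep the lowest-loss one."""
    candidates = [list(suffix)]  # keep the current suffix as a fallback
    for _ in range(n_candidates):
        pos = random.randrange(len(suffix))       # coordinate to modify
        new_tok = random.choice(top_k_subs[pos])  # gradient-ranked candidate
        cand = list(suffix)
        cand[pos] = new_tok
        candidates.append(cand)
    return min(candidates, key=attack_loss)

# Toy usage with stub inputs (a real attack uses the target model's negative
# log-likelihood of an affirmative response as the loss):
suffix = [0, 0, 0, 0]
top_k = [[1, 2, 3]] * len(suffix)
print(gcg_step(suffix, top_k, attack_loss=lambda s: -sum(s)))
```

Multilingual jailbreaks need no such optimization: as the survey notes, safety alignment trained predominantly on high-resource languages can be bypassed simply by posing a harmful prompt in a low-resource language, which motivates the call for more robust, language-balanced defenses.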