3 Jun 2024 | Wen Lai, Mohsen Mesgar, Alexander Fraser
This paper introduces xLLaMA-100 and xBLOOM-100 (collectively xLLMs-100), which extend the multilingual capabilities of LLaMA and BLOOM to 100 languages. To achieve this, the authors construct two datasets: a multilingual instruction dataset covering 100 languages and a cross-lingual human feedback dataset covering 30 languages. They propose a two-step training process: supervised fine-tuning on the multilingual instruction data, followed by alignment with human preferences using the DPO algorithm on the cross-lingual feedback dataset.

The resulting xLLMs-100 models are evaluated on five multilingual benchmarks and show significant improvements in both understanding and generation capabilities across all of them, outperforming existing models and establishing a new state-of-the-art multilingual LLM that supports 100 languages. An ablation study confirms that the cross-lingual human feedback data contributes substantially to this performance. The paper also highlights the importance of language democratization, showing that xLLMs-100 achieves a higher degree of linguistic democratization than competing models.

The authors discuss the challenges of scaling multilingual LLMs, notably the scarcity of multilingual instruction data and the need to align LLMs with human preferences, as well as the limitations of their work, including the size of the models and the limited language coverage of the human feedback dataset. Overall, the paper presents a comprehensive approach to scaling the multilingual capabilities of LLMs, demonstrating the effectiveness of the method across a wide range of languages.
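The second training step relies on the DPO objective, which optimizes a policy directly on preference pairs without a separate reward model. As a minimal sketch (not the authors' implementation), the per-pair loss can be written in terms of summed token log-probabilities under the fine-tuned policy and a frozen reference model; the function name and scalar inputs here are illustrative assumptions:

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair, given summed log-probabilities
    of the chosen and rejected responses under the policy and a frozen
    reference model. beta scales the implicit reward."""
    # Implicit reward is beta * log(pi_theta / pi_ref) for each response.
    chosen_logratio = policy_logp_chosen - ref_logp_chosen
    rejected_logratio = policy_logp_rejected - ref_logp_rejected
    # Bradley-Terry style objective: -log sigmoid(beta * (r_chosen - r_rejected)).
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy equals the reference, the margin is zero and the loss is log 2; the loss falls as the policy assigns relatively more probability to the chosen response than to the rejected one, which is what drives the alignment step on the cross-lingual feedback pairs.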