3 Jun 2024 | Wen Lai, Mohsen Mesgar, Alexander Fraser
The paper introduces xLLaMA-100 and xBLOOM-100 (collectively xLLMs-100), which scale the multilingual capabilities of LLaMA and BLOOM to 100 languages. To achieve this, the authors construct two datasets: a multilingual instruction dataset covering 100 languages and a cross-lingual human feedback dataset covering 30 languages. The instruction dataset is built by translating the Alpaca dataset with the Google Translate API and the NLLB model, while the cross-lingual human feedback dataset pairs instructions written in one language with responses generated in another. The authors perform multilingual instruction tuning on these datasets and then align the models with human feedback using the DPO algorithm. Evaluation on five multilingual benchmarks shows that xLLMs-100 consistently outperforms its peers, establishing a new state of the art for multilingual LLMs supporting 100 languages. The paper also discusses limitations and future directions, including the need for larger models, extending the cross-lingual human feedback dataset, and addressing off-target issues in low-resource languages.
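
To make the dataset-construction step concrete, below is a minimal sketch of translating an Alpaca-style instruction with an NLLB checkpoint via the Hugging Face `transformers` library. The checkpoint name, language codes, and the `translate` helper are illustrative assumptions; the paper does not specify the exact translation setup or code used.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative NLLB checkpoint; the paper does not name the exact model size used.
model_name = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def translate(text: str, tgt_lang: str = "swh_Latn") -> str:
    """Translate English text into the target language (FLORES-200 code)."""
    inputs = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **inputs,
        # Force the decoder to start in the target language.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_length=256,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

# Hypothetical Alpaca-style instruction translated into Swahili.
print(translate("Give three tips for staying healthy.", tgt_lang="swh_Latn"))
```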
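
For the alignment step, the sketch below shows the standard DPO objective (Rafailov et al., 2023) in PyTorch, assuming per-sequence log-probabilities from the policy and a frozen reference model are already available. This is a generic illustration of the loss, not the authors' training code; all tensor values are fabricated for the toy usage example.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over per-sequence log-probabilities (summed over tokens)."""
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Encourage the policy to prefer the chosen response over the rejected
    # one by a larger margin than the frozen reference model does.
    losses = -F.logsigmoid(beta * (pi_logratios - ref_logratios))
    return losses.mean()

# Toy usage with fabricated log-probabilities for two preference pairs.
policy_chosen = torch.tensor([-12.3, -8.7])
policy_rejected = torch.tensor([-15.1, -11.2])
ref_chosen = torch.tensor([-13.0, -9.5])
ref_rejected = torch.tensor([-14.2, -10.8])
print(dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```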