RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs


2 Jul 2024 | John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, Sara Hooker
This paper presents a comprehensive study of multilingual preference optimization for large language models (LLMs). The authors introduce a novel, scalable method for generating high-quality multilingual feedback data with balanced coverage across languages, addressing the central challenges of multilingual preference optimization: data scarcity, data quality, and the difficulty of training models in many languages simultaneously.

They demonstrate that cross-lingual transfer and increased dataset size in preference training significantly improve performance. Multilingual preference data proves necessary for aligning multilingual LLMs, and online preference optimization outperforms offline optimization, including in cross-lingual transfer. Their preference-trained model achieves a 54.4% win-rate against Aya 23 8B, the current state-of-the-art multilingual LLM in its parameter class, and win-rates of 69.5% or higher against widely used open-weight models such as Gemma-1.1-7B-it, Llama-3-8B-Instruct, and Mistral-7B-Instruct-v0.3.

Performance is evaluated on multilingual open-ended generation and summarization tasks using LLM-simulated evaluation. The results show that increasing the number of languages in the preference data improves performance, and that multilingual preference optimization substantially improves generative quality while incurring only a minimal alignment tax on common multilingual NLP tasks. The authors conclude that their work sets a new state of the art in aligning multilingual LLMs, extending the frontier of alignment techniques to 23 languages spoken by roughly half of the world's population.
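Offline preference optimization of the kind contrasted here is commonly implemented with a DPO-style objective over (prompt, chosen, rejected) triples. The sketch below is a minimal, illustrative PyTorch implementation of that standard loss, not the authors' training code; the function name and the assumption that per-sequence log-probabilities are precomputed are placeholders for illustration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over a batch of (chosen, rejected) completions.

    Each argument is a tensor of per-sequence log-probabilities (summed over
    tokens) under the trained policy or the frozen reference model. This is an
    illustrative sketch, not the paper's exact setup.
    """
    # Implicit rewards: scaled log-ratios of policy to reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

The paper contrasts this offline setup with online preference optimization, which it finds to perform better overall and to transfer better across languages.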
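The win-rates quoted above come from LLM-simulated pairwise evaluation, in which a judge model compares two completions for the same multilingual prompt. The sketch below shows one common way such pairwise verdicts aggregate into a win-rate; the verdict format and tie-splitting convention are assumptions for illustration, not the authors' evaluation harness.

```python
from collections import Counter

def aggregate_win_rate(verdicts):
    """Aggregate pairwise judge verdicts into a win-rate for model A.

    `verdicts` is an iterable of strings in {"A", "B", "tie"}, one per prompt,
    as returned by some LLM judge comparing model A against model B.
    One common convention, assumed here, is to split ties evenly.
    """
    counts = Counter(verdicts)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["A"] + 0.5 * counts["tie"]) / total

# Example: 9 wins for A, 6 for B, 5 ties -> 0.575 (57.5% win-rate for A).
print(aggregate_win_rate(["A"] * 9 + ["B"] * 6 + ["tie"] * 5))
```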