Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

18 Apr 2024 | Zhaofeng Wu, Ananth Balashankar, Yoon Kim, Jacob Eisenstein, Ahmad Beirami
This paper explores the effectiveness of zero-shot cross-lingual alignment for language models (LMs) using a reward model (RM) transfer approach. The authors evaluate a method in which an RM trained in one source language is applied directly to align LMs in other target languages, without requiring human-annotated preference data in those languages. The evaluation covers two tasks, summarization and open-ended dialog generation, and spans multiple evaluation settings, including both human and LM judgments. The results show that cross-lingual alignment consistently improves model performance, with aligned models preferred over unaligned ones on up to 70% of evaluation instances. Surprisingly, using a different-language RM sometimes outperforms using a same-language RM. The study also identifies best practices for when no language-specific data is available for supervised fine-tuning (SFT), another component of alignment. The findings suggest that cross-lingual alignment can be effective even without target-language SFT data, though careful attention must be paid to data distribution and evaluation metrics. The paper concludes with practical recommendations and limitations, highlighting the potential for building more equitable and globally accessible LMs.
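To make the transfer recipe concrete, below is a minimal best-of-n reranking sketch of cross-lingual RM transfer: a reward model trained on source-language (e.g., English) preference data scores candidate responses in a target language, and the highest-scoring candidate is selected. This is an illustrative sketch, not the authors' released code; the checkpoint name `org/english-reward-model` and the prompt/response pairing convention are assumptions for illustration.

```python
# A minimal best-of-n sketch of cross-lingual reward model transfer,
# assuming a Hugging Face sequence-classification checkpoint trained as a
# scalar-output English reward model ("org/english-reward-model" is a
# hypothetical name, not a real release).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

RM_NAME = "org/english-reward-model"  # hypothetical source-language RM
tokenizer = AutoTokenizer.from_pretrained(RM_NAME)
reward_model = AutoModelForSequenceClassification.from_pretrained(RM_NAME)
reward_model.eval()


def score(prompt: str, response: str) -> float:
    """Score a (prompt, response) pair with the source-language RM."""
    inputs = tokenizer(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # A single scalar logit is the usual reward-head output (assumed here).
        return reward_model(**inputs).logits.squeeze().item()


def best_of_n(prompt: str, candidates: list[str]) -> str:
    """Return the target-language candidate the transferred RM rates highest."""
    return max(candidates, key=lambda c: score(prompt, c))


# Usage: a German (target-language) prompt and candidate summaries, scored by
# the English (source-language) RM -- no German preference data required.
prompt = "Fasse den folgenden Artikel zusammen: ..."
candidates = ["Kandidat A ...", "Kandidat B ...", "Kandidat C ..."]
print(best_of_n(prompt, candidates))
```

The same transferred reward signal could instead drive RL-based alignment; best-of-n is shown here only because it isolates the RM-transfer step in a few lines.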