Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment


2024 | Zhaofeng Wu, Ananth Balashankar, Yoon Kim, Jacob Eisenstein, Ahmad Beirami
This paper introduces an approach for zero-shot cross-lingual alignment of language models (LMs): reward models (RMs) are trained on preference data in one language and then used to align models in other languages. The method is evaluated on two tasks, summarization and open-ended dialog generation. Cross-lingual alignment with a different-language RM is often more effective than alignment with a same-language RM, as judged by both humans and LM evaluators. The approach also works when target-language data for supervised fine-tuning (SFT) is unavailable, though care must be taken when training the surrogate SFT model, and domain match is important for best-of-n alignment. Translated SFT data can likewise be effective, though not in every setting.
These results demonstrate that RM signals generalize across languages and are robust to shifts in the input distribution, a property that could be leveraged in future applications. The paper offers practical recommendations for cross-lingual RM transfer, highlighting English as an effective source language, and concludes that cross-lingual alignment is a promising approach for building more equitable and effective LM-based systems.
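The best-of-n alignment the summary refers to can be sketched in a few lines: the policy model samples n candidate responses, the (possibly different-language) reward model scores each one, and the highest-scoring candidate is returned. The sketch below is illustrative only; `reward_fn` stands in for a real RM trained on source-language preference data, and the toy scorer here is a hypothetical placeholder, not the paper's model.

```python
def best_of_n(candidates, reward_fn):
    """Best-of-n alignment: score each candidate response with the
    reward model and return the highest-scoring one."""
    return max(candidates, key=reward_fn)

# Stand-in reward function for illustration (a real system would call
# an RM trained on preference data, e.g. in English, and apply it to
# target-language candidates).
def toy_reward(text):
    # Placeholder preference: favor shorter responses.
    return -len(text)

candidates = [
    "a much longer and more rambling candidate summary of the article",
    "a concise summary",
]
best = best_of_n(candidates, toy_reward)
print(best)  # -> "a concise summary"
```

Because best-of-n only reranks samples drawn from the SFT policy, the domain match between the RM's training data and the candidates matters, which is consistent with the paper's observation above.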