MAPO: Advancing Multilingual Reasoning through Multilingual-Alignment-as-Preference Optimization


2024 | Shuaijie She, Wei Zou, Shujian Huang*, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen
This paper proposes MAPO, a framework that enhances multilingual reasoning by aligning reasoning in non-dominant languages with reasoning in a dominant language (English). The framework treats multilingual alignment as a preference signal: a well-trained multilingual translation model scores how well an answer in a non-dominant language aligns with the corresponding answer in the dominant language, and these alignment scores then drive preference optimization methods such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO). Experiments show that MAPO substantially improves multilingual reasoning across three benchmarks: MSVAMP (+16.2%), MGSM (+6.1%), and MNumGLUESub (+13.3%). The framework is robust to the choice of translation model and yields consistent gains, and the analysis confirms that strengthening alignment through preference optimization is the key driver of these improvements. By aligning non-English reasoning with English reasoning, MAPO produces more consistent and accurate reasoning across languages and achieves state-of-the-art performance among 7B models.
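To make the scoring step concrete, here is a minimal Python sketch of how an alignment score and a DPO preference pair could be computed, assuming the Hugging Face `transformers` API and an off-the-shelf NLLB translation model. The model choice, the language codes (Chinese to English here), and the function names `alignment_score` and `build_preference_pair` are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of alignment-as-preference scoring: a translation model rates how
# well a non-dominant-language answer "translates into" the dominant-language
# answer, and the scores rank sampled answers into a DPO preference pair.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "facebook/nllb-200-distilled-600M"  # example model; any MT model works
# Language codes are illustrative (Chinese source, English target).
tokenizer = AutoTokenizer.from_pretrained(
    MODEL, src_lang="zho_Hans", tgt_lang="eng_Latn"
)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def alignment_score(non_dominant_answer: str, dominant_answer: str) -> float:
    """Average per-token log-probability of translating the non-dominant-
    language answer into the dominant-language answer. Higher score means
    the two reasoning processes are more closely aligned."""
    enc = tokenizer(non_dominant_answer, return_tensors="pt", truncation=True)
    labels = tokenizer(
        text_target=dominant_answer, return_tensors="pt", truncation=True
    ).input_ids
    out = model(**enc, labels=labels)
    return -out.loss.item()  # loss is mean token NLL, so negate it

def build_preference_pair(sampled_answers: list[str], dominant_answer: str) -> dict:
    """Rank several sampled non-dominant-language answers by alignment score;
    the best-aligned one becomes 'chosen', the worst 'rejected' for DPO."""
    ranked = sorted(
        sampled_answers, key=lambda a: alignment_score(a, dominant_answer)
    )
    return {"chosen": ranked[-1], "rejected": ranked[0]}
```

Negating the mean token cross-entropy yields a length-normalized log-probability, so longer sampled answers are not systematically penalized when ranked against shorter ones.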