MAPO: Advancing Multilingual Reasoning through Multilingual-Alignment-as-Preference Optimization

13 Apr 2024 | Shuaijie She, Wei Zou, Shujian Huang*, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen
The paper "MAPO: Advancing Multilingual Reasoning through Multilingual-Alignment-as-Preference Optimization" addresses the inconsistency of reasoning abilities across languages in large language models (LLMs). The authors propose Multilingual-Alignment-as-Preference Optimization (MAPO), a framework that enhances reasoning in non-dominant languages by aligning their reasoning processes with a dominant language, such as English. MAPO uses a translation model to estimate alignment scores between answers in non-dominant and dominant languages, and then optimizes these scores as preferences with methods such as Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO). Experiments on three benchmarks (MSVAMP, MGSM, and MNumGLUESub) show significant improvements in multilingual reasoning, with average accuracy gains of 16.2%, 6.1%, and 13.3%, respectively. The method improves the consistency of reasoning processes and answers across languages, and remains robust across different translation models and model sizes.
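To make the preference-optimization step concrete, below is a minimal sketch of how translation-based alignment scores could feed a DPO objective. This is an illustration, not the authors' implementation: the pairing helper `build_preference_pair` and the pre-computed log-probabilities passed to `dpo_loss` are hypothetical; in MAPO the alignment scores would come from a translation model comparing a non-dominant-language answer against its dominant-language counterpart.

```python
import math

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    policy_logp_w / policy_logp_l: policy log-probs of the chosen (w) and
    rejected (l) responses; ref_logp_*: the same under the frozen reference
    model. beta scales the implicit reward margin.
    """
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    # -log(sigmoid(margin)): pushes the chosen response above the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def build_preference_pair(answers, alignment_scores):
    """Hypothetical helper: treat the answer best aligned with the
    dominant-language reasoning as 'chosen', the worst as 'rejected'."""
    ranked = sorted(zip(alignment_scores, answers), reverse=True)
    return ranked[0][1], ranked[-1][1]
```

With zero margin (policy equals reference) the loss is log 2, and it decreases as the policy assigns relatively more probability to the better-aligned answer, which is the direction of optimization the paper describes.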