Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

2024 | Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim
This paper introduces Contrastive Preference Optimization (CPO), a novel training method that significantly improves the performance of moderate-sized large language models (LLMs) in machine translation (MT). The study addresses the limitations of supervised fine-tuning (SFT) in MT, particularly its reliance on reference data that is assumed to be high quality but is not always accurate. Instead of mimicking references, CPO learns from preference data, training models to avoid generating translations that are merely adequate but not perfect. Applying CPO to ALMA models with only 22K parallel sentences and by tuning only 0.1% of the parameters yields significant improvements, resulting in the ALMA-R model, which matches or exceeds the performance of the WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test datasets.

The study highlights that gold reference data, often considered high quality, may not always be accurate. By analyzing the FLORES-200 dataset, the authors find that system-generated translations can sometimes outperform human-written references. This insight motivates CPO, which uses contrastive learning to train models to prefer high-quality translations and reject suboptimal ones. CPO is also more efficient than SFT and DPO, requiring less memory and computation while achieving better performance.
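To make the training objective concrete, here is a minimal sketch of what a CPO-style loss could look like in PyTorch, assuming sequence-level log-probabilities of the preferred and dis-preferred translations have already been computed by the policy model; the function name, signature, and normalization are illustrative and not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cpo_loss(chosen_logps: torch.Tensor, rejected_logps: torch.Tensor, beta: float = 0.1) -> torch.Tensor:
    """Sketch of the CPO objective: a DPO-style preference term without a
    frozen reference model, plus a negative log-likelihood term on the
    preferred output.

    chosen_logps / rejected_logps: sequence-level (e.g. length-normalized)
    log-probabilities of the preferred / dis-preferred translations under
    the policy being trained, shape (batch,).
    """
    # Preference term: reward ranking the preferred translation above the rejected one.
    prefer_loss = -F.logsigmoid(beta * (chosen_logps - rejected_logps)).mean()

    # Behavior-cloning regularizer: keep the likelihood of the preferred translation high.
    nll_loss = -chosen_logps.mean()

    return prefer_loss + nll_loss


# Toy usage with made-up log-probabilities for a batch of three sentence pairs.
chosen = torch.tensor([-0.9, -1.1, -0.8])
rejected = torch.tensor([-1.5, -1.3, -1.6])
print(cpo_loss(chosen, rejected))
```

Unlike DPO, the preference term here involves no frozen reference-model log-probabilities, so only one model needs to be kept in memory and run in the forward pass, which is the source of the memory and speed savings the paper reports.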
Experiments show that CPO significantly enhances translation performance across multiple language directions. The ALMA-R model outperforms previous models, including ALMA-13B-LoRA and GPT-4, in both reference-free and reference-based evaluations. Human evaluations further confirm the superiority of ALMA-R in translating Chinese to English, demonstrating its effectiveness in producing high-quality translations. The study also includes ablation studies showing that both the preference learning and negative log-likelihood components of CPO are crucial for improving translation performance. Overall, CPO represents a significant advancement in the field of machine translation, offering a more effective and efficient approach to training LLMs for translation tasks.