Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

2024 | Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, Young Jin Kim
This paper addresses the performance gap in machine translation (MT) between moderate-sized large language models (LLMs) and both state-of-the-art conventional encoder-decoder translation models and larger-scale LLMs such as GPT-4. The study introduces Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating translations that are adequate but not perfect. By applying CPO to ALMA models with only 22K parallel sentences and tuning just 0.1% of the model's parameters, the resulting model, ALMA-R, matches or exceeds the performance of WMT competition winners and GPT-4 on the WMT'21, WMT'22, and WMT'23 test datasets.

The paper begins by examining the shortcomings of supervised fine-tuning (SFT) for LLMs in MT, highlighting quality issues in reference data. CPO is then introduced as a method for training models on curated preference data, which pushes the model to prioritize generating higher-quality translations and to reject suboptimal ones. The study demonstrates that CPO significantly improves the performance of ALMA models, outperforming both SFT and direct preference optimization (DPO). The authors also conduct a thorough analysis of the quality of gold references, showing that they are not always flawless and can be surpassed by translations from advanced models. They further validate the effectiveness of CPO through human evaluation and ablation studies, confirming the importance of both model-generated and reference data in enhancing translation quality.

Overall, the paper bridges the performance gap between moderate-sized LLMs and state-of-the-art translation models, advancing both MT and the broader application of large language models.
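To make the training objective concrete, the sketch below illustrates the CPO loss as described in the paper: a reference-model-free preference term (a DPO-style log-sigmoid contrast between the log-probabilities of the preferred and dispreferred translations under the policy) plus a negative log-likelihood regularizer on the preferred translation. This is a minimal PyTorch sketch based on the paper's formulation, not the authors' released code; the function name, the default beta value, and the assumption that sequence log-probabilities are precomputed are all illustrative choices.

```python
import torch
import torch.nn.functional as F

def cpo_loss(logp_chosen: torch.Tensor,
             logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Contrastive Preference Optimization loss (sketch).

    logp_chosen / logp_rejected: per-sequence log-probabilities of the
    preferred and dispreferred translations under the policy model
    (e.g., sums of token-level log-probs), shape (batch,).
    """
    # Preference term: DPO-style contrast, but with no frozen
    # reference model, which is what makes CPO memory-efficient.
    l_prefer = -F.logsigmoid(beta * (logp_chosen - logp_rejected))
    # NLL regularizer: keeps probability mass on the preferred translation.
    l_nll = -logp_chosen
    return (l_prefer + l_nll).mean()
```

In the paper's setup, the preference pairs would come from the curated triplet data (reference, GPT-4, and ALMA translations, ranked by reference-free quality-estimation metrics), with the highest- and lowest-scored translations serving as the chosen and rejected sequences respectively.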