Towards Efficient Exact Optimization of Language Model Alignment

2024 | Haozhe Ji, Cheng Lu, Yilin Niu, Pei Ke, Hongning Wang, Jun Zhu, Jie Tang, Minlie Huang
This paper presents Efficient Exact Optimization (EXO), a novel approach for aligning language models with human preferences. The alignment problem is formulated as optimizing the model's policy to maximize an expected reward that reflects human preferences while minimizing deviation from the initial policy. Reinforcement learning (RL) has been the standard tool for this objective, but it suffers from high variance in policy updates, which hinders efficient policy improvement. Direct preference optimization (DPO) was proposed as an alternative, yet in practice it yields a compromised approximation of the optimal solution.

EXO is designed to optimize the alignment objective in the same direction as RL algorithms asymptotically, for arbitrary policy parametrization. It therefore reaches the same mode-seeking solution while enabling efficient optimization that circumvents the complexities of RL. The key observation is that the KL-regularized reward maximization objective is equivalent to probability matching between the parametrized policy and the optimal policy under the reverse KL divergence, and EXO performs efficient exact optimization of this objective. The paper also shows that DPO instead corresponds to minimizing the forward KL divergence, which is less effective at capturing the essential characteristics of the optimal policy. Theoretical and empirical analyses confirm this characterization and show that EXO outperforms existing approaches on realistic human preference data.
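For reference, the KL-regularized objective and its optimal policy can be written in standard notation (the symbols below follow common usage in the alignment literature and are not quoted from the paper):

    \max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot\mid x)}\big[r(x, y)\big] \;-\; \beta\,\mathrm{KL}\big(\pi_\theta(\cdot\mid x)\,\big\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big),
    \qquad
    \pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{ref}}(y \mid x)\,\exp\!\big(r(x, y)/\beta\big).

Maximizing this objective is equivalent to minimizing the reverse KL divergence \mathrm{KL}(\pi_\theta \,\|\, \pi^{*}), which is mode-seeking, whereas the paper associates DPO with the forward direction \mathrm{KL}(\pi^{*} \,\|\, \pi_\theta), which is mass-covering.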
Experiments further demonstrate the effectiveness and scalability of EXO, showing that it achieves higher oracle rewards and better alignment performance than both DPO and PPO. The results highlight the advantages of EXO in terms of sample efficiency and alignment accuracy.
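To make the two matching directions concrete, here is a minimal PyTorch sketch contrasting a reverse-KL probability-matching loss over K sampled completions with the standard DPO pairwise loss. This is an illustrative sketch only: the function names, tensor layout, and the exact normalization of the policy- and reward-induced distributions are assumptions for exposition, not the paper's released implementation.

import torch
import torch.nn.functional as F

def reverse_kl_matching_loss(policy_logps, ref_logps, rewards, beta=0.1):
    # policy_logps, ref_logps, rewards: tensors of shape (batch, K), where
    # policy_logps[b, i] = log pi_theta(y_i | x_b), ref_logps[b, i] = log pi_ref(y_i | x_b),
    # and rewards[b, i] = r(x_b, y_i) from a (proxy) reward model.
    # Distribution over the K candidates induced by the policy (scaled log-ratio), and
    # the one induced by the reward (the optimal policy restricted to the K samples).
    log_f_policy = F.log_softmax((policy_logps - ref_logps) / beta, dim=-1)
    log_f_reward = F.log_softmax(rewards / beta, dim=-1)
    # Reverse KL between the two empirical distributions: KL(f_policy || f_reward),
    # which is mode-seeking with respect to the reward-induced distribution.
    loss = (log_f_policy.exp() * (log_f_policy - log_f_reward)).sum(dim=-1)
    return loss.mean()

def dpo_pairwise_loss(policy_logps, ref_logps, beta=0.1):
    # Standard DPO loss for comparison, with K = 2: column 0 holds the chosen
    # completion's log-probabilities and column 1 the rejected one's.
    chosen = policy_logps[:, 0] - ref_logps[:, 0]
    rejected = policy_logps[:, 1] - ref_logps[:, 1]
    return -F.logsigmoid(beta * (chosen - rejected)).mean()

Swapping the arguments of the KL term in the first function gives the forward-KL direction, which the paper associates with DPO-style training and which tends to spread probability mass across candidates rather than concentrating on the highest-reward ones.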