BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling

5 Jun 2024 | Lin Gui¹, Cristina Gârbacea², and Victor Veitch¹²
This paper investigates the relationship between best-of-n (BoN) sampling and alignment methods for large language models (LLMs). It shows that the BoN distribution is essentially optimal in the trade-off between win rate and KL divergence from the base model. Building on this, the paper proposes BoNBoN Alignment, a method that trains an LLM to mimic the BoN sampling distribution by combining supervised fine-tuning on best-of-n samples with an IPO objective that contrasts best-of-n and worst-of-n samples. Experiments on dialogue generation and text summarization show that BoNBoN outperforms baselines, achieving high win rates while keeping off-target deviation from the base model minimal. The results suggest that BoN is already an essentially optimal alignment policy, and that BoNBoN offers an efficient way to distill it into the model itself. The paper also discusses the theoretical foundations of BoN and its relationship to other alignment methods.
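The sampling procedure underlying the method is straightforward: draw n candidate responses from the base model, score them with a reward signal, and keep the highest-scoring (and, for the IPO pair, the lowest-scoring) one. A minimal sketch, where `sample` and `reward` are hypothetical stand-ins for the base LLM's sampler and a reward model:

```python
def best_and_worst_of_n(prompt, sample, reward, n=8):
    """Best-of-n sampling sketch.

    Draws n candidates from the base policy via `sample(prompt)`,
    scores each with `reward`, and returns the (best, worst) pair,
    as used to build BoNBoN's preference data. `sample` and `reward`
    are assumed callables, not part of any specific library.
    """
    candidates = [sample(prompt) for _ in range(n)]
    ranked = sorted(candidates, key=reward)
    return ranked[-1], ranked[0]  # best-of-n, worst-of-n
```

The best sample can serve as an SFT target, while the (best, worst) pair supplies the chosen/rejected responses for the IPO-style objective.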