5 Jun 2024 | Lin Gui, Cristina Gârbacea, and Victor Veitch
This paper studies aligning large language models (LLMs) with human preferences via *best-of-$n$* sampling, where $n$ samples are drawn, ranked, and the best one is returned. The authors address two fundamental questions: how best-of-$n$ relates to other alignment approaches, and how to fine-tune an LLM to mimic the best-of-$n$ sampling distribution. They find that best-of-$n$ is essentially optimal in terms of the trade-off between win rate and KL divergence from the base model, but it requires drawing $n$ samples at each inference, which is computationally expensive. To address this, they propose *BoNBoN Alignment*, a method that trains an LLM to mimic the best-of-$n$ distribution by using both best-of-$n$ and worst-of-$n$ samples as training data. Experiments show that BoNBoN alignment substantially improves win rates while minimizing off-target deviations from the base model compared to other alignment methods. The code for BoNBoN is available at <https://github.com/gl-ybnxb/BoNBoN>.
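
To make the procedure concrete, here is a minimal sketch of best-of-$n$ sampling and of building best/worst-of-$n$ pairs as described above. The `generate` and `reward` callables are assumptions standing in for the base LLM's sampler and a preference/reward model; they are not part of the paper's released code.

```python
from typing import Callable, Tuple


def best_of_n(
    prompt: str,
    generate: Callable[[str], str],          # assumed: draws one sample from the base LLM
    reward: Callable[[str, str], float],     # assumed: scores a (prompt, response) pair
    n: int = 8,
) -> str:
    """Draw n samples from the base model and return the highest-reward one."""
    samples = [generate(prompt) for _ in range(n)]
    return max(samples, key=lambda s: reward(prompt, s))


def best_and_worst_of_n(
    prompt: str,
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    n: int = 8,
) -> Tuple[str, str]:
    """Return the (best, worst) samples from n draws.

    Such pairs are the kind of training data BoNBoN-style fine-tuning uses,
    so that a single forward pass of the tuned model can mimic the
    best-of-n distribution without paying the n-sample inference cost.
    """
    samples = [generate(prompt) for _ in range(n)]
    ranked = sorted(samples, key=lambda s: reward(prompt, s))
    return ranked[-1], ranked[0]
```

The first function illustrates why inference is expensive (cost scales linearly with $n$); the second shows how the same $n$ draws can instead be harvested offline into paired training data.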