25 Jun 2024 | Weizhe Yuan, Ilia Kulikov, Ping Yu, Kyunghyun Cho, Sainbayar Sukhbaatar, Jason Weston, Jing Xu
This paper addresses the issue of length bias in instruction-following models, where models tend to generate overly long responses because evaluation judges often favor longer answers. The authors propose a method called Length-Instruction Fine-Tuning (LIFT) to train models that can be controlled at inference time with length constraints. They show that existing instruction-following models, such as GPT-4, Llama 3, and Mixtral, often fail to follow length instructions, resulting in high violation rates.
The authors introduce two new benchmarks, AlpacaEval-LI and MT-Bench-LI, which include length constraints in their prompts and evaluate how well models follow length instructions. They also propose LIFT-DPO, which trains with Direct Preference Optimization (DPO) on preference data augmented with length instructions to improve models' ability to follow length constraints.
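To make the benchmark setup concrete, here is a minimal sketch of how a length-instructed prompt and its violation check might look. The exact prompt template and word limits used in AlpacaEval-LI and MT-Bench-LI are not reproduced here; the wording and helper names below are illustrative assumptions.

```python
# Sketch of a length-instructed prompt and a violation check.
# The instruction template and word budget are assumptions, not the
# benchmarks' exact wording.

def add_length_instruction(prompt: str, max_words: int) -> str:
    """Prepend a length instruction to an existing benchmark prompt."""
    return f"Answer the following question in less than {max_words} words.\n\n{prompt}"

def violates_length(response: str, max_words: int) -> bool:
    """A response violates the constraint if it exceeds the word budget."""
    return len(response.split()) > max_words

if __name__ == "__main__":
    prompt = add_length_instruction("Explain how photosynthesis works.", 120)
    response = "Photosynthesis converts light energy into chemical energy ..."
    print(prompt)
    print("violation:", violates_length(response, 120))
```

A benchmark's violation rate is then simply the fraction of model responses for which such a check returns true.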
The LIFT-DPO method involves modifying existing preference pairs to include length instructions, allowing models to learn to follow length constraints. The authors evaluate their method on the new benchmarks and find that LIFT-DPO significantly reduces length constraint violations compared to standard instruction-following models.
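The following is a minimal sketch of what augmenting a preference pair with a length instruction could look like, in the spirit of the description above. The rule used here for re-ranking under the constraint, and the way the word limit is chosen, are assumptions for illustration; the paper's exact construction may differ, and all names are hypothetical.

```python
# Illustrative augmentation of a DPO preference pair with a length
# instruction. The limit-selection and re-ranking rules are assumptions.

from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str

def word_count(text: str) -> int:
    return len(text.split())

def with_length_instruction(pair: PreferencePair, max_words: int) -> PreferencePair:
    """Add a length instruction and re-rank the pair under that constraint.

    If one response fits the word budget and the other does not, the response
    that fits becomes the chosen one; otherwise the original preference is kept.
    """
    prompt = (
        f"Answer the following question in less than {max_words} words.\n\n"
        f"{pair.prompt}"
    )
    chosen_ok = word_count(pair.chosen) <= max_words
    rejected_ok = word_count(pair.rejected) <= max_words
    if rejected_ok and not chosen_ok:
        # The constraint flips the preference: the shorter response now wins.
        return PreferencePair(prompt, pair.rejected, pair.chosen)
    return PreferencePair(prompt, pair.chosen, pair.rejected)
```

Training DPO on pairs constructed this way gives the model a direct signal that violating an explicit length instruction should lose, regardless of how the unconstrained responses compare.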
The results show that LIFT-DPO models perform better in length-instructed evaluations, with lower violation rates and higher win rates. The authors also find that LIFT-DPO models maintain high performance on standard benchmarks without length instructions.
The paper highlights the importance of length constraints in evaluating instruction-following models and proposes a new method to improve their ability to follow length instructions. The authors argue that length instructions are essential for fair evaluation and that current models often fail to adhere to them. The LIFT-DPO method provides a way to train models that can follow length instructions, improving their performance in length-instructed evaluations.