Following Length Constraints in Instructions

25 Jun 2024 | Weizhe Yuan, Ilia Kulikov, Ping Yu, Sainbayar Sukhbaatar, Jason Weston, Kyunghyun Cho, Jing Xu
This paper addresses length bias in instruction-following models, where models tend to produce longer responses than desired. The authors propose Length-Instruction Fine-Tuning (LIFT) to train models that can be controlled at inference time with specific length constraints. LIFT augments existing instruction-following datasets by inserting length instructions into the prompts, creating preference pairs that reflect both length constraints and response quality. The augmented dataset is then used to fine-tune models with Direct Preference Optimization (DPO).

The method is evaluated on two benchmarks that incorporate length instructions, AlpacaEval-LI and MT-Bench-LI. The authors find that state-of-the-art models such as GPT-4 and Llama 3 fail to follow length instructions adequately, with GPT-4-Turbo violating length constraints almost 50% of the time. In contrast, LIFT-DPO models show significantly lower violation rates and higher win rates on these benchmarks. The paper also demonstrates that LIFT-DPO models handle out-of-distribution length instructions better and maintain performance on standard benchmarks without length instructions. The authors conclude that their approach provides a way to compare models without length bias and to improve the alignment of models with human expectations.
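As a rough illustration of the LIFT data-augmentation step, the sketch below builds a length-instructed preference pair from an existing (prompt, chosen, rejected) example: a length instruction is prepended to the prompt, and the pair is re-ranked so that a response violating the constraint loses. The instruction wording, the word-based length measure, and the field names are assumptions made for illustration, not the paper's exact implementation.

```python
# Minimal sketch of LIFT-style augmentation (assumed details, not the
# authors' exact templates or length measure).

def word_count(text: str) -> int:
    return len(text.split())

def augment_with_length_instruction(example: dict, max_words: int) -> dict:
    """Turn an ordinary preference pair into a length-instructed one."""
    # Prepend a length instruction to the original prompt (template assumed).
    prompt = (
        f"Answer the following instruction using at most {max_words} words.\n\n"
        f"{example['prompt']}"
    )
    chosen, rejected = example["chosen"], example["rejected"]

    chosen_ok = word_count(chosen) <= max_words
    rejected_ok = word_count(rejected) <= max_words

    # If exactly one response satisfies the constraint, it becomes the
    # preferred response; otherwise the original quality ranking is kept.
    if rejected_ok and not chosen_ok:
        winner, loser = rejected, chosen
    else:
        winner, loser = chosen, rejected

    return {"prompt": prompt, "chosen": winner, "rejected": loser}

if __name__ == "__main__":
    pair = {
        "prompt": "Explain why the sky is blue.",
        "chosen": "Sunlight scatters off air molecules, and shorter blue "
                  "wavelengths scatter the most, so the sky looks blue.",
        "rejected": "The sky is blue because of Rayleigh scattering. " * 20,
    }
    print(augment_with_length_instruction(pair, max_words=40))
```

In the paper, pairs augmented this way are then used for standard DPO training; the sketch covers only the prompt augmentation and re-ranking logic.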