2024 | William Muldrew, Peter Hayes, Mingtian Zhang, David Barber
This paper introduces Active Preference Learning (APL), an iterative data-acquisition and fine-tuning loop that improves the efficiency of preference-based fine-tuning for large language models (LLMs). The key idea is an acquisition function that selects the most informative prompt/completion pairs to label and fine-tune on, based on the model's predictive entropy and the certainty of its implicit preference model. Concentrating the labeling budget on data points where the model's implicit preference ranking is confidently wrong accelerates learning and yields better final performance.
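A minimal sketch of the loop described above, assuming hypothetical helper callables (generate_pair, acquisition_score, oracle_preference, dpo_update) that are not the authors' code: each round generates completion pairs, scores them with the acquisition function, sends only the top-scoring pairs to the preference oracle, and fine-tunes on the newly labeled data.

```python
from typing import Callable, List, Tuple

Prompt = str
Completion = str
Pair = Tuple[Prompt, Completion, Completion]


def active_preference_learning(
    prompts: List[Prompt],
    generate_pair: Callable[[Prompt], Tuple[Completion, Completion]],
    acquisition_score: Callable[[Pair], float],
    oracle_preference: Callable[[Pair], int],   # 0 or 1: which completion the oracle prefers
    dpo_update: Callable[[List[Tuple[Pair, int]]], None],
    rounds: int = 5,
    batch_size: int = 64,
    labels_per_round: int = 16,
) -> None:
    """One plausible reading of the APL loop: acquire -> label -> fine-tune."""
    for _ in range(rounds):
        # 1. Take a candidate batch of prompts and generate two completions each.
        batch = prompts[:batch_size]
        pairs = [(p, *generate_pair(p)) for p in batch]

        # 2. Score every pair with the acquisition function and keep the
        #    highest-scoring (most informative) pairs for labeling.
        ranked = sorted(pairs, key=acquisition_score, reverse=True)
        selected = ranked[:labels_per_round]

        # 3. Query the preference oracle (e.g. GPT-4) only on the selected pairs.
        labelled = [(pair, oracle_preference(pair)) for pair in selected]

        # 4. Fine-tune the policy on the newly acquired preference labels.
        dpo_update(labelled)
```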
The paper compares several acquisition strategies: random, entropy-based, preference-certainty-based, and a hybrid combining both. It demonstrates that the preference-certainty-based acquisition significantly outperforms the random baseline, improving the win-rate of the fine-tuned model by 1-6% on average. This gain is attributed to concentrating labels on pairs where the implicit preference model is most certain, and therefore potentially confidently wrong, which accelerates learning and improves final performance.
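A hedged sketch of one way to compute such a certainty score, assuming a DPO-style implicit reward r̂(x, y) = β(log π_θ(y|x) − log π_ref(y|x)); the exact form used in the paper may differ. Here "certainty" is how far the implied win probability sits from 0.5.

```python
import math


def implicit_preference_certainty(
    logp_policy_y1: float, logp_ref_y1: float,
    logp_policy_y2: float, logp_ref_y2: float,
    beta: float = 0.1,
) -> float:
    # Implicit rewards for the two completions under the assumed parameterization.
    r1 = beta * (logp_policy_y1 - logp_ref_y1)
    r2 = beta * (logp_policy_y2 - logp_ref_y2)

    # Bradley-Terry probability that the model implicitly prefers y1 over y2.
    p_y1_wins = 1.0 / (1.0 + math.exp(-(r1 - r2)))

    # Certainty: 0 when the model is indifferent, 0.5 when it is fully confident.
    return abs(p_y1_wins - 0.5)


# Toy usage: the policy strongly favours completion 1 relative to the reference.
print(implicit_preference_certainty(-12.0, -20.0, -25.0, -21.0))  # ~0.27
```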
The paper uses GPT-4 as an oracle for both preference labeling and evaluation, highlighting its ability to provide consistent and reliable preference judgments. Experiments are run on two datasets, IMDB and TLDR, with open-source models of roughly 1 billion parameters; on both, APL outperforms random sampling, with the preference-certainty-based acquisition being the most effective.
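For illustration only, a pairwise-judgment prompt of the kind one might send to GPT-4 when using it as the preference oracle; the paper's actual prompt wording and API settings are not reproduced here, and the API call itself is omitted.

```python
def build_preference_prompt(prompt: str, completion_a: str, completion_b: str) -> str:
    # Hypothetical judge prompt: ask the oracle to pick the better of two completions.
    return (
        "You are evaluating two responses to the same prompt.\n\n"
        f"Prompt:\n{prompt}\n\n"
        f"Response A:\n{completion_a}\n\n"
        f"Response B:\n{completion_b}\n\n"
        "Which response is better? Answer with a single letter: A or B."
    )
```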
The paper also addresses the computational challenges of training large language models and suggests that future work could integrate online learning techniques to improve efficiency. Additionally, it explores the potential of combining APL with parameter-efficient fine-tuning methods like LoRA to further enhance performance. Overall, the study provides a practical and effective approach to improving the use of preference labels in fine-tuning LLMs, with significant implications for the development of more aligned and capable language models.