Learning to summarize from human feedback

15 Feb 2022 | Nisan Stiennon*, Long Ouyang*, Jeff Wu*, Daniel M. Ziegler*, Ryan Lowe*, Chelsea Voss*, Alec Radford, Dario Amodei, Paul Christiano*
This paper presents a method for improving summary quality by training models to optimize for human preferences. The authors collect a large, high-quality dataset of human comparisons between summaries, train a model to predict the human-preferred summary, and use that model as a reward function to fine-tune a summarization policy with reinforcement learning. They apply the method to the TL;DR dataset of Reddit posts and find that their models significantly outperform both the human reference summaries and much larger models fine-tuned with supervised learning alone. The models also transfer to CNN/DM news articles, producing summaries nearly as good as the human references without any news-specific fine-tuning. The authors conduct extensive analyses of their human feedback dataset and fine-tuned models, showing that the reward model generalizes to new datasets and that, according to human evaluators, optimizing the reward model produces better summaries than optimizing ROUGE. The paper also discusses the broader implications of the work, including the potential for aligning machine learning algorithms with human preferences and the risks of training models with human feedback. The authors conclude that their method is a promising approach for improving summary quality and aligning machine learning systems with human intentions.
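The core of the method is a reward model trained on pairwise human comparisons; its score, minus a KL penalty that keeps the policy close to the supervised baseline, serves as the reward for RL fine-tuning. Below is a minimal PyTorch sketch of those two pieces, written under stated assumptions: the function names, tensor shapes, and the beta coefficient are illustrative placeholders, not the authors' actual code or hyperparameters.

import torch
import torch.nn.functional as F

def preference_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise comparison loss for the reward model: maximize the
    log-probability that the human-preferred summary receives the higher
    scalar reward, i.e. -E[log sigmoid(r(x, y_preferred) - r(x, y_rejected))]."""
    return -F.logsigmoid(r_preferred - r_rejected).mean()

def shaped_reward(r_theta: torch.Tensor,
                  logp_policy: torch.Tensor,
                  logp_sft: torch.Tensor,
                  beta: float = 0.05) -> torch.Tensor:
    """Reward used during RL fine-tuning: the learned reward minus a
    per-summary KL penalty toward the supervised (SFT) policy.
    beta here is an arbitrary placeholder, not the paper's value."""
    return r_theta - beta * (logp_policy - logp_sft)

# Toy usage with random scalars standing in for model outputs on a batch of
# (post, summary) pairs; a real setup would compute these from the models.
r_pref, r_rej = torch.randn(8), torch.randn(8)
print("reward-model loss:", preference_loss(r_pref, r_rej).item())
print("shaped rewards:", shaped_reward(torch.randn(8), torch.randn(8), torch.randn(8)))

The KL term is the piece that prevents the policy from drifting into degenerate summaries that fool the reward model while remaining fluent, which is why it appears inside the per-summary reward rather than as a separate training objective.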