Learning to summarize from human feedback

15 Feb 2022 | Nisan Stiennon*, Long Ouyang*, Jeff Wu*, Daniel M. Ziegler*, Ryan Lowe*, Chelsea Voss*, Alec Radford, Dario Amodei, Paul Christiano*
This paper addresses the problem of aligning machine learning models with human preferences, particularly in the context of summarization. The authors propose training language models with human feedback to optimize directly for summary quality, rather than relying on proxy metrics such as ROUGE. They collect a large dataset of human comparisons between pairs of summaries and train a reward model to predict which summary a human would prefer. This reward model is then used to fine-tune a summarization policy via reinforcement learning. The method is evaluated on the Reddit TL;DR dataset and on CNN/DM news articles, showing that models trained with human feedback are preferred over both human reference summaries and much larger models trained with supervised learning alone. The authors also demonstrate that their models transfer well to new domains and provide extensive analyses of model and reward-function behavior. The paper underscores the importance of aligning machine learning systems with human preferences as a step toward safe behavior on more complex tasks.
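To make the two core training signals concrete, below is a minimal PyTorch sketch, not the authors' code: a pairwise comparison loss for the reward model (maximize the log-probability that the human-preferred summary scores higher) and the KL-penalized reward used during RL fine-tuning (the learned reward minus a penalty for drifting from the supervised baseline). The function names, tensor shapes, and the `beta` value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def reward_model_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise comparison loss over a batch of human-labeled summary pairs.

    r_preferred / r_rejected are scalar reward-model scores for the summary
    the labeler chose and the one they rejected. Minimizing this loss pushes
    the preferred score above the rejected score.
    """
    return -F.logsigmoid(r_preferred - r_rejected).mean()


def kl_shaped_reward(r_theta: torch.Tensor,
                     logp_policy: torch.Tensor,
                     logp_sft: torch.Tensor,
                     beta: float = 0.05) -> torch.Tensor:
    """Reward signal for RL fine-tuning: learned reward minus a KL-style
    penalty that keeps the policy close to the supervised (SFT) model.

    logp_policy and logp_sft are log-probabilities of the sampled summary
    under the RL policy and the supervised baseline; beta is a hypothetical
    coefficient (the paper tunes this penalty).
    """
    return r_theta - beta * (logp_policy - logp_sft)


if __name__ == "__main__":
    # Toy usage with random scores standing in for reward-model outputs.
    r_pref, r_rej = torch.randn(8), torch.randn(8)
    print("pairwise loss:", reward_model_loss(r_pref, r_rej).item())
```

In practice the shaped reward is fed to a policy-gradient method such as PPO; the KL term is what prevents the policy from exploiting the reward model with degenerate summaries.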