WebGPT: Browser-assisted question-answering with human feedback

1 Jun 2022 | Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman
OpenAI researchers fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment. This approach lets the model search and navigate the web, improving both information retrieval and synthesis. The task is set up so that humans can perform it, which enables training with imitation learning and optimizing answer quality with human feedback. To make human evaluation of factual accuracy easier, models must collect references while browsing to support their answers. The models are trained and evaluated on ELI5, a dataset of questions from the "Explain Like I'm Five" subreddit. The best model is obtained by fine-tuning GPT-3 with behavior cloning and then performing rejection sampling against a reward model trained to predict human preferences; its answers are preferred by human evaluators 56% of the time over those of the human demonstrators, and 69% of the time over the highest-voted answer from Reddit. The paper also discusses the implications of this work for training models to answer questions truthfully, as well as broader impacts.
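As a rough illustration of the rejection-sampling (best-of-n) step described above, here is a minimal Python sketch. The functions `generate_answer` and `reward_model_score` are hypothetical stand-ins for the fine-tuned policy and the reward model, which are internal OpenAI models rather than a public API; the real reward model is trained on human comparisons between answer pairs.

```python
import random

# Hypothetical stand-in for the behavior-cloned policy: in WebGPT this is a
# fine-tuned GPT-3 model that browses the web and composes a referenced answer.
def generate_answer(question: str) -> str:
    return f"candidate answer {random.random():.3f} for: {question}"

# Hypothetical stand-in for the reward model: in WebGPT this is a model
# trained on human preference comparisons to score answer quality.
def reward_model_score(question: str, answer: str) -> float:
    return random.random()

def best_of_n(question: str, n: int = 64) -> str:
    """Sample n answers from the policy and return the one the
    reward model scores highest (rejection sampling / best-of-n)."""
    candidates = [generate_answer(question) for _ in range(n)]
    return max(candidates, key=lambda a: reward_model_score(question, a))

print(best_of_n("Why is the sky blue?", n=8))
```

The appeal of this design is that it improves answer quality at inference time without further gradient updates to the policy: the reward model simply filters samples, at the cost of generating n candidates per question.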