1 Jun 2022 | Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman
WebGPT is a system that uses human feedback to improve a language model's ability to answer long-form questions by interacting with a text-based web-browsing environment. The model is fine-tuned with behavior cloning, and its outputs are selected by rejection sampling against a reward model trained to predict human preferences. Answers produced this way are preferred by human evaluators 56% of the time over answers written by human demonstrators, and 69% of the time over the highest-voted answer from Reddit.

In the browsing environment, the model issues text commands to search and navigate the web, collecting references from the pages it visits to support its answers. These references let human evaluators assess factual accuracy without conducting independent research. Training data comes from the ELI5 dataset, which contains questions from the "Explain Like I'm Five" subreddit, supplemented by human demonstrations of browser use and by human comparisons between pairs of model-generated answers to the same question.

Four training methods are used: behavior cloning, reward modeling, reinforcement learning, and rejection sampling. The best-performing configuration is a 175B-parameter model trained with behavior cloning and sampled with rejection sampling. In addition to its strong ELI5 results, the model outperforms GPT-3 on TruthfulQA, a benchmark of adversarial questions designed to test the truthfulness of answers.

Collecting references during browsing helps ground answers factually, but the model can still produce false statements, particularly when it relies on unreliable sources. This raises concerns about potential bias and the need for careful evaluation of the truthfulness of answers.
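The rejection sampling (best-of-n) step described above can be sketched as follows. This is a minimal illustrative sketch, not WebGPT's actual code: `generate_answers` and `reward_model` are hypothetical stand-ins for the fine-tuned policy and the learned preference model.

```python
def best_of_n(question, generate_answers, reward_model, n=4):
    """Sample n candidate answers and return the one the reward model
    scores highest (the score is a proxy for predicted human preference)."""
    candidates = generate_answers(question, n)
    return max(candidates, key=lambda ans: reward_model(question, ans))

# Toy stand-ins for demonstration: a fixed candidate pool and a reward
# model that simply prefers longer answers.
def toy_generator(question, n):
    pool = ["Short answer.", "A somewhat longer answer.",
            "A much longer and more detailed answer.", "Ok."]
    return pool[:n]

def toy_reward(question, answer):
    return len(answer)

print(best_of_n("Why is the sky blue?", toy_generator, toy_reward))
# → A much longer and more detailed answer.
```

The appeal of this technique is that it needs no further training of the policy: quality improves at inference time simply by spending more compute on sampling, at the cost of generating n answers per question.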
The research highlights the importance of using human feedback to improve the accuracy and reliability of AI systems, and it suggests that further research is needed to address the challenges of ensuring factual accuracy in AI-generated answers.