22 Jul 2020 | Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
This paper explores the capabilities of large language models in few-shot learning, a setting in which a model must perform a new task given only a handful of examples. The authors train GPT-3, an autoregressive language model with 175 billion parameters, and evaluate its performance on a wide range of NLP tasks, including translation, question answering, and cloze tasks. GPT-3 demonstrates strong performance in the few-shot setting, sometimes even surpassing state-of-the-art fine-tuned models. The study also identifies limitations, such as weak performance on certain datasets and methodological issues related to training on large web corpora, including possible contamination of test data. Additionally, GPT-3 can generate synthetic news articles that human evaluators find difficult to distinguish from real articles, raising broader societal concerns. The paper discusses the broader impacts of these findings and the potential for misuse of large language models.
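Few-shot learning here means in-context learning: the task description and a few demonstrations are placed directly in the model's text prompt, and the model is asked to complete the next example with no gradient updates. The sketch below, which is illustrative rather than taken from the paper's code, builds such a prompt using the English-to-French translation examples shown in the paper; the `model.generate` call at the end is a hypothetical stand-in for whatever language-model API is used.

```python
# Minimal sketch of few-shot (in-context) prompting: K demonstrations plus an
# unanswered query are concatenated into a single text prompt. The model is
# expected to complete the final line; its weights are never updated.

demonstrations = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
    ("peppermint", "menthe poivrée"),
]

def build_few_shot_prompt(examples, query, task="Translate English to French:"):
    """Concatenate a task description, K worked examples, and the new query."""
    lines = [task]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is expected to continue this line
    return "\n".join(lines)

prompt = build_few_shot_prompt(demonstrations, "plush giraffe")
print(prompt)
# completion = model.generate(prompt)  # hypothetical call to a language model
```

Zero-shot and one-shot evaluation are the K=0 and K=1 variants of the same setup: only the natural-language task description, or the description plus a single demonstration, precedes the query.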