2020 | Colin Raffel*, Noam Shazeer*, Adam Roberts*, Katherine Lee*, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
This paper explores the landscape of transfer learning techniques for Natural Language Processing (NLP) by introducing a unified framework that converts all text-based language problems into a text-to-text format. The authors systematically compare pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on various language understanding tasks. By combining insights from their exploration with scale and the "Colossal Clean Crawled Corpus," they achieve state-of-the-art results on benchmarks covering summarization, question answering, text classification, and more. The paper also releases the dataset, pre-trained models, and code to facilitate future work on transfer learning for NLP. The authors emphasize that their goal is not to propose new methods but to provide a comprehensive perspective on the current state of the field. They describe their setup, including the Transformer model architecture and the downstream tasks, and present a large-scale empirical study of transfer learning for NLP.
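To make the text-to-text framing concrete, here is a minimal illustrative sketch (not the paper's released code) of how different tasks can be cast into a single input-text/target-text interface. The task prefixes follow the convention used in the T5 paper; the helper function and example records are hypothetical.

```python
# Sketch: casting heterogeneous NLP tasks into one text-to-text format.
# Every task becomes an (input_text, target_text) pair, so the same
# encoder-decoder model and training objective can be used throughout.

def to_text_to_text(task: str, example: dict) -> tuple[str, str]:
    """Map a task-specific example to an (input_text, target_text) pair."""
    if task == "translation_en_de":
        return ("translate English to German: " + example["en"], example["de"])
    if task == "summarization":
        return ("summarize: " + example["article"], example["summary"])
    if task == "cola":  # grammatical acceptability; the class label is emitted as text
        return ("cola sentence: " + example["sentence"],
                "acceptable" if example["label"] == 1 else "unacceptable")
    raise ValueError(f"Unknown task: {task}")

# Usage: the model is simply trained to generate target_text given input_text.
inp, tgt = to_text_to_text(
    "summarization",
    {"article": "Long news article ...", "summary": "Short summary."},
)
print(inp, "->", tgt)
```

The key design choice this illustrates is that classification, translation, and generation all share one interface, which is what lets the paper compare pre-training objectives and architectures under identical conditions.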