2020 | Colin Raffel*, Noam Shazeer*, Adam Roberts*, Katherine Lee*, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
This paper explores the limits of transfer learning in natural language processing (NLP) by introducing a unified framework that converts every text-based language problem into a text-to-text format. The authors systematically study various aspects of transfer learning, including pre-training objectives, architectures, unlabeled data sets, and transfer approaches, across dozens of language understanding tasks. By combining the insights from this exploration with scale and their new "Colossal Clean Crawled Corpus" (C4), they achieve state-of-the-art results on benchmarks covering summarization, question answering, text classification, and more. They release their dataset, pre-trained models, and code to facilitate future work on transfer learning for NLP.
The paper introduces a unified approach to transfer learning by treating every text processing problem as a "text-to-text" problem, i.e., taking text as input and producing new text as output. This approach allows the same model, objective, training procedure, and decoding process to be applied to every task considered. The authors evaluate performance on a wide variety of English-based NLP problems, including question answering, document summarization, and sentiment classification. With this unified approach, they compare the effectiveness of different transfer learning objectives, unlabeled data sets, and other factors while exploring the limits of transfer learning for NLP by scaling up models and data sets beyond what has previously been considered.
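To make the text-to-text framing concrete, the sketch below (plain Python, not the authors' released code) shows how a few of the tasks mentioned above could be serialized into input/target string pairs. The task prefixes and the sample translation and acceptability examples mirror the ones given in the paper; the helper function and data layout are illustrative assumptions only.

```python
# Illustrative sketch: casting different NLP tasks into a single text-to-text format.
# The task prefixes ("translate English to German:", "summarize:", "cola sentence:")
# follow the paper's examples; the helper itself is hypothetical glue code.

def to_text_to_text(task, example):
    """Serialize a task-specific example into an (input_text, target_text) pair."""
    if task == "translation_en_de":
        return ("translate English to German: " + example["en"], example["de"])
    if task == "summarization":
        return ("summarize: " + example["document"], example["summary"])
    if task == "cola":  # sentence acceptability, a classification task
        return ("cola sentence: " + example["sentence"],
                "acceptable" if example["label"] == 1 else "not acceptable")
    raise ValueError(f"Unknown task: {task}")


examples = [
    ("translation_en_de", {"en": "That is good.", "de": "Das ist gut."}),
    ("cola", {"sentence": "The course is jumping well.", "label": 0}),
]

for task, ex in examples:
    inp, tgt = to_text_to_text(task, ex)
    print(f"input:  {inp}\ntarget: {tgt}\n")
```

Because every task is reduced to string-in, string-out, the same model, maximum-likelihood objective, and decoding procedure can be reused unchanged across all of them, which is precisely what enables the apples-to-apples comparisons described above.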
The authors emphasize that their goal is not to propose new methods but to provide a comprehensive perspective on where the field stands. Their work primarily comprises a survey, exploration, and empirical comparison of existing techniques. They also explore the limits of current approaches by scaling up the insights from their systematic study (training models up to 11 billion parameters) to obtain state-of-the-art results in many of the tasks they consider. To perform experiments at this scale, they introduce the "Colossal Clean Crawled Corpus" (C4), a dataset consisting of hundreds of gigabytes of clean English text scraped from the web.
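The construction of C4 from web-crawled text lends itself to a short illustration. The snippet below is a rough sketch, in plain Python, of a few of the heuristic filters the paper reports using to clean raw pages (keeping only lines that end in terminal punctuation and contain at least five words, discarding pages containing boilerplate such as "lorem ipsum" or curly braces, and dropping pages with too little text); it approximates the idea and is not the authors' released preprocessing pipeline, which additionally deduplicates text and filters for English.

```python
# Rough sketch of C4-style cleaning heuristics described in the paper; this is an
# illustrative approximation, not the released preprocessing code.
import re
from typing import Optional

TERMINAL_PUNCT = ('.', '!', '?', '"')

def clean_page(text: str) -> Optional[str]:
    """Apply a few heuristic filters to one scraped web page; return None to discard it."""
    kept_lines = []
    for line in text.splitlines():
        line = line.strip()
        # Keep only lines that end in terminal punctuation and have at least 5 words.
        if line.endswith(TERMINAL_PUNCT) and len(line.split()) >= 5:
            kept_lines.append(line)
    page = "\n".join(kept_lines)
    # Drop pages with obvious boilerplate or source code.
    if "lorem ipsum" in page.lower() or "{" in page:
        return None
    # Crude stand-in for the paper's "fewer than three sentences" rule.
    if len(re.findall(r"[.!?]", page)) < 3:
        return None
    return page
```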
The paper is structured as follows: In the following section, they discuss their base model and its implementation, their procedure for formulating every text processing problem as a text-to-text task, and the suite of tasks they consider. In Section 3, they present a large set of experiments that explore the field of transfer learning for NLP. At the end of the section (Section 3.7), they combine insights from their systematic study to obtain state-of-the-art results on a wide variety of benchmarks. Finally, they provide a summary of their results and wrap up with a look towards the future in Section 4.