July 5 - 10, 2020 | Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
Language models pretrained on diverse text sources form the foundation of modern NLP. This paper investigates whether adapting pretrained models to specific domains or tasks still provides performance gains. Across four domains (biomedical, computer science, news, and reviews) and eight classification tasks, domain-adaptive pretraining (DAPT) improves performance in both high- and low-resource settings. Task-adaptive pretraining (TAPT), which uses unlabeled task data, further enhances performance even after DAPT. Additionally, using simple data selection strategies to augment task corpora proves effective, especially when domain-adaptive pretraining is unavailable. The study shows that multi-phase adaptive pretraining consistently improves task performance.
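Both DAPT and TAPT continue training with RoBERTa's masked-language-modeling objective, only on different unlabeled corpora. As a rough illustration (not the paper's code), here is a minimal sketch of BERT/RoBERTa-style dynamic masking, assuming the standard scheme of masking about 15% of positions, of which 80% become a mask token, 10% a random token, and 10% stay unchanged; the function name and token format are illustrative:

```python
import random

def mlm_mask(tokens, mask_token="[MASK]", vocab=None, mask_rate=0.15, seed=0):
    """Illustrative BERT/RoBERTa-style dynamic masking.

    Selects roughly `mask_rate` of positions as prediction targets; of those,
    80% are replaced by `mask_token`, 10% by a random vocabulary token, and
    10% are left unchanged. Returns (masked_tokens, labels), where labels
    hold the original token at masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    vocab = vocab or sorted(set(tokens))
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            labels.append(tok)          # model must predict the original token
            r = rng.random()
            if r < 0.8:
                masked.append(mask_token)
            elif r < 0.9:
                masked.append(rng.choice(vocab))
            else:
                masked.append(tok)      # kept as-is, but still a target
        else:
            labels.append(None)
            masked.append(tok)
    return masked, labels
```

In DAPT the input stream would be domain text (e.g. biomedical papers); in TAPT it would be the task's own unlabeled examples, which is why TAPT is so much cheaper.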
The paper explores the benefits of continued pretraining on domain-specific or task-specific data. DAPT involves pretraining on domain-specific text, while TAPT uses the unlabeled text of the task's own dataset. Results show that TAPT provides significant performance boosts for RoBERTa, both with and without DAPT. When additional unlabeled data is manually curated, performance improves further. Automated data selection methods, such as kNN-TAPT, also enhance performance in low-resource settings.
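The idea behind kNN-TAPT is to augment the small task corpus with its nearest neighbors from a large in-domain pool of unlabeled text. A minimal sketch of that selection step, assuming precomputed sentence embeddings (the paper uses lightweight VAMPIRE embeddings; here the embeddings, function name, and plain cosine-similarity kNN are illustrative stand-ins):

```python
import math

def knn_tapt_select(task_embs, pool_embs, k=2):
    """For each task-example embedding, find the indices of its k nearest
    neighbors in the candidate pool by cosine similarity. The union of all
    selected indices defines the augmented TAPT pretraining corpus."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    selected = set()
    for query in task_embs:
        ranked = sorted(range(len(pool_embs)),
                        key=lambda i: cosine(query, pool_embs[i]),
                        reverse=True)
        selected.update(ranked[:k])   # keep the k most similar pool texts
    return sorted(selected)
```

For example, with one task embedding `[1, 0]` and a pool `[[0, 1], [1, 0.1], [0.9, 0]]`, the two nearest pool texts are indices 1 and 2; at realistic scale one would use an approximate-nearest-neighbor index rather than this exhaustive loop.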
The study also examines the importance of pretraining on human-curated data and the effectiveness of domain and task adaptation. It highlights that pretraining on domain-relevant data is crucial for improving performance on tasks. The paper demonstrates that combining DAPT and TAPT achieves the best results, with TAPT being more efficient and effective in many cases. The findings suggest that continued pretraining on task-specific data is beneficial, even when the task is closely related to the original pretraining domain.
The paper also discusses the computational requirements of different pretraining approaches. TAPT is significantly faster and less resource-intensive than DAPT. However, combining both methods can yield the best performance. The study emphasizes the importance of data selection and the need for further research into more efficient pretraining strategies. Overall, the results show that adaptive pretraining techniques can significantly improve task performance, making them valuable for a wide range of NLP applications.