July 5 - 10, 2020 | Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
Language models pretrained on diverse text sources form the foundation of modern NLP. This paper investigates whether adapting pretrained models to specific domains or tasks still provides performance gains. Across four domains (biomedical, computer science, news, and reviews) and eight classification tasks, domain-adaptive pretraining (DAPT) improves performance in both high- and low-resource settings. Task-adaptive pretraining (TAPT), which uses unlabeled task data, further enhances performance even after DAPT. Additionally, using simple data selection strategies to augment task corpora proves effective, especially when domain-adaptive pretraining is unavailable. The study shows that multi-phase adaptive pretraining consistently improves task performance.
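Both DAPT and TAPT continue training with RoBERTa's masked-language-modeling objective, only on different unlabeled corpora. As a rough illustration (not the paper's code), here is a minimal sketch of BERT/RoBERTa-style dynamic masking, assuming the standard scheme of masking about 15% of positions, of which 80% become a mask token, 10% a random token, and 10% stay unchanged; the function name and token format are illustrative:

```python
import random

def mlm_mask(tokens, mask_token="[MASK]", vocab=None, mask_rate=0.15, seed=0):
    """Illustrative BERT/RoBERTa-style dynamic masking.

    Selects roughly `mask_rate` of positions as prediction targets; of those,
    80% are replaced by `mask_token`, 10% by a random vocabulary token, and
    10% are left unchanged. Returns (masked_tokens, labels), where labels
    hold the original token at masked positions and None elsewhere.
    """
    rng = random.Random(seed)
    vocab = vocab or sorted(set(tokens))
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            labels.append(tok)          # model must predict the original token
            r = rng.random()
            if r < 0.8:
                masked.append(mask_token)
            elif r < 0.9:
                masked.append(rng.choice(vocab))
            else:
                masked.append(tok)      # kept as-is, but still a target
        else:
            labels.append(None)
            masked.append(tok)
    return masked, labels
```

In DAPT the input stream would be domain text (e.g. biomedical papers); in TAPT it would be the task's own unlabeled examples, which is why TAPT is so much cheaper.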
The paper explores the benefits of continued pretraining on domain-specific or task-specific data. DAPT involves pretraining on domain-specific text, while TAPT uses the unlabeled text of the task's own dataset. Results show that TAPT provides significant performance boosts for RoBERTa, both with and without DAPT. When additional unlabeled data is manually curated, performance improves further. Automated data selection methods, such as kNN-TAPT, also enhance performance in low-resource settings.
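The idea behind kNN-TAPT is to augment the small task corpus with its nearest neighbors from a large in-domain pool of unlabeled text. A minimal sketch of that selection step, assuming precomputed sentence embeddings (the paper uses lightweight VAMPIRE embeddings; here the embeddings, function name, and plain cosine-similarity kNN are illustrative stand-ins):

```python
import math

def knn_tapt_select(task_embs, pool_embs, k=2):
    """For each task-example embedding, find the indices of its k nearest
    neighbors in the candidate pool by cosine similarity. The union of all
    selected indices defines the augmented TAPT pretraining corpus."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    selected = set()
    for query in task_embs:
        ranked = sorted(range(len(pool_embs)),
                        key=lambda i: cosine(query, pool_embs[i]),
                        reverse=True)
        selected.update(ranked[:k])   # keep the k most similar pool texts
    return sorted(selected)
```

For example, with one task embedding `[1, 0]` and a pool `[[0, 1], [1, 0.1], [0.9, 0]]`, the two nearest pool texts are indices 1 and 2; at realistic scale one would use an approximate-nearest-neighbor index rather than this exhaustive loop.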
The study also examines the importance of pretraining on human-curated data and the effectiveness of domain and task adaptation. It highlights that pretraining on domain-relevant data is crucial for improving performance on tasks. The paper demonstrates that combining DAPT and TAPT achieves the best results, with TAPT being more efficient and effective in many cases. The findings suggest that continued pretraining on task-specific data is beneficial, even when the task is closely related to the original pretraining domain.
The paper also discusses the computational requirements of different pretraining approaches. TAPT is significantly faster and less resource-intensive than DAPT. However, combining both methods can yield the best performance. The study emphasizes the importance of data selection and the need for further research into more efficient pretraining strategies. Overall, the results show that adaptive pretraining techniques can significantly improve task performance, making them valuable for a wide range of NLP applications.