July 15 - 20, 2018 | Jeremy Howard, Sebastian Ruder
This paper introduces Universal Language Model Fine-tuning (ULMFiT), a transfer learning method for natural language processing (NLP). ULMFiT significantly outperforms the prior state of the art on six text classification tasks, reducing error by 18-24% on the majority of datasets, and with only 100 labeled examples it matches the performance of training from scratch on 100x more data.

The method proceeds in three stages: pretraining a language model on a large general-domain corpus, fine-tuning that language model on data from the target task, and finally fine-tuning a classifier on the target task. The key techniques are discriminative fine-tuning (a separate learning rate for each layer), slanted triangular learning rates (a short linear warm-up followed by a long linear decay), and gradual unfreezing (unfreezing one layer per epoch, starting from the last), all of which help prevent catastrophic forgetting of the pretrained representations.

ULMFiT is effective across diverse tasks and datasets and is sample-efficient, requiring little labeled data; the pretrained models and code are open-sourced to encourage wider adoption. The paper also surveys related work, including transfer learning in computer vision, hypercolumn-style approaches in NLP, and multi-task learning. Experiments on several text classification benchmarks show that ULMFiT outperforms state-of-the-art methods on both small and large datasets, with the largest gains in low-resource settings. The authors conclude that ULMFiT is a universal transfer learning approach that can in principle be applied to any NLP task.
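To make the slanted triangular learning rate (STLR) schedule concrete, here is a minimal Python sketch of the closed-form schedule described in the paper: the rate rises linearly to a peak for a short fraction of training, then decays linearly for the rest. The hyperparameter names cut_frac, ratio, and eta_max follow the paper's notation (with its defaults); the function name and the toy example are hypothetical.

```python
def slanted_triangular_lr(t, total_steps, eta_max=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at training step t (0-indexed) out of total_steps,
    following the STLR schedule: short linear warm-up, long linear decay."""
    cut = max(1, int(total_steps * cut_frac))   # step at which the LR peaks
    if t < cut:
        p = t / cut                                      # warm-up phase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))   # decay phase
    # ratio controls how much smaller the lowest LR is than the peak LR
    return eta_max * (1 + p * (ratio - 1)) / ratio


if __name__ == "__main__":
    # Toy example: the schedule over a 100-step run peaks at eta_max
    # after 10 steps, then decays toward eta_max / ratio.
    lrs = [slanted_triangular_lr(t, 100) for t in range(100)]
    print(f"start={lrs[0]:.5f}, peak={max(lrs):.5f}, end={lrs[-1]:.5f}")
```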
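Discriminative fine-tuning and gradual unfreezing can be illustrated with a small PyTorch sketch. This is not the authors' fastai implementation, only an illustration under the assumption that the model has been split into an ordered list of layer groups (embedding at the bottom, classifier head at the top): each layer group gets its own learning rate, shrinking by the paper's factor of 2.6 per layer toward the input, and training starts with only the top layer unfrozen, unfreezing one more layer per epoch.

```python
import torch
import torch.nn as nn


def discriminative_param_groups(layer_groups, base_lr=0.01, decay=2.6):
    """One optimizer parameter group per layer group; the top layer gets
    base_lr and each layer below it gets the rate divided by `decay`."""
    n = len(layer_groups)
    return [
        {"params": layer.parameters(), "lr": base_lr / (decay ** (n - 1 - i))}
        for i, layer in enumerate(layer_groups)
    ]


def gradually_unfreeze(layer_groups, epoch):
    """Epoch 0 trains only the last layer group; each subsequent epoch
    unfreezes one additional group, counting down from the top."""
    n = len(layer_groups)
    for i, layer in enumerate(layer_groups):
        trainable = i >= n - 1 - epoch
        for p in layer.parameters():
            p.requires_grad = trainable


# Toy usage with a 3-group model (embedding -> LSTM -> classifier head).
model_groups = [nn.Embedding(100, 16), nn.LSTM(16, 32), nn.Linear(32, 2)]
opt = torch.optim.SGD(discriminative_param_groups(model_groups), lr=0.01)
gradually_unfreeze(model_groups, epoch=0)   # only the Linear head trains first
```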