Making Pre-trained Language Models Better Few-shot Learners

2 Jun 2021 | Tianyu Gao†*, Adam Fisch†*, Danqi Chen†
The paper "Making Pre-trained Language Models Better Few-shot Learners" by Tianyu Gao, Adam Fisch, and Danqi Chen explores methods to enhance the few-shot learning capabilities of pre-trained language models. Inspired by GPT-3's ability to perform well with minimal training data, the authors propose LM-BFF (Better Few-shot Fine-tuning of Language Models), a suite of techniques for improving the few-shot performance of smaller language models. The key contributions are:

1. **Prompt-based Fine-tuning**: The downstream task is cast as a masked language modeling problem, in which the model fills in a textual response to a given prompt. The authors introduce automatic prompt generation, combining a pruned brute-force search for optimal label words with a novel decoding objective that uses the T5 model to generate templates.

2. **Dynamic Incorporation of Demonstrations**: Unlike GPT-3's approach of randomly sampling demonstrations and concatenating them with the input, the authors sample one example per class to create multiple minimal demonstration sets, and devise a sampling strategy that pairs each input with similar examples to provide more discriminative comparisons.

The paper presents a systematic evaluation across a range of NLP tasks, including classification and regression, demonstrating that these methods significantly outperform standard fine-tuning: an average improvement of 11% across all tasks, with up to 30% absolute improvement in some cases. The approach is task-agnostic and makes minimal assumptions about task resources and domain expertise.
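To make the prompt-based fine-tuning idea concrete, here is a minimal sketch of how an input is wrapped in a template and how class labels map to label words. The template `"It was [MASK]."` and the label-word mapping `{"positive": "great", "negative": "terrible"}` are illustrative assumptions for a sentiment task, not the paper's automatically searched prompts.

```python
# Hypothetical label-word mapping for a binary sentiment task (an assumption;
# LM-BFF searches for these automatically).
LABEL_WORDS = {"positive": "great", "negative": "terrible"}


def to_prompt(text: str) -> str:
    """Wrap an input in the template '<text> It was [MASK].'"""
    return f"{text} It was [MASK]."


def verbalize(label: str) -> str:
    """Map a class label to the label word the masked LM should predict."""
    return LABEL_WORDS[label]


print(to_prompt("A gripping, beautifully shot film."))
# A gripping, beautifully shot film. It was [MASK].
print(verbalize("positive"))
# great
```

At fine-tuning time, classification then reduces to comparing the masked language model's probabilities for the label words at the `[MASK]` position.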
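The demonstration-sampling strategy can be sketched as follows. This is a simplified illustration: the paper measures similarity with sentence embeddings, whereas here `similarity` is a toy word-overlap score, and `sample_demonstrations`, `pool`, and `top_frac` are hypothetical names introduced for this sketch.

```python
import random


def similarity(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of word sets (stand-in for embeddings)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))


def sample_demonstrations(query, pool, top_frac=0.5, seed=0):
    """Sample one demonstration per class, restricted to the examples
    most similar to the query (top `top_frac` fraction per class)."""
    rng = random.Random(seed)
    demos = []
    for label in sorted({y for _, y in pool}):
        # Rank this class's examples by similarity to the query.
        cands = sorted((x for x, y in pool if y == label),
                       key=lambda x: similarity(query, x), reverse=True)
        k = max(1, int(len(cands) * top_frac))
        demos.append((rng.choice(cands[:k]), label))
    return demos
```

Each sampled demonstration set is then concatenated with the prompted input, so the model sees one filled-in example per class alongside the instance it must classify.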