8 Feb 2022 | Jason Wei*, Maarten Bosma*, Vincent Y. Zhao*, Kelvin Guu*, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le
This paper explores a method to enhance the zero-shot learning capabilities of large language models through *instruction tuning*. The authors finetune a 137B parameter pretrained language model, LaMDA-PT, on over 60 NLP datasets described via natural language instructions. The resulting model, named FLAN, is evaluated on unseen task types and shows significant improvements over the unmodified model, even surpassing 175B parameter GPT-3 on 20 of 25 datasets. Ablation studies reveal that the number of finetuning datasets, model scale, and natural language instructions are key to the success of instruction tuning. The paper also discusses the limitations and ethical considerations of the approach, highlighting the potential for using labeled data to improve large language models' performance on a broader range of tasks.
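To make the core idea concrete, below is a minimal sketch of what instruction-style formatting of a finetuning example might look like. The template wordings, field names, and the `to_instruction_example` helper are illustrative assumptions, not the exact templates or code used for FLAN; the paper composes multiple templates per dataset and mixes many datasets grouped into task clusters, holding out entire clusters for zero-shot evaluation.

```python
# Minimal sketch of instruction-style formatting for finetuning examples.
# The template wordings and dataset fields here are illustrative only,
# not the exact templates used for FLAN.

import random

# Hypothetical instruction templates for a natural language inference (NLI) task.
# FLAN uses several templates per dataset; two are shown for illustration.
NLI_TEMPLATES = [
    ("Premise: {premise}\nHypothesis: {hypothesis}\n"
     "Does the premise entail the hypothesis? OPTIONS: yes, no, maybe"),
    ("{premise}\nBased on the paragraph above, can we conclude that "
     "\"{hypothesis}\"? OPTIONS: yes, no, maybe"),
]

def to_instruction_example(record: dict) -> dict:
    """Convert a raw labeled example into an (input, target) pair
    phrased as a natural-language instruction."""
    template = random.choice(NLI_TEMPLATES)
    prompt = template.format(premise=record["premise"],
                             hypothesis=record["hypothesis"])
    return {"input": prompt, "target": record["label"]}

if __name__ == "__main__":
    raw = {"premise": "A dog is running in the park.",
           "hypothesis": "An animal is outdoors.",
           "label": "yes"}
    print(to_instruction_example(raw))
```

Because the model only ever sees tasks phrased as instructions during finetuning, it can be prompted at inference time with an instruction for a task type it was never trained on, which is what the zero-shot evaluation in the paper measures.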