2 Sep 2021 | Brian Lester*, Rami Al-Rfou, Noah Constant
This paper explores "prompt tuning," a method for learning "soft prompts" to condition frozen language models on specific downstream tasks. Unlike the discrete text prompts used by GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signals from any number of labeled examples. The authors demonstrate that their end-to-end learned approach outperforms GPT-3's few-shot learning by a large margin. More significantly, through ablations on model size using T5, they show that prompt tuning becomes more competitive with scale: as models exceed billions of parameters, the method "closes the gap" and matches the strong performance of model tuning (where all model weights are tuned). This finding is particularly relevant because large models are costly to share and serve, and the ability to reuse a single frozen model for multiple downstream tasks can ease this burden. The method can be seen as a simplification of the recently proposed "prefix tuning" and is compared to other similar approaches. Finally, the authors show that conditioning a frozen model with soft prompts confers benefits in robustness to domain transfer and enables efficient "prompt ensembling."
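To make the mechanism concrete, here is a minimal PyTorch-style sketch of the core idea: a small matrix of "virtual token" embeddings is prepended to the input embeddings of a frozen backbone, and only that matrix receives gradients. This is an illustration under assumptions, not the authors' T5/JAX implementation; names like `PromptTuningWrapper`, `frozen_model`, and `prompt_length` are hypothetical.

```python
import torch
import torch.nn as nn

class PromptTuningWrapper(nn.Module):
    """Sketch of prompt tuning: learn a soft prompt while the backbone stays frozen."""

    def __init__(self, frozen_model: nn.Module, embed_dim: int, prompt_length: int = 20):
        super().__init__()
        self.frozen_model = frozen_model
        # Freeze every backbone weight; gradients flow only into the soft prompt.
        for p in self.frozen_model.parameters():
            p.requires_grad = False
        # The soft prompt: prompt_length learnable embeddings of size embed_dim,
        # trained end-to-end by backpropagation on the downstream task.
        self.soft_prompt = nn.Parameter(torch.randn(prompt_length, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) token embeddings of the task input.
        batch_size = input_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch_size, -1, -1)
        # Condition the frozen model by prepending the soft prompt to the sequence.
        return self.frozen_model(torch.cat([prompt, input_embeds], dim=1))
```

Because only the prompt parameters are task-specific, a single frozen backbone can be served once and reused across tasks by swapping in different (tiny) soft prompts, which is also what makes the "prompt ensembling" mentioned above cheap.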