Prompting a Pretrained Transformer Can Be a Universal Approximator

22 Feb 2024 | Aleksandar Petrov, Philip H.S. Torr, Adel Bibi
This paper investigates whether prefix-tuning of a pretrained transformer can universally approximate sequence-to-sequence functions. The authors show that a single attention head can approximate any continuous function on a hypersphere, and that any sequence-to-sequence function can be approximated by prefixing a transformer whose depth is linear in the sequence length. They also provide Jackson-type bounds on the prefix length needed to approximate a function to a desired precision.

The key findings are:

1. A single attention head is sufficient to approximate any smooth continuous function on the hypersphere $ S^m $ to any desired precision $ \epsilon $.
2. The prompt length required to approximate a smooth target function to precision $ \epsilon $ is bounded.
3. Transformers of depth linear in the sequence length can approximate general sequence-to-sequence functions.
4. Prefix-tuning can result in elementwise functions that, when combined with the cross-element mixing already present in the pretrained model, may explain the success of prefix-tuning and prompting.

The paper also discusses the limitations of prefix-tuning, including the need for specific attention and value matrices, and the potential risks of prompting transformers on tasks that require new attention patterns. The authors conclude that prefix-tuning and prompting may be less efficient than training a transformer, but they provide a method for ensuring that a pretrained model can act as a token-wise universal approximator by including at least one attention head conforming to the structure in Lemma 3.
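To make the mechanism under study concrete, the sketch below shows single-head attention with a trainable prefix prepended to the key/value sequence while the pretrained weights stay frozen. This is a minimal illustrative NumPy example, not the authors' construction: the matrices `W_q`, `W_k`, `W_v`, the prefix length, and the random data are placeholder assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prefix_attention_head(X, P, W_q, W_k, W_v):
    """Single-head attention over input X with a prepended prefix P.

    X: (n, d) input token embeddings.
    P: (p, d) trainable prefix (soft-prompt) embeddings; only P is tuned,
       the head weights W_q, W_k, W_v are frozen.
    """
    XP = np.concatenate([P, X], axis=0)     # prefix tokens are prepended
    Q = X @ W_q                             # queries come from the input only
    K = XP @ W_k                            # keys include the prefix
    V = XP @ W_v                            # values include the prefix
    d_k = K.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k))     # (n, p + n) attention weights
    return A @ V                            # each output mixes in prefix values

# Toy usage: a random frozen head, a 4-token prefix, and 3 input tokens.
rng = np.random.default_rng(0)
d = 8
W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
X = rng.standard_normal((3, d))
P = rng.standard_normal((4, d))             # in prefix-tuning, P would be optimized
print(prefix_attention_head(X, P, W_q, W_k, W_v).shape)   # (3, 8)
```

The sketch makes the paper's central constraint visible: the prefix can only add extra key-value pairs for the input tokens to attend to; it cannot change how the input tokens attend to one another, which is why prefix-tuning elicits token-wise (elementwise) functions whose expressiveness depends on the mixing already built into the pretrained model.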