Linear Transformers are Versatile In-Context Learners

2024 | Max Vladymyrov, Johannes von Oswald, Mark Sandler, Rong Ge
This paper studies the in-context learning capabilities of linear transformers and demonstrates that they can implicitly perform gradient-descent-like algorithms. The authors prove that any linear transformer maintains an implicit linear model and can be interpreted as running a variant of preconditioned gradient descent on it. They then test linear transformers in a harder setting where the in-context training data is corrupted with varying levels of noise. Surprisingly, the trained transformers discover a sophisticated optimization algorithm that matches or outperforms a range of reasonable baselines, incorporating momentum and adaptive rescaling based on the noise level. These findings highlight the versatility of linear transformers on complex problems and their potential to inform new optimization and machine learning techniques.
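To make the central claim concrete, below is a minimal sketch (not the paper's construction) of how a single linear self-attention update on an in-context regression prompt can reproduce one step of preconditioned gradient descent on the implicit linear model. The preconditioner `P`, step size `eta`, and the zero-initialized label channel of the query token are illustrative assumptions; in a trained model, the learned attention weight matrices would play these roles.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 32

# In-context regression prompt: n labeled tokens (x_i, y_i) plus a query x_q.
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)
x_q = rng.normal(size=d)

# Illustrative preconditioner and step size (assumptions for this sketch;
# P = I recovers plain gradient descent).
P = np.eye(d)
eta = 1.0 / n

# One preconditioned GD step on the implicit linear model, starting at w = 0.
w = np.zeros(d)
grad = X.T @ (X @ w - y)           # gradient of 0.5 * ||X w - y||^2
w_new = w - eta * P @ grad
pred_gd = w_new @ x_q              # prediction after one step

# The same prediction produced by one linear self-attention update that
# writes into the query token's (initially zero) label channel.
pred_attn = sum(y_i * (x_q @ (eta * P) @ x_i) for x_i, y_i in zip(X, y))

print(np.allclose(pred_gd, pred_attn))  # True: the two updates coincide
```

With `P = np.eye(d)` this is plain gradient descent; the paper's noisy-regression experiments show that trained linear transformers go beyond this single step, composing layers into an algorithm with momentum-like terms and noise-dependent rescaling.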