Repeat After Me: Transformers are Better than State Space Models at Copying

2024 | Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach
Transformers outperform generalized state space models (GSSMs) at copying from the input context: GSSMs must compress the entire sequence into a fixed-size latent state, while transformers can attend to the full input directly. Theoretical analysis shows that transformers can copy strings of length exponential in their size, whereas GSSMs with a fixed-size state cannot. Empirical results on synthetic copying tasks confirm that transformers learn the task more efficiently and generalize better to inputs longer than those seen during training. Transformers also outperform GSSMs at retrieving information from context, as demonstrated by experiments with pre-trained models such as Pythia and Mamba. Together, the theoretical and experimental results reveal a fundamental gap between the two architectures on memory-intensive tasks: GSSMs struggle because of their fixed state size, which suggests that transformers are better suited to practical tasks requiring access to long input sequences.
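To make the theoretical claim concrete, here is a minimal counting-argument sketch, using illustrative notation (state size $b$ in bits, alphabet $\Sigma$, string length $n$) rather than the paper's exact theorem statement. A GSSM must compress the input into a latent state of at most $b$ bits, which can take at most $2^b$ distinct values, while there are $|\Sigma|^n$ distinct strings of length $n$. Copying every such string perfectly therefore requires

\[
  2^{b} \;\ge\; |\Sigma|^{n}
  \quad\Longleftrightarrow\quad
  n \;\le\; \frac{b}{\log_2 |\Sigma|},
\]

so the copyable length grows only linearly with the state size. A transformer, by contrast, keeps the whole input in its context and, per the paper's construction, can copy strings of length exponential in its size by looking up stored n-grams in the context.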