The Illusion of State in State-Space Models


4 Jun 2024 | William Merrill, Jackson Petty, Ashish Sabharwal
The paper "The Illusion of State in State-Space Models" by William Merrill, Jackson Petty, and Ashish Sabharwal explores the expressive power of state-space models (SSMs) compared to transformers. The authors argue that despite their stateful design, SSMs like S4 and Mamba have limited expressive power for state tracking, similar to transformers. They prove that these models are in the complexity class L-uniform TC^0, which means they cannot solve inherently sequential problems like permutation composition, a fundamental problem in state-tracking tasks such as chess move tracking, code evaluation, and entity tracking in narratives. The paper also presents empirical evidence showing that SSMs struggle with state-tracking problems, while simple recurrent neural networks (RNNs) can solve them with a single layer. To address this limitation, the authors propose minimal extensions to SSMs, such as adding nonlinearities or making the transition matrices input-dependent, which can enhance their expressive power for state tracking. These extensions allow SSMs to solve problems like the $S_5$ word problem, which is $\mathrm{NC}^1$-complete. The paper concludes by discussing the practical implications of these findings and the potential for developing new neural architectures that balance parallelism and expressive power for state tracking.The paper "The Illusion of State in State-Space Models" by William Merrill, Jackson Petty, and Ashish Sabharwal explores the expressive power of state-space models (SSMs) compared to transformers. The authors argue that despite their stateful design, SSMs like S4 and Mamba have limited expressive power for state tracking, similar to transformers. They prove that these models are in the complexity class L-uniform TC^0, which means they cannot solve inherently sequential problems like permutation composition, a fundamental problem in state-tracking tasks such as chess move tracking, code evaluation, and entity tracking in narratives. The paper also presents empirical evidence showing that SSMs struggle with state-tracking problems, while simple recurrent neural networks (RNNs) can solve them with a single layer. To address this limitation, the authors propose minimal extensions to SSMs, such as adding nonlinearities or making the transition matrices input-dependent, which can enhance their expressive power for state tracking. These extensions allow SSMs to solve problems like the $S_5$ word problem, which is $\mathrm{NC}^1$-complete. The paper concludes by discussing the practical implications of these findings and the potential for developing new neural architectures that balance parallelism and expressive power for state tracking.