2024 | William Merrill, Jackson Petty, Ashish Sabharwal
This paper investigates the expressive power of state-space models (SSMs) for state-tracking tasks, challenging the assumption that their recurrent architecture gives them an advantage over transformers. The authors show that SSMs such as S4 and Mamba, despite their recurrent design, share the expressive limitations of transformers: both are confined to the complexity class TC^0. As a result, they cannot solve inherently sequential problems such as permutation composition, which underlies state-tracking tasks like chess move tracking, code evaluation, and entity tracking in narratives. Theoretical analysis and experiments confirm that SSMs with a fixed number of layers cannot learn to compose permutations, unlike simple recurrent neural networks (RNNs). The paper also shows that only minimal changes to SSMs are needed to let them express and learn state tracking, pointing toward new, more expressive SSM architectures. In particular, it presents an extended variant, Input-Dependent S4 (IDS4), which can express and learn the S5 word problem, demonstrating that current SSM limitations can be overcome. Empirical results show that IDS4 and RNNs solve the S5 problem at fixed depth, while transformers, S4, and Mamba require depth that grows with sequence length. The study highlights the importance of theoretical analysis for understanding the capabilities and limitations of SSMs on real-world state-tracking tasks.
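To make the benchmark concrete, here is a minimal Python sketch of the S5 word problem: given a sequence of permutations of five elements, the model must output the running composition after every prefix. A recurrent state tracker computes this trivially step by step, whereas a constant-depth, parallelizable model must compute all prefix products at once, which is what the TC^0 bound rules out. The specific encoding below (permutations as tuples, left-to-right composition) is an illustrative assumption, not the paper's exact experimental setup.

```python
from itertools import permutations
import random

# All 120 elements of S_5, each represented as a tuple mapping position i -> p[i].
S5 = list(permutations(range(5)))

def compose(p, q):
    """Apply p first, then q (left-to-right composition)."""
    return tuple(q[p[i]] for i in range(5))

def word_problem_labels(seq):
    """For a sequence of permutations, return the running composition after
    each prefix -- the target an exact state tracker must produce."""
    state = tuple(range(5))  # identity permutation
    labels = []
    for p in seq:
        state = compose(state, p)
        labels.append(state)
    return labels

# Example: a random length-8 word over S_5 and its prefix products.
random.seed(0)
seq = [random.choice(S5) for _ in range(8)]
for t, (p, lab) in enumerate(zip(seq, word_problem_labels(seq))):
    print(t, p, "->", lab)
```

In this framing, each label depends on the entire prefix, so the per-step loop above is exactly the sequential computation that RNNs (and the paper's IDS4 variant) can express but that fixed-depth transformers, S4, and Mamba cannot.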