20 Jun 2024 | Franz Nowak, Anej Svete, Alexandra Butoi, Ryan Cotterell
This paper investigates the representational capacity of neural language models (LMs) with chain-of-thought (CoT) reasoning. CoT reasoning allows LMs to store intermediate results in a scratch space, enabling them to perform additional sequential computation. The authors formalize CoT reasoning in a probabilistic setting and show that CoT-augmented LMs can represent the same family of distributions over strings as probabilistic Turing machines.
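As a rough sketch of this probabilistic framing (the notation below is illustrative; the symbols Σ, Δ, and π are ours, not necessarily the paper's exact formalization): a CoT-augmented LM generates strings over an enlarged alphabet that also contains scratch symbols, and the distribution it represents over ordinary strings is obtained by summing over all CoT strings that erase to the same output.

```latex
% Illustrative only: p is a language model, i.e. a distribution over
% strings y in \Sigma^*.  A CoT-augmented LM q generates strings over an
% enlarged alphabet \Delta \supseteq \Sigma that also contains scratch
% symbols; the LM it represents over \Sigma^* sums over all CoT strings
% that erase to the same output y:
p_{\mathrm{CoT}}(\boldsymbol{y})
  \;=\; \sum_{\substack{\boldsymbol{c} \in \Delta^* \\ \pi(\boldsymbol{c}) = \boldsymbol{y}}} q(\boldsymbol{c}),
\qquad
\text{where } \pi \colon \Delta^* \to \Sigma^* \text{ deletes the scratch symbols.}
```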
The paper contrasts LMs with Turing machines, noting that while Turing machines decide language membership, LMs define distributions over strings. The authors introduce the notion of "regular reducibility" to relate language models: one LM is regularly reducible to another if the strings generated by the latter are sufficiently simple transformations of those generated by the former.
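For intuition, one very simple transformation of this kind is the erasure of designated scratch symbols, a map a finite-state transducer can compute. Below is a minimal Python sketch of that erasure; the scratch alphabet and names are hypothetical, chosen only for illustration.

```python
# Hypothetical scratch alphabet: markers and work symbols emitted during CoT.
SCRATCH = {"<think>", "</think>", "A", "B"}

def erase_scratch(cot_string: list[str]) -> list[str]:
    """Delete scratch symbols, keeping only output-alphabet symbols."""
    return [sym for sym in cot_string if sym not in SCRATCH]

# A CoT-decorated string and the output string it reduces to:
cot = ["<think>", "A", "B", "A", "</think>", "a", "b", "b"]
print(erase_scratch(cot))  # ['a', 'b', 'b']
```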
The authors then analyze recurrent and transformer LMs with CoT reasoning, demonstrating that CoT reasoning increases the computational power of both architectures. Specifically, they show that CoT-augmented constant-precision RNN LMs are equivalent to probabilistic finite-state automata (PFSAs), while linearly bounded precision RNN LMs and logarithmically bounded precision decoder-only transformer LMs with CoT reasoning can simulate any probabilistic Turing machine.
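To make the mechanism concrete, here is a schematic sketch of CoT-augmented autoregressive generation, assuming a generic next-symbol distribution (`next_symbol_dist` is a uniform placeholder, not an actual RNN or transformer): scratch symbols are sampled and fed back as context, giving the model extra sequential computation, but they are stripped from the final output.

```python
import random

# Toy alphabets; the actual symbol sets are a modeling choice, not fixed by the paper.
OUTPUT, SCRATCH, EOS = {"a", "b"}, {"#", "@"}, "<eos>"
ALPHABET = sorted(OUTPUT | SCRATCH | {EOS})

def next_symbol_dist(context: list[str]) -> dict[str, float]:
    """Placeholder for an RNN / transformer next-symbol distribution."""
    return {sym: 1.0 / len(ALPHABET) for sym in ALPHABET}  # uniform stand-in

def generate_with_cot(max_len: int = 50) -> list[str]:
    context: list[str] = []
    while len(context) < max_len:
        dist = next_symbol_dist(context)
        sym = random.choices(list(dist), weights=list(dist.values()))[0]
        if sym == EOS:
            break
        context.append(sym)                         # scratch symbols stay in the context...
    return [s for s in context if s in OUTPUT]      # ...but are erased from the output

print(generate_with_cot())
```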
The paper also discusses the theoretical implications of CoT reasoning, showing that it allows neural LMs to model non-determinism and increases their representational capacity. The authors conclude that CoT reasoning provides a principled way to overcome the limitations of fixed-precision neural models, allowing them to simulate non-deterministic automata.
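As a final illustration of the non-determinism point, the toy sketch below (the automaton and names are hypothetical, not taken from the paper) shows how sampling the next CoT token amounts to choosing one branch of a probabilistic finite-state automaton and recording that choice in the scratch space.

```python
import random

# Toy PFSA: state -> list of (probability, output symbol, next state).
TRANSITIONS = {
    0: [(0.5, "a", 0), (0.5, "b", 1)],
    1: [(1.0, "b", 1)],
}

def sample_run(steps: int = 5) -> tuple[list[str], list[str]]:
    state, scratch, output = 0, [], []
    for _ in range(steps):
        probs, arcs = zip(*[(p, (sym, nxt)) for p, sym, nxt in TRANSITIONS[state]])
        sym, state = random.choices(arcs, weights=probs)[0]  # sampled branch = resolved non-determinism
        scratch.append(f"(q{state})")                        # intermediate result kept in the scratch space
        output.append(sym)
    return scratch, output

print(sample_run())
```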