20 Jun 2024 | Franz Nowak, Anej Svete, Alexandra Butoi, Ryan Cotterell
The paper explores the representational capacity of neural language models (LMs) augmented with chain-of-thought (CoT) reasoning, the practice of generating intermediate results that guide the model towards a final answer. Earlier claims hold that CoT reasoning extends the computational power of LMs to that of Turing machines, but the authors argue that this direct comparison rests on a category error: LMs define distributions over strings, whereas Turing machines decide language membership. To address this, they formalize CoT reasoning in a probabilistic setting and show that recurrent and transformer LMs with CoT reasoning can represent the same family of distributions over strings as probabilistic Turing machines.
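To make the probabilistic framing concrete, here is a minimal sketch of the two distributions involved (the symbols p, c, and y are illustrative notation, not necessarily the paper's): an autoregressive LM defines a distribution over whole strings, and a CoT-augmented LM induces a distribution over answer strings by summing over the intermediate reasoning strings it may emit.

```latex
% An autoregressive LM over an alphabet \Sigma assigns a string
% y = y_1 ... y_T a probability by chaining next-symbol
% probabilities and an explicit end-of-string event:
p(\boldsymbol{y}) = p(\textsc{eos} \mid \boldsymbol{y}) \prod_{t=1}^{T} p(y_t \mid \boldsymbol{y}_{<t})

% With chain-of-thought reasoning the model emits an intermediate
% string c before the answer y; the induced distribution over
% answers marginalizes over all such intermediate strings:
p_{\mathrm{CoT}}(\boldsymbol{y}) = \sum_{\boldsymbol{c} \in \Sigma^{*}} p(\boldsymbol{c}, \boldsymbol{y})
```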
Key findings include:
1. **Probabilistic Models of Computation**: LMs are treated as probabilistic models of computation, assigning weights to strings rather than deciding language membership.
2. **Regular Reducibility**: The paper introduces a new notion for relating language models: one LM is regularly reducible to another if the strings generated by the latter are simple (finite-state) transformations of those generated by the former. In the CoT setting, the relevant transformation deletes the intermediate reasoning steps (see the chain-of-thought sketch after this list).
3. **Equivalence Results**:
   - Constant-precision RNN LMs with CoT reasoning are equivalent to probabilistic finite-state automata (PFSAs); see the PFSA sketch after this list for what such an automaton looks like.
- Linearly bounded precision RNN LMs and logarithmically bounded precision decoder-only transformer LMs with CoT reasoning can simulate any probabilistic Turing machine.
4. **Theoretical Framework**: The paper provides a detailed theoretical framework for understanding CoT reasoning in the context of LMs, bridging the gap between empirical success and formal analysis.
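To ground item 1 and the PFSA equivalence in item 3, here is a minimal Python sketch of a probabilistic finite-state automaton used as a language model: it samples strings and assigns them probabilities rather than accepting or rejecting them. The two-state automaton, its symbols, and its probabilities are invented for illustration and do not come from the paper.

```python
import random

# A toy probabilistic finite-state automaton (PFSA) used as a language
# model. From each state the automaton either halts or emits a symbol
# and moves to a next state; each state's outgoing probabilities
# (halting plus all arcs) sum to one.
PFSA = {
    # state: (halt_prob, [(symbol, next_state, arc_prob), ...])
    "q0": (0.1, [("a", "q0", 0.5), ("b", "q1", 0.4)]),
    "q1": (0.3, [("b", "q1", 0.6), ("a", "q0", 0.1)]),
}
START = "q0"


def sample_string() -> str:
    """Sample one string from the distribution the PFSA defines."""
    state, out = START, []
    while True:
        halt_prob, arcs = PFSA[state]
        choices = [None] + arcs
        weights = [halt_prob] + [p for _, _, p in arcs]
        pick = random.choices(choices, weights=weights, k=1)[0]
        if pick is None:          # the halting event ends the string
            return "".join(out)
        symbol, state, _ = pick   # otherwise follow the sampled arc
        out.append(symbol)


def string_prob(s: str) -> float:
    """Weight the PFSA assigns to a string (this toy automaton is
    deterministic per (state, symbol), so a single path suffices)."""
    state, prob = START, 1.0
    for symbol in s:
        _, arcs = PFSA[state]
        arc = next((a for a in arcs if a[0] == symbol), None)
        if arc is None:
            return 0.0            # no matching arc: weight zero
        prob *= arc[2]
        state = arc[1]
    return prob * PFSA[state][0]  # multiply in the final halting probability


if __name__ == "__main__":
    print(sample_string())
    print(string_prob("ab"))      # 0.5 * 0.4 * 0.3 = 0.06
```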
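Building on the same idea, the sketch below illustrates the intuition behind regular reducibility in item 2: a generator first emits marked intermediate "thought" tokens and then an answer, and a simple deletion map recovers the answer strings, so the induced answer distribution is a marginal over all traces. The token names, the '#' marking convention, and the probabilities are hypothetical choices, not the paper's construction.

```python
import random
from collections import Counter

# Toy chain-of-thought generator: emit some marked "thought" tokens
# (prefixed with '#'), then a single answer token. Deleting the marked
# tokens is a simple finite-state (regular) transformation; summing
# over all traces that map to the same answer gives the induced
# distribution over answers.

def sample_trace() -> list[str]:
    """Sample a full trace: thought tokens followed by one answer token."""
    trace = []
    while random.random() < 0.6:  # geometric number of thought tokens
        trace.append(random.choice(["#carry", "#add", "#shift"]))
    trace.append("yes" if len(trace) % 2 == 0 else "no")  # toy answer rule
    return trace


def strip_thoughts(trace: list[str]) -> list[str]:
    """The deletion map: drop every marked thought token."""
    return [tok for tok in trace if not tok.startswith("#")]


if __name__ == "__main__":
    # Empirically estimate the marginal distribution over answers that
    # remains after the thoughts are deleted.
    counts = Counter(tuple(strip_thoughts(sample_trace())) for _ in range(10_000))
    print(counts)
```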
The authors conclude that CoT reasoning significantly enhances the computational capabilities of LMs, making them Turing-complete in a probabilistic sense. However, the Turing-completeness constructions rely on precision that grows with the length of the string (linearly for RNN LMs, logarithmically for transformer LMs), an assumption that does not hold for the fixed-precision arithmetic used in practice.