23 May 2024 | Zhiyuan Li, Hong Liu, Denny Zhou, and Tengyu Ma
The paper explores the effectiveness of chain of thought (CoT) in improving the accuracy of large language models (LLMs) on arithmetic and symbolic reasoning tasks. CoT instructs the model to generate intermediate steps before the final answer, giving it the ability to perform serial computation, a capability that transformers otherwise lack, especially at low depth. The authors provide a theoretical framework for understanding the power of CoT through the lens of expressiveness, using circuit complexity as the measure. They show that constant-depth transformers with constant-bit precision and no CoT can only solve problems in AC$^0$, a proper subset of TC$^0$. With T steps of CoT, however, the same transformers can solve any problem computable by boolean circuits of size T. Empirically, enabling CoT significantly improves the accuracy of low-depth transformers on inherently serial tasks such as modular addition, permutation composition, iterated squaring, and the circuit value problem. The paper also discusses related work and concludes by highlighting the importance of CoT in enhancing the expressiveness of transformers.
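To make the "size-T circuit with T CoT steps" claim concrete, here is a minimal sketch (not from the paper) of the circuit value problem, one of the inherently serial tasks listed above. Evaluating a circuit gate by gate in topological order mirrors how each CoT step can emit one intermediate gate value rather than computing the whole circuit in a single shallow forward pass; the function and gate encoding below are illustrative choices, not the paper's construction.

```python
def evaluate_circuit(inputs, gates):
    """Evaluate a boolean circuit given as a topologically ordered gate list.

    inputs: list of input bits, e.g. [1, 0, 1]
    gates:  list of (op, i, j) tuples, where op is "AND", "OR", or "NOT"
            and i, j index previously computed values (inputs come first).
    Returns the value of the final gate.
    """
    # values[k] plays the role of the k-th generated CoT token:
    # each loop iteration produces one new intermediate result.
    values = list(inputs)
    for op, i, j in gates:
        if op == "AND":
            values.append(values[i] & values[j])
        elif op == "OR":
            values.append(values[i] | values[j])
        elif op == "NOT":
            values.append(1 - values[i])  # second index is ignored for NOT
        else:
            raise ValueError(f"unknown gate: {op}")
    return values[-1]

# Example: (x0 AND x1) OR (NOT x2) on inputs (1, 0, 1)
circuit = [("AND", 0, 1), ("NOT", 2, 2), ("OR", 3, 4)]
print(evaluate_circuit([1, 0, 1], circuit))  # -> 0
```

A circuit with T gates takes T iterations here, which is exactly the serial budget that T CoT steps would provide; without those steps, a constant-depth model has no place to store the intermediate gate values.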