Let's Think Dot by Dot: Hidden Computation in Transformer Language Models

24 Apr 2024 | Jacob Pfau, William Merrill & Samuel R. Bowman
This paper investigates the role of filler tokens in transformer language models (LMs), asking whether additional tokens can provide computational benefits even when, unlike chain-of-thought (CoT) reasoning, they carry no information. The authors show that transformers can solve hard algorithmic tasks using filler tokens (e.g., '......') in place of legible intermediate reasoning tokens, but that learning to use filler tokens is difficult and requires specific, dense supervision.

They also provide a theoretical characterization of the class of problems where filler tokens are useful, relating it to the quantifier depth of first-order logic formulas: for problems with quantifier depth greater than 2, filler tokens can match the performance of CoT reasoning without revealing anything about the intermediate computational steps. 3SUM, for instance, is naturally expressed with three nested existential quantifiers (does any triple of inputs sum to zero?), putting it just past this threshold. The results suggest that filler tokens extend the expressive power of transformers within the TC⁰ circuit complexity class, allowing them to solve problems that require nested quantifiers.

Empirically, the study evaluates transformers on two synthetic tasks, 3SUM and 2SUM-Transform, where filler tokens significantly improve accuracy. On 3SUM, models trained with filler tokens achieve 100% accuracy, while models without filler tokens fail to solve the task. A minimal sketch of the two prompt formats appears below.
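To make the setup concrete, here is a minimal sketch of how a 3SUM instance might be serialized with and without filler tokens. The exact token layout, the modulus of 10, and the function name are illustrative assumptions, not the paper's actual data format.

```python
import random

MOD = 10  # assumption: small-modulus arithmetic; the paper's exact modulus may differ

def make_3sum_example(length: int, n_filler: int):
    """Build one hypothetical 3SUM instance in the two prompt formats compared
    in the paper: an immediate answer and a filler-token variant."""
    xs = [random.randrange(MOD) for _ in range(length)]
    # Label: does any triple i < j < k sum to 0 (mod MOD)?
    label = any(
        (xs[i] + xs[j] + xs[k]) % MOD == 0
        for i in range(length)
        for j in range(i + 1, length)
        for k in range(j + 1, length)
    )
    answer = "True" if label else "False"
    inputs = " ".join(str(x) for x in xs)
    # No intermediate tokens: the model must answer right after the input.
    immediate = f"{inputs} = {answer}"
    # Filler tokens: uninformative '.' tokens precede the answer, buying the
    # model extra forward passes without leaking any intermediate steps.
    filler = f"{inputs} {'. ' * n_filler}= {answer}"
    return immediate, filler

immediate, filler = make_3sum_example(length=6, n_filler=12)
print(immediate)  # e.g. "3 9 1 4 0 7 = False"
print(filler)     # e.g. "3 9 1 4 0 7 . . . . . . . . . . . . = False"
```

Because the dots are identical and content-free, any accuracy gain for the filler format over the immediate-answer format must come from hidden computation in the extra forward passes, not from the tokens themselves.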
However, learning to use filler tokens is hard: it requires specific, dense supervision, and current commercial models such as Claude 2 and GPT-3.5 do not benefit from filler tokens on common benchmarks. The results nevertheless suggest that, with sufficient training data and computational resources, LMs could come to benefit from them. This raises a concern about hidden computation: models may perform reasoning that is not reflected in the chain-of-thought tokens they display.

Overall, the findings suggest that additional tokens can provide computational benefits independent of token choice, and that filler tokens may enable transformers to solve problems they cannot solve when forced to answer immediately, without intermediate tokens. The study provides empirical evidence that filler tokens can enhance the expressive power of transformers, while also highlighting the challenges of learning to use them effectively.