1 Aug 2024 | Wilson Wu, John X. Morris, Lionel Levine
The paper investigates whether transformers "think ahead" during inference by examining two hypotheses: *pre-caching* and *breadcrumbs*. Pre-caching suggests that transformers deliberately compute features at a given time step that are useful for future steps, while breadcrumbs implies that the features most relevant to the current step simply happen to also be beneficial for future steps. To test these hypotheses, the authors propose a *myopic training* scheme, which prevents gradients from the loss at a given position from propagating back to the hidden states of past timesteps. In synthetic data experiments, they find clear evidence for pre-caching. In natural language modeling experiments, the results suggest that the breadcrumbs hypothesis is more plausible, though pre-caching increases with model scale. The study concludes that smaller models do not intentionally prepare information for the future, while larger models exhibit more significant pre-caching. The findings have implications for understanding the mechanisms behind future token prediction in transformers, with potential applications in safety and interpretability.
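
To make the myopic-training idea concrete, below is a minimal, hypothetical PyTorch sketch of one way such gradient blocking could be wired into causal self-attention. The module name, the single-head setup, and the detach-based construction are illustrative assumptions, not the authors' exact implementation; the point is only that keys and values drawn from earlier hidden states carry no gradient, so the loss at a position cannot shape what earlier positions compute.

```python
import torch
import torch.nn as nn


class MyopicCausalSelfAttention(nn.Module):
    """Single-head causal self-attention with a stop-gradient on the hidden
    states used as keys and values (a rough sketch of myopic training, not
    the paper's exact construction). For brevity this also detaches the
    current position's own key/value, which is slightly stricter than only
    blocking gradients to strictly earlier positions."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q = self.q_proj(x)           # gradients flow normally through queries
        k = self.k_proj(x.detach())  # no gradient into past hidden states
        v = self.v_proj(x.detach())

        seq_len = x.size(1)
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        scores = (q @ k.transpose(-2, -1)) * self.scale
        scores = scores.masked_fill(causal_mask, float("-inf"))
        attn = scores.softmax(dim=-1)
        return self.out_proj(attn @ v)
```

Note that the projection weights themselves still receive gradients; only the path back into earlier positions' hidden states is cut, which is what distinguishes myopic training from ordinary causal language modeling.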