Talking Heads: Understanding Inter-layer Communication in Transformer Language Models

13 Jun 2024 | Jack Merullo, Carsten Eickhoff, Ellie Pavlick
This paper investigates how information is passed between layers in transformer language models (LMs), focusing on a mechanism that allows models to recall items from a list. The authors find that LMs use low-rank subspaces of the residual stream to represent and route information between layers, forming communication channels. By decomposing attention head weight matrices with Singular Value Decomposition (SVD), they show that interactions between heads separated by one or more layers can be predicted.

They demonstrate that this mechanism explains the model's sensitivity to the order of items in a prompt, and that the communication channels carry interpretable signals that can be manipulated to improve performance on a synthetic "Laundry List" task, in which the model must recall items from a list. The results indicate that these channels are crucial for tasks involving token indexing, and that the model's ability to handle such tasks is limited by the number of objects in the list. The paper also examines the role of inhibition heads in controlling which tokens are attended to, and how this affects the model's ability to recall list items.

The analysis reveals an intricate, interpretable structure learned during pretraining, which helps explain why sophisticated LMs sometimes fail in simple domains. The findings suggest that neural networks can learn complex, content-independent representations from self-supervised pretraining, with implications for understanding how LMs process and represent information, and for developing methods to control and improve their behavior.
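To make the SVD-based idea concrete, here is a minimal sketch (not the authors' code) of how one might test whether an earlier head writes into a low-rank residual-stream subspace that a later head reads from. The matrices W_OV and W_QK, the dimensions, and the rank k are illustrative placeholders, not the paper's actual weights.

```python
# Sketch: scoring a candidate "communication channel" between two attention heads.
# Assumption: W_OV stands in for the early head's output-value map into the
# residual stream, and W_QK for the later head's query-key bilinear form.
# Here both are random placeholders; in practice they would be model weights.
import numpy as np

rng = np.random.default_rng(0)
d_model = 768  # residual-stream width (illustrative)

W_OV = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
W_QK = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

# SVD of the writing matrix: its top-k left singular vectors span the
# low-rank subspace the early head uses to write into the residual stream.
U, S, Vt = np.linalg.svd(W_OV)
k = 3
write_basis = U[:, :k]  # d_model x k

# Project the later head's weights onto that subspace. For random,
# unrelated weights the projected mass is roughly k/d_model of the total,
# so a ratio well above that baseline suggests the later head reads the
# subspace the early head writes to.
projected_mass = np.linalg.norm(W_QK @ write_basis) ** 2
total_mass = np.linalg.norm(W_QK) ** 2
ratio = projected_mass / total_mass
print(f"projected mass ratio: {ratio:.4f} (random baseline ~ {k / d_model:.4f})")
```

With real model weights, heads whose ratio far exceeds the random baseline would be candidates for the inter-layer channels the paper describes.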