This paper explores the role of "dark signals" in transformer-based large language models (LLMs), particularly in the context of attention mechanisms and residual streams. The authors introduce *spectral filters*, which analyze intermediate representations by partitioning the singular vectors of the vocabulary embedding and unembedding matrices into bands. They find that signals in the tail of the spectrum are crucial for attention sinking, a phenomenon in which the model allocates excess attention to a specific token (typically the beginning-of-sequence token) in order to maintain global features while minimizing interference with next-token prediction. Using LLaMa2 models as a case study, they demonstrate that large parts of the embedding spectrum can be suppressed with little increase in loss, as long as attention sinking is preserved. They also observe a positive correlation between the average attention a token receives and the prevalence of dark signals in its residual stream. Finally, the paper discusses the implications of these findings for understanding and potentially improving the behavior of LLMs, suggesting spectral compression as a promising direction for future research.
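
To make the spectral-filter idea concrete, the sketch below shows one plausible way to build band projectors from the SVD of an unembedding matrix and to measure how much of a residual-stream vector falls into a given band (e.g. the tail, or "dark", band). This is an illustrative reconstruction, not the paper's implementation: the function names `spectral_filters` and `band_energy_fraction`, the argument names `W_U` and `bands`, and the assumption that the vocabulary size exceeds the model dimension are all choices made here for the example.

```python
import torch


def spectral_filters(W_U: torch.Tensor, bands):
    """Build band-projection matrices from the SVD of an unembedding matrix.

    W_U:   (vocab_size, d_model) unembedding matrix (assumed vocab_size >= d_model).
    bands: list of (start, end) index ranges over the right singular vectors,
           which are ordered by decreasing singular value.
    Returns one (d_model, d_model) orthogonal projector per band.
    """
    # Right singular vectors span the model (residual-stream) dimension.
    _, _, Vh = torch.linalg.svd(W_U.float(), full_matrices=False)  # Vh: (d_model, d_model)
    projectors = []
    for start, end in bands:
        V_band = Vh[start:end].T              # (d_model, band_width)
        projectors.append(V_band @ V_band.T)  # projector onto the band's subspace
    return projectors


def band_energy_fraction(hidden: torch.Tensor, projector: torch.Tensor) -> torch.Tensor:
    """Fraction of each residual-stream vector's squared norm lying in a band.

    hidden: (..., d_model) residual-stream activations.
    """
    projected = hidden @ projector
    return projected.pow(2).sum(-1) / hidden.pow(2).sum(-1).clamp_min(1e-12)
```

Under this reading, the energy fraction of the tail band would give a rough per-token proxy for the prevalence of dark signals, which is the quantity the summary above describes as correlating with the average attention a token receives.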