This paper introduces a novel approach to analyzing transformer-based large language models (LLMs) by extending the logit lens to logit spectroscopy, which applies spectral filtering to intermediate representations. The authors investigate the role of "dark signals": signals in the tail end of the unembedding spectrum that are not captured by the logit lens. They find that these dark signals are crucial for attention sinking, a phenomenon in which the beginning of sentence (BoS) token receives a disproportionate amount of attention. The authors also show that dark signals are essential for maintaining low loss in pretrained models even when significant portions of the unembedding spectrum are suppressed, and they demonstrate that tokens receiving high attention have a higher prevalence of dark signals in their residual streams.

The paper further introduces sink-preserving spectral filters, which preserve attention sinking while filtering out other parts of the spectrum; the results show that these filters maintain low negative log-likelihood (NLL) even when a significant portion of the spectrum is removed. The authors also explore the relationship between attention bars, dark signals, and the residual stream of the BoS token, finding that attention bars may occupy a different subspace within the dark subspace yet still function as attention sinks. The study highlights the importance of understanding the role of dark signals in LLMs and suggests that spectral compression could be a promising direction for future research.
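To make the spectral-filtering idea concrete, the sketch below is a minimal illustration rather than the authors' implementation: it assumes a hypothetical unembedding matrix `W_U` (vocabulary size × hidden size) and a residual-stream vector `h`, and splits `h` into the component lying in the top-k right-singular subspace of the unembedding (the part visible to the logit lens) and the remainder in the tail of the spectrum, standing in for a "dark" component.

```python
# Minimal sketch of spectral filtering against the unembedding spectrum.
# Assumptions (not from the paper's code): W_U is the unembedding matrix of
# shape (vocab_size, d_model); h is a residual-stream activation of shape (d_model,).
import torch

def spectral_split(W_U: torch.Tensor, h: torch.Tensor, k: int):
    # Right singular vectors of the unembedding span directions in model space,
    # ordered by decreasing singular value.
    _, _, Vh = torch.linalg.svd(W_U, full_matrices=False)  # Vh: (d_model, d_model) when vocab >= d_model
    V_top = Vh[:k]                   # top-k spectral directions (the "bright" band)
    bright = V_top.T @ (V_top @ h)   # projection of h onto the top-k subspace
    dark = h - bright                # component in the tail of the spectrum
    return bright, dark

# Example usage: estimate how much of an activation's norm lives in the spectral tail.
# bright, dark = spectral_split(W_U, h, k=1000)
# dark_fraction = (dark.norm() / h.norm()).item()
```

A sink-preserving filter in this spirit would zero a chosen spectral band at most positions while leaving the BoS token's residual stream (or its dark component) untouched, so that attention sinking can survive the filtering.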
The paper also discusses the limitations of the study, including the focus on the LLaMa2 family of models and the use of limited text samples in the experiments. The authors conclude that a better understanding of the inner workings of transformer models is essential for making them safer and more reliable.