The paper "Information Flow Routes: Automatically Interpreting Language Models at Scale" explores the internal mechanisms of large language models (LLMs) by representing information flow as graphs, where nodes are token representations and edges are computations. The authors propose an automated method to extract these graphs by tracing information flow through the network, leaving only the most important nodes and edges. This approach, based on attribution rather than activation patching, is more efficient and versatile, allowing for the analysis of any prediction and the comparison with contrastive examples. The method is applied to LLMs like Llama 2, revealing that certain attention heads, such as previous token heads and subword merging heads, are generally important. The study also finds that some model components are specialized for specific domains, such as coding or multilingual texts. Additionally, the paper discusses the role of attention heads in handling tokens of the same part of speech and the peculiar behavior of periods acting as BOS tokens. The contributions include a novel method for interpreting LLM predictions and insights into the general and domain-specific importance of model components.The paper "Information Flow Routes: Automatically Interpreting Language Models at Scale" explores the internal mechanisms of large language models (LLMs) by representing information flow as graphs, where nodes are token representations and edges are computations. The authors propose an automated method to extract these graphs by tracing information flow through the network, leaving only the most important nodes and edges. This approach, based on attribution rather than activation patching, is more efficient and versatile, allowing for the analysis of any prediction and the comparison with contrastive examples. The method is applied to LLMs like Llama 2, revealing that certain attention heads, such as previous token heads and subword merging heads, are generally important. The study also finds that some model components are specialized for specific domains, such as coding or multilingual texts. Additionally, the paper discusses the role of attention heads in handling tokens of the same part of speech and the peculiar behavior of periods acting as BOS tokens. The contributions include a novel method for interpreting LLM predictions and insights into the general and domain-specific importance of model components.