11 Feb 2024 | Bilal Chughtai, Alan Cooney, Neel Nanda
This paper explores the mechanisms behind factual recall in large language models (LLMs), focusing on the task of retrieving stored facts from prompts. The authors find that factual recall is more complex than previously thought: multiple independent and distinct mechanisms combine additively, constructively interfering on the correct answer. They term this the "additive motif": each mechanism pushes the correct answer's logit up, even when no single mechanism is sufficient on its own. The study identifies four such mechanisms: Subject Heads, Relation Heads, Mixed Heads, and MLPs, each contributing to the final output in a distinct way. The authors also extend the direct logit attribution (DLA) technique to attribute each attention head's output to specific source tokens, revealing the heads' additive contributions. The paper highlights the limitations of narrow circuit analysis and provides insight into the mechanisms underlying factual recall, contributing to the growing literature on interpretability and understanding LLMs.
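The per-source-token extension of DLA rests on the linearity of an attention head's output: the head's write to the residual stream is an attention-weighted sum over source-token value vectors, so its contribution to the answer logit decomposes exactly into one term per source token. A minimal NumPy sketch of that decomposition (the dimensions and weights below are random toy placeholders, not the models studied in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, chosen only for illustration
n_src, d_head, d_model = 5, 8, 16

attn = rng.random(n_src)
attn /= attn.sum()                        # attention pattern of one head at the final position
v = rng.normal(size=(n_src, d_head))      # value vector for each source token
W_O = rng.normal(size=(d_head, d_model))  # the head's output projection
u_correct = rng.normal(size=d_model)      # unembedding direction of the correct answer token

# Standard DLA: project the head's full output onto the answer direction
z = attn @ v                              # head output (weighted sum of values)
head_dla = (z @ W_O) @ u_correct

# Extended DLA: attribute the head's logit contribution to individual source tokens
per_src_dla = (attn[:, None] * (v @ W_O)) @ u_correct  # shape (n_src,)

# By linearity, the per-source terms sum exactly to the head's total DLA
assert np.isclose(per_src_dla.sum(), head_dla)
```

Because every step from values to logits is linear, no approximation is involved: the per-source terms partition the head's total contribution, which is what lets the paper read off which tokens (e.g. subject vs. relation tokens) each head moves information from.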