Summing Up The Facts: Additive Mechanisms Behind Factual Recall in LLMs


11 Feb 2024 | Bilal Chughtai, Alan Cooney, Neel Nanda
This paper investigates how large language models (LLMs) store and retrieve factual knowledge, focusing on factual recall tasks in which a model must explicitly surface a stored fact to complete a prompt such as "Fact: The Colosseum is in the country of".

The central finding is that factual recall is not driven by a single circuit: several distinct, independent, and qualitatively different mechanisms combine additively, constructively interfering on the correct answer. The authors term this the additive motif: the model sums contributions from many components, and the correct token wins because those contributions agree.

Four mechanism classes are identified: Subject Heads, Relation Heads, Mixed Heads, and MLPs. Subject Heads attend to the subject tokens and extract attributes of the subject; Relation Heads attend to the relation tokens and boost tokens that are plausible answers for that relation (the attribute set R); Mixed Heads combine both signals; and MLP layers further boost attributes in R. Each mechanism contributes to the correct answer on its own, but their combined effect is substantially more robust than any single one.

Methodologically, the paper extends direct logit attribution (DLA) to attribute an attention head's output to individual source tokens, disentangling how much of a head's effect flows through the subject tokens versus the relation tokens. This decomposition is what makes it possible to identify mixed heads, whose contributions draw on both. A minimal sketch of this per-source-token attribution follows.
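The sketch below is not the authors' code; it is a minimal illustration using the open-source TransformerLens library, with GPT-2 small standing in for the models studied and the layer/head indices chosen arbitrarily. It exploits the fact that a head's output at the final position is a weighted sum over source positions, so its direct logit contribution decomposes token by token.

```python
# Minimal sketch of per-source-token direct logit attribution (DLA).
# Assumptions: TransformerLens, GPT-2 small as a stand-in model, and an
# arbitrarily chosen head -- none of these come from the paper itself.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("Fact: The Colosseum is in the country of")
answer_id = model.to_single_token(" Italy")
_, cache = model.run_with_cache(tokens)

layer, head = 9, 9  # hypothetical head to inspect

# A head's output at the final position is a weighted sum over source
# positions: out = sum_src pattern[src] * (v[src] @ W_O).
pattern = cache["pattern", layer][0, head, -1]   # [seq], attention from final pos
v = cache["v", layer][0, :, head]                # [seq, d_head], value vectors
per_source = pattern[:, None] * (v @ model.W_O[layer, head])  # [seq, d_model]

# Project each source token's contribution onto the answer's unembedding,
# dividing by the final LayerNorm scale (treated as a constant -- the usual
# DLA approximation).
scale = cache["ln_final.hook_scale"][0, -1]
dla_per_source = (per_source / scale) @ model.W_U[:, answer_id]  # [seq]

for tok, val in zip(model.to_str_tokens(tokens), dla_per_source.tolist()):
    print(f"{tok!r:>15}  {val:+.3f}")
```

Under this decomposition, a head whose positive contribution comes chiefly from the subject positions behaves as a subject head, one whose contribution comes from the relation tokens behaves as a relation head, and a head with substantial contributions from both would be flagged as a mixed head.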
Overall, the findings suggest that LLMs perform factual recall by summing many independent contributions rather than routing through a single pathway, which makes the behavior more robust and the output more accurate. The study also highlights a limitation of narrow circuit analysis: explanations that track only one source of information can miss mechanisms that matter, so mechanistic interpretability should account for multiple information sources. The results sharpen our understanding of how LLMs process and retrieve factual knowledge.
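To make the additive picture concrete, here is one more minimal sketch (again with TransformerLens and GPT-2 small as illustrative stand-ins, not the paper's setup). Because the residual stream is a sum of component outputs, the answer logit decomposes, up to the LayerNorm approximation, into a sum of per-component direct contributions: the embedding, every attention layer, and every MLP layer.

```python
# Hedged sketch: decompose the answer logit into additive per-component
# contributions. Model and prompt are illustrative stand-ins.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("Fact: The Colosseum is in the country of")
answer_id = model.to_single_token(" Italy")
_, cache = model.run_with_cache(tokens)

# Stack every component's write to the residual stream at the final position,
# then apply the final LayerNorm scale so contributions live in logit space.
components, labels = cache.decompose_resid(layer=-1, pos_slice=-1, return_labels=True)
components = cache.apply_ln_to_stack(components, layer=-1, pos_slice=-1)

# The answer logit is (approximately) the sum of these direct contributions.
dla = components[:, 0] @ model.W_U[:, answer_id]  # [n_components]
print(f"total direct contribution: {dla.sum().item():+.3f}")
for label, val in sorted(zip(labels, dla.tolist()), key=lambda x: -x[1])[:5]:
    print(f"{label:>12}  {val:+.3f}")
```

If the additive motif holds, no single component dominates this breakdown: several attention layers and MLPs each contribute a modest positive amount that sums to the full answer logit.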