Locating and Editing Factual Associations in GPT

13 Jan 2023 | Kevin Meng*, David Bau*, Alex Andonian, Yonatan Belinkov†
This paper investigates how factual associations are stored and recalled in autoregressive transformer language models, specifically GPT. The authors find evidence that these associations correspond to localized, directly editable computations within the model. Using a causal intervention method that identifies the neuron activations decisive for factual predictions, they show that middle-layer feed-forward modules play a key role in mediating factual recall.

To test this hypothesis, they introduce Rank-One Model Editing (ROME), which modifies feed-forward weights to update specific factual associations. ROME is effective on a standard zero-shot relation extraction task and performs well on a new dataset of difficult counterfactual assertions, maintaining both specificity and generalization.

The results suggest that mid-layer feed-forward modules are important for storing factual associations, and that direct manipulation of these computational mechanisms is a feasible approach to model editing. The paper also discusses ROME's limitations, including its inability to edit many facts simultaneously and its potential misuse for inserting malicious information. The authors conclude that their findings provide insight into how facts are stored and recalled in large language models, and that ROME is a simple and principled method for model editing. The code, dataset, visualizations, and an interactive demo notebook are available at https://rome.baulab.info/.
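To make the rank-one editing idea concrete, the sketch below shows how a single feed-forward weight matrix can be updated in closed form so that a chosen key vector maps to a new value vector. This is a minimal illustration under simplifying assumptions, not the authors' implementation: the names W, k, and v_target are hypothetical, and full ROME derives the key from the subject's token representations and scales the update by an inverse second-moment statistic of observed keys so that unrelated associations are disturbed as little as possible.

    import torch

    def rank_one_edit(W: torch.Tensor, k: torch.Tensor, v_target: torch.Tensor) -> torch.Tensor:
        # W:        (d_out, d_in) feed-forward projection weight to edit
        # k:        (d_in,) key vector selecting the association (hypothetical)
        # v_target: (d_out,) value vector encoding the new fact (hypothetical)
        residual = v_target - W @ k                  # correction needed at key k
        update = torch.outer(residual, k) / (k @ k)  # rank-one outer product
        return W + update                            # now W' @ k == v_target

    # Usage: edit a random weight matrix and verify the new association holds.
    torch.manual_seed(0)
    W = torch.randn(8, 4)
    k = torch.randn(4)
    v_target = torch.randn(8)
    W_edited = rank_one_edit(W, k, v_target)
    assert torch.allclose(W_edited @ k, v_target, atol=1e-5)

Note that any input x is perturbed in proportion to its inner product with k, which is why the full method whitens keys with a covariance statistic before applying the update, limiting interference with other stored associations.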