Locating and Editing Factual Associations in Mamba

2024 | Arnab Sen Sharma, David Atkinson, and David Bau
This paper investigates the mechanisms of factual recall in Mamba, a recent state space model whose performance is competitive with transformers. The study is inspired by findings in autoregressive transformer language models suggesting that knowledge recall is localized to specific modules at particular token positions; the authors ask whether Mamba exhibits similar localization. To investigate, they conduct four experiments.

First, they apply causal tracing (interchange interventions) to localize the components responsible for recalling facts. These interventions reveal that specific components in middle layers show strong causal effects at the last token of the subject, while intervening on later layers has its strongest effect at the last token of the prompt, matching previous findings on autoregressive transformers. Second, they show that rank-one model editing can successfully insert facts at specific locations, again resembling results on transformer LMs. Third, they examine whether Mamba represents factual relations linearly. Finally, they adapt attention-knockout techniques to Mamba to dissect information flow during factual recall. Comparing Mamba directly to a similar-sized autoregressive transformer LM, the authors conclude that, despite significant architectural differences, the two models share many similarities in how they perform factual recall.
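The interchange interventions described above can be illustrated with a small self-contained sketch. Everything here (the toy layer stack, the causal averaging `mix`, the `indirect_effect` score) is a hypothetical stand-in for the paper's actual Mamba hooks; it only shows the mechanics of corrupting a run and then patching a clean hidden state back in.

```python
import numpy as np

# Toy sketch of causal tracing / interchange interventions (an illustrative
# stand-in, not the paper's code): the "model" is a stack of per-token linear
# maps followed by causal mixing, loosely imitating how a recurrent scan lets
# earlier tokens influence later ones.

rng = np.random.default_rng(0)
n_layers, n_tokens, d = 4, 5, 8
layers = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]
mix = np.tril(np.ones((n_tokens, n_tokens)))
mix /= mix.sum(axis=1, keepdims=True)        # causal averaging over the prefix

def run(states, patch=None):
    """Run the toy model; patch=(layer, token, vec) overwrites one hidden
    state mid-run (the interchange intervention). Returns all hidden states
    and the final last-token state, standing in for the output logits."""
    h, hiddens = states.copy(), []
    for li, W in enumerate(layers):
        h = mix @ np.tanh(h @ W)
        if patch is not None and patch[0] == li:
            h[patch[1]] = patch[2]
        hiddens.append(h.copy())
    return hiddens, h[-1]

clean = rng.standard_normal((n_tokens, d))
corrupt = clean.copy()
corrupt[1] += 3.0 * rng.standard_normal(d)   # corrupt a "subject" token

clean_h, clean_out = run(clean)
_, corrupt_out = run(corrupt)

def indirect_effect(layer, token):
    """1.0 means restoring this clean state fully recovers the clean output."""
    _, patched_out = run(corrupt, patch=(layer, token, clean_h[layer][token]))
    base = np.linalg.norm(clean_out - corrupt_out)
    return 1.0 - np.linalg.norm(clean_out - patched_out) / base

effects = [[indirect_effect(l, t) for t in range(n_tokens)]
           for l in range(n_layers)]
```

High entries in `effects` mark (layer, token) sites whose restored state recovers the clean prediction; in the paper these peaks appear at the subject's last token in middle layers and at the prompt's last token in later layers.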
The study then examines the effectiveness of ROME (Rank-One Model Editing) for editing factual associations in Mamba. ROME achieves high editing scores across a range of early-to-middle layers when modifying any one of the projection matrices, matching observations made by Hase et al. (2024) on transformer LMs. Performance nonetheless depends on the location of the edit: edits to W_a generalize poorly in early layers, whereas a high paraphrase score (PS) can be achieved at early layers by modifying either W_g or W_o.

The study also tests whether a linear relational embedding (LRE) can approximate Mamba's decoding of factual relations. Only for 10 of 26 factual relations does a linear LRE achieve more than 50% faithfulness; in the similarly sized Pythia-2.8b, an LRE exceeds 50% faithfulness for 11 relations, so the two architectures are comparable on this measure as well.

Finally, the study adapts attention knock-out to Mamba. Although Mamba's recurrent architecture makes such experiments harder to perform, the factual information flow the authors observe is similar to what Geva et al. (2023) report for transformer LMs.
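The rank-one editing idea itself can be written down compactly. The sketch below is the minimal-Frobenius-change version of the update (with the key covariance taken as the identity, unlike the full ROME derivation, which whitens by a precomputed key covariance), and the key/value vectors are random placeholders rather than real subject/object encodings:

```python
import numpy as np

# Minimal rank-one edit in the spirit of ROME: given a projection W, force
# W_edit @ k_star == v_star with the smallest Frobenius-norm change to W.
# (The full method uses W + (v* - W k*) (C^{-1} k*)^T / (k*^T C^{-1} k*)
# for a key covariance C; we take C = I here as a simplifying assumption.)

def rank_one_edit(W, k_star, v_star):
    residual = v_star - W @ k_star
    return W + np.outer(residual, k_star) / (k_star @ k_star)

rng = np.random.default_rng(1)
W = rng.standard_normal((16, 32))   # placeholder projection matrix
k_star = rng.standard_normal(32)    # key: subject representation
v_star = rng.standard_normal(16)    # value: encodes the new object

W_edit = rank_one_edit(W, k_star, v_star)
```

Because the update is an outer product, it changes the matrix by exactly rank one, which is what makes the edit cheap and surgical regardless of which of Mamba's projections (W_a, W_g, or W_o) it is applied to.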