2 Aug 2024 | Arnab Sen Sharma, David Atkinson, and David Bau
The paper investigates the mechanisms of factual recall in the Mamba state space model, a recurrent neural network (RNN) architecture. Inspired by previous findings in autoregressive transformer language models (LMs), which show localized patterns of internal computations for recalling facts, the authors explore whether Mamba exhibits similar localization. They conduct four lines of experiments:
1. **Causal Tracing and Interchange Interventions**: They apply causal tracing and interchange interventions to identify the components responsible for recalling facts. Results show that specific components within the middle layers have strong causal effects at the last token of the subject, while interventions on later layers are most effective at the last token of the prompt (a minimal activation-patching sketch follows this list).
2. **Rank-One Model Editing**: They demonstrate that rank-one model editing can successfully insert new facts at specific locations, mirroring findings in transformer LMs (a simplified rank-one update is sketched after this list).
3. **Linearity of Factual Relations**: They examine the linearity of Mamba's representations of factual relations, finding that many relations can be well approximated by a linear transformation (a least-squares fitting sketch follows this list).
4. **Attention Knockout Techniques**: They adapt attention-knockout techniques to Mamba to dissect information flow during factual recall, comparing the resulting picture to a similar-sized autoregressive transformer LM (the underlying knockout operation is sketched after this list).
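To make item 1 concrete, here is a minimal activation-patching sketch of an interchange intervention. It is an illustration rather than the authors' code: it assumes a PyTorch, HuggingFace-style causal LM whose blocks are reachable at `model.backbone.layers` (adjust this path for your Mamba implementation) and that the clean and corrupted prompts tokenize to the same length; `patch_hidden_state` and its arguments are hypothetical names.

```python
import torch

def patch_hidden_state(model, tokenizer, clean_prompt, corrupted_prompt,
                       layer_idx, token_idx):
    """Run the corrupted prompt, splicing in the clean run's hidden state at
    (layer_idx, token_idx), and return the patched next-token logits."""
    clean_ids = tokenizer(clean_prompt, return_tensors="pt").input_ids
    corrupt_ids = tokenizer(corrupted_prompt, return_tensors="pt").input_ids

    layer = model.backbone.layers[layer_idx]   # assumed module path; adjust as needed
    cache = {}

    # 1) Record the clean activation at the target layer and token position.
    def save_hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        cache["clean"] = hidden[:, token_idx, :].detach().clone()

    handle = layer.register_forward_hook(save_hook)
    with torch.no_grad():
        model(clean_ids)
    handle.remove()

    # 2) Re-run on the corrupted prompt, overwriting that position with the clean state.
    def patch_hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden[:, token_idx, :] = cache["clean"]
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    handle = layer.register_forward_hook(patch_hook)
    with torch.no_grad():
        logits = model(corrupt_ids).logits
    handle.remove()

    return logits[:, -1, :]   # next-token distribution after the intervention
```

Sweeping `layer_idx` and `token_idx` over the prompt and measuring how much each patch restores the probability of the correct object is what produces the localization picture described above (middle layers at the last subject token, late layers at the last prompt token).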
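Item 2 rests on a rank-one weight update. The sketch below is a deliberately simplified version of the idea behind ROME-style editing: pick a key vector `k` (e.g., the representation at the last subject token) and a target value `v`, then add a rank-one term so the edited projection maps `k` to `v`. The published methods additionally use a pre-computed second-moment statistic to shape the update direction; that refinement is omitted here.

```python
import torch

def rank_one_edit(W: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Return W' = W + (v - W k) k^T / (k^T k), so that W' @ k == v."""
    residual = v - W @ k                         # what the current weights get wrong for k
    update = torch.outer(residual, k) / (k @ k)  # rank-one correction aligned with k
    return W + update

# Toy usage: edit a (d_out x d_in) projection so a subject key maps to a new value.
d_in, d_out = 16, 16
W = torch.randn(d_out, d_in)
k = torch.randn(d_in)    # subject representation (key)
v = torch.randn(d_out)   # desired output encoding the edited fact (value)
W_edited = rank_one_edit(W, k, v)
assert torch.allclose(W_edited @ k, v, atol=1e-4)
```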
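For item 3, one way to probe linearity is to collect paired subject and object representations and fit an affine map by least squares; a high fit suggests the relation is close to linear in representation space. The paper's own procedure differs in detail (the map is derived from the model's computation rather than regressed), so treat this as a hedged proxy; the function names and the random stand-in data are illustrative.

```python
import torch

def fit_affine_map(S: torch.Tensor, O: torch.Tensor):
    """S: (n, d_s) subject states, O: (n, d_o) object states.
    Returns (W, b) minimizing ||S W + b - O||^2."""
    ones = torch.ones(S.shape[0], 1)
    S_aug = torch.cat([S, ones], dim=1)          # append a bias column
    sol = torch.linalg.lstsq(S_aug, O).solution  # shape (d_s + 1, d_o)
    return sol[:-1], sol[-1]

def r2_score(S, O, W, b):
    """Aggregate R^2 of the affine fit over all output dimensions."""
    pred = S @ W + b
    ss_res = ((O - pred) ** 2).sum()
    ss_tot = ((O - O.mean(dim=0)) ** 2).sum()
    return 1.0 - (ss_res / ss_tot).item()

# Toy usage with random stand-in data (real S, O would be model hidden states).
S = torch.randn(200, 64)
O = S @ torch.randn(64, 64) + 0.01 * torch.randn(200, 64)
W, b = fit_affine_map(S, O)
print(round(r2_score(S, O, W, b), 3))            # near 1.0 when the relation is linear
```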
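Item 4 adapts attention knockout to Mamba; the adaptation itself depends on Mamba's internals, so the sketch below only shows the core operation being adapted, as it is usually done in a transformer: before the softmax, disable the attention edges from chosen reader positions to chosen source positions so no information flows along them. Wiring this into a real model requires hooking its attention (or, for Mamba, its state-mixing) modules; the names here are illustrative.

```python
import torch

def knock_out(attn_scores: torch.Tensor, reader_positions, source_positions):
    """attn_scores: (..., query_len, key_len) pre-softmax scores.
    Returns a copy with reader -> source edges disabled."""
    scores = attn_scores.clone()
    for q in reader_positions:
        for k in source_positions:
            scores[..., q, k] = float("-inf")
    return scores

# Example: stop the last token (position 9) from reading the subject tokens 2..4.
scores = torch.randn(1, 8, 10, 10)        # (batch, heads, queries, keys)
blocked = knock_out(scores, reader_positions=[9], source_positions=[2, 3, 4])
attn = torch.softmax(blocked, dim=-1)     # those edges now carry zero weight
```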
The authors conclude that despite architectural differences, Mamba shares many similarities with autoregressive transformer LMs in terms of factual recall. They discuss the challenges and mismatches in applying methods from transformer LMs to Mamba due to architectural constraints. Overall, the paper provides insights into the internal mechanisms of Mamba and highlights the potential for adapting interpretability methods to different neural architectures.