3 May 2024 | Jingcheng Niu, Andrew Liu, Zining Zhu, Gerald Penn
The Knowledge Neuron (KN) Thesis posits that large language models (LLMs) recall facts from their training data through multi-layer perceptron (MLP) weights, which act as key-value memories. On this view, factual knowledge is stored locally in MLP modules, so modifying those modules can control factual generation. The authors argue, however, that the KN thesis is an oversimplification. While KN-inspired methods such as KN edit and ROME have shown some success at editing factual information, they fail to adequately explain the mechanisms behind factual expression. The authors demonstrate that these methods are limited in their ability to generalize and maintain robustness, especially when applied to syntactic phenomena.
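The key-value-memory view of an MLP block can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dimensions and weights are arbitrary, and `mlp` stands in for one transformer feed-forward sublayer. The point is that each neuron's activation weights one row of the output matrix, so the block's output is a sum of stored "value" vectors retrieved by "key" matching.

```python
import numpy as np

# Hedged sketch of the key-value-memory reading of a transformer MLP block.
# Dimensions and random weights are illustrative, not from any real model.
d_model, d_ff = 8, 32
rng = np.random.default_rng(0)
W_K = rng.standard_normal((d_model, d_ff))  # "keys": one column per neuron
W_V = rng.standard_normal((d_ff, d_model))  # "values": one row per neuron

def mlp(x: np.ndarray) -> np.ndarray:
    # Each neuron's activation acts as a memory coefficient: the output is a
    # weighted sum of the rows of W_V, so suppressing or boosting a single
    # activation changes which stored "value" reaches the residual stream.
    a = np.maximum(x @ W_K, 0.0)  # key matching (ReLU activations)
    return a @ W_V                # value retrieval

x = rng.standard_normal(d_model)
out = mlp(x)
# Equivalent neuron-by-neuron view: out == sum_i a_i * W_V[i]
```

Under the KN thesis, "editing a fact" amounts to locating and rescaling a few such coefficients; the paper's argument is that this picture does not capture how facts are actually expressed.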
They propose two new evaluation criteria: bijective symmetry and synonymous invariance, which assess whether edits maintain consistency across related facts and synonyms. The results show that existing methods are insufficient under these criteria. The authors also find that syntactic phenomena, such as determiner-noun agreement, can be localized to specific MLP neurons, similar to factual information. However, the effects of these edits are limited and do not fully align with the KN thesis's claims of knowledge storage.
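The two criteria can be expressed as simple consistency checks on an edited model. In this sketch, `query_model` is a hypothetical helper standing in for a cloze-style language-model call, and the example fact is illustrative; the real evaluation in the paper operates on actual model outputs.

```python
# Hedged sketch: bijective symmetry and synonymous invariance as checks.
# `query_model` is a toy stand-in for an edited LM answering cloze prompts;
# here it simulates a model edited to relate Ottawa and Canada.

def query_model(prompt: str) -> str:
    answers = {
        "The capital of Canada is": "Ottawa",
        "Ottawa is the capital of": "Canada",
        "Canada's capital city is": "Ottawa",
    }
    return answers.get(prompt, "unknown")

def bijective_symmetry(fwd_prompt: str, fwd_answer: str,
                       rev_prompt: str, rev_answer: str) -> bool:
    """An edit to a one-to-one relation should hold in both directions."""
    return (query_model(fwd_prompt) == fwd_answer
            and query_model(rev_prompt) == rev_answer)

def synonymous_invariance(paraphrases: list[str], answer: str) -> bool:
    """An edit should survive synonymous rephrasings of the same query."""
    return all(query_model(p) == answer for p in paraphrases)
```

A model that passes only the forward direction, or only the original phrasing, would count as failing these criteria, which is the behavior the authors report for existing editing methods.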
The study highlights that the KN thesis oversimplifies the complex mechanisms underlying factual and syntactic processing in LLMs. The authors argue that knowledge is not stored in MLP weights but in complex "token expression patterns." They emphasize the need to explore the rich layer structures and attention mechanisms of recent models to better understand their underlying mechanics. The findings challenge the KN thesis and suggest that the formal and functional competencies of LLMs may be governed by different mechanisms. The study concludes that current model-editing methods are not robust enough to support the KN thesis and that further research is needed to understand the true mechanisms of knowledge representation in LLMs.