16 Jun 2024 | Jia-Chen Gu, Hao-Xiang Xu, Jun-Yu Ma, Pan Lu, Zhen-Hua Ling, Kai-Wei Chang, Nanyun Peng
This paper investigates the potential negative side effects of model editing on the general abilities of large language models (LLMs). While model editing allows knowledge in LLMs to be updated without retraining, it can significantly degrade the models' general capabilities, such as reasoning, natural language inference, and question answering. The study evaluates four popular editing methods on three LLMs across eight representative tasks and finds that current methods struggle to simultaneously improve factuality and maintain general abilities.
The analysis reveals that the side effects stem from excessive changes to the original model weights, which cause overfitting to the edited facts. To address this, the authors propose a regularization method called RECT (RElative Change in weighT) that mitigates the side effects while maintaining high editing performance. The results show that RECT significantly reduces the negative impact of editing on the general abilities of LLMs, with edited models retaining over 94% of their editing performance. The paper highlights the importance of balancing factuality improvements with the preservation of general abilities and calls for further research into robust and trustworthy model editing techniques.
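The summary does not spell out how RECT constrains an edit, but the name "relative change in weight" suggests one plausible mechanism, sketched below: after an editing method produces updated weights, keep only the entries whose relative change |ΔW| / |W| is largest and revert the rest to their original values, limiting how far the model drifts from its pre-edit state. The function name `rect_regularize` and the `keep_ratio` hyperparameter are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def rect_regularize(w_orig, w_edited, keep_ratio=0.02, eps=1e-8):
    """Sketch of a RECT-style regularizer (hypothetical implementation).

    Keeps only the fraction `keep_ratio` of weight entries with the
    largest relative change |dW| / (|W| + eps); all other entries are
    reverted to their original values.
    """
    delta = w_edited - w_orig
    rel_change = np.abs(delta) / (np.abs(w_orig) + eps)

    # Threshold at the k-th largest relative change.
    k = max(1, int(keep_ratio * rel_change.size))
    threshold = np.partition(rel_change.ravel(), -k)[-k]

    # Apply the edit only where the relative change is large enough.
    mask = rel_change >= threshold
    return w_orig + delta * mask
```

With a high `keep_ratio` the result approaches the fully edited weights; with a low one it stays close to the original model, trading editing strength for preservation of general abilities.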