Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue


16 Jun 2024 | Jia-Chen Gu*, Hao-Xiang Xu*, Jun-Yu Ma, Pan Lu, Zhen-Hua Ling, Kai-Wei Chang, Nanyun Peng
The paper "Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue" addresses the issue of model editing, a technique used to update large language models (LLMs) with new knowledge to reduce hallucinations without retraining. While current model editing methods can effectively modify specific behaviors, they often overlook the potential unintended side effects on the general abilities of LLMs, such as reasoning, natural language inference, and question answering. The authors systematically analyze these side effects by evaluating four popular editing methods on three LLMs across eight representative tasks. Their extensive empirical experiments show that current editing methods struggle to improve factuality while maintaining general abilities. The analysis reveals that the side effects are caused by excessive alteration of original model weights, leading to overfitting to the edited facts. To mitigate this, the authors propose a regularization method named RECT (RElative Change in weightT), which prevents overfitting by discouraging overly complex editing updates. Evaluation results demonstrate that RECT can significantly reduce the side effects while maintaining over 94% editing performance. The paper highlights the need for a balanced approach in model editing to ensure both factuality and general abilities, calling for further research on trustworthy and robust model editing techniques.The paper "Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue" addresses the issue of model editing, a technique used to update large language models (LLMs) with new knowledge to reduce hallucinations without retraining. While current model editing methods can effectively modify specific behaviors, they often overlook the potential unintended side effects on the general abilities of LLMs, such as reasoning, natural language inference, and question answering. The authors systematically analyze these side effects by evaluating four popular editing methods on three LLMs across eight representative tasks. Their extensive empirical experiments show that current editing methods struggle to improve factuality while maintaining general abilities. The analysis reveals that the side effects are caused by excessive alteration of original model weights, leading to overfitting to the edited facts. To mitigate this, the authors propose a regularization method named RECT (RElative Change in weightT), which prevents overfitting by discouraging overly complex editing updates. Evaluation results demonstrate that RECT can significantly reduce the side effects while maintaining over 94% editing performance. The paper highlights the need for a balanced approach in model editing to ensure both factuality and general abilities, calling for further research on trustworthy and robust model editing techniques.