7 Jun 2024 | Hongyu Li, Liang Ding*, Meng Fang, Dacheng Tao
This paper addresses the issue of Catastrophic Forgetting (CF) in large language models (LLMs) during fine-tuning: the model's tendency to lose previously acquired knowledge while learning new data. The authors investigate the relationship between the flatness of the model's loss landscape (LLS) and the severity of CF, and find a strong correlation between the two. Building on this observation, they apply Sharpness-Aware Minimization (SAM), an optimization technique that flattens the LLS, to mitigate CF. Experiments on three widely used fine-tuning datasets across different model scales demonstrate SAM's effectiveness in reducing CF. The method is also shown to complement existing anti-forgetting strategies, further strengthening the model's resistance to CF. The paper contributes to the literature by empirically revealing a direct link between LLS flatness and CF and by presenting a novel optimization-based approach to address the issue.
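For context, SAM replaces the plain training loss L(w) with the worst-case loss inside an ℓ2 ball of radius ρ around the current weights, min_w max_{‖ε‖₂ ≤ ρ} L(w + ε), which is approximated in practice by a two-step gradient update: ascend to the adversarial point w + ε, then apply the gradient computed there back at w. The sketch below shows this generic SAM update (Foret et al., 2021) in PyTorch; it is a minimal illustration, not the authors' exact training setup, and the names `model`, `loss_fn`, `batch`, and the default `rho` are placeholder assumptions.

```python
import torch


def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    """One SAM update: perturb weights toward higher loss, then
    descend using the gradient taken at the perturbed point."""
    # First forward/backward pass: gradients at the current weights w.
    loss = loss_fn(model, batch)
    loss.backward()

    params = [p for p in model.parameters() if p.grad is not None]

    # Perturbation eps = rho * g / ||g||_2, the (approximate) worst-case
    # direction inside the rho-ball; ascend in place to w + eps.
    grad_norm = torch.norm(torch.stack([p.grad.norm(p=2) for p in params]), p=2)
    eps_list = []
    with torch.no_grad():
        for p in params:
            eps = rho * p.grad / (grad_norm + 1e-12)
            p.add_(eps)
            eps_list.append((p, eps))
    model.zero_grad()

    # Second forward/backward pass: gradient of the loss at w + eps.
    loss_fn(model, batch).backward()

    # Restore the original weights, then step with the sharpness-aware
    # gradient using any base optimizer (e.g. AdamW).
    with torch.no_grad():
        for p, eps in eps_list:
            p.sub_(eps)
    base_optimizer.step()
    model.zero_grad()
    return loss.item()
```

The two forward/backward passes roughly double the per-step cost, which is the usual trade-off when adding SAM on top of a standard fine-tuning loop.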