Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models


19 Feb 2024 | Didi Zhu, Zhongyi Sun, Zexi Li, Tao Shen, Ke Yan, Shouhong Ding, Kun Kuang, Chao Wu
This paper presents Model Tailor, a parameter-efficient post-training method to mitigate catastrophic forgetting in multi-modal large language models (MLLMs). Catastrophic forgetting occurs when fine-tuning MLLMs on new tasks leads to a significant drop in performance on previously learned tasks. Model Tailor preserves most of the pre-trained parameters while selectively modifying a small fraction (≤10%) of fine-tuned parameters, maintaining performance on original tasks while improving performance on new ones. The method identifies a "model patch" through a fusion of salience and sensitivity analysis, then applies a compensation mechanism to enhance the model's performance on both target and original tasks.

Model Tailor is adaptable to multi-task scenarios and has been tested on InstructBLIP and LLaVA-1.5 for image captioning and visual question answering, demonstrating strong task adaptability while preserving pre-trained capabilities. The approach draws on the Lottery Ticket Hypothesis and Optimal Brain Surgeon theories, using a sparse mask to identify critical parameters and a compensation mechanism to adjust them.

In experiments, Model Tailor outperforms standard fine-tuning and other forgetting-mitigation methods, achieving high performance on both original and new tasks. It also synergizes with parameter-efficient methods such as LoRA, enhancing their effectiveness. Extensive experiments validate its effectiveness in reducing catastrophic forgetting and improving overall model performance.
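To make the patch-selection idea concrete, below is a minimal sketch (not the authors' code) of grafting a sparse "model patch" of fine-tuned parameters onto the pre-trained weights. It assumes a simplified importance score based only on the magnitude of the parameter change; the paper's actual fusion of salience and sensitivity and its compensation step are omitted, and the function name `apply_model_patch` is illustrative.

```python
# Minimal sketch of the "model patch" idea: keep the pre-trained weights
# everywhere except a small, sparse set of positions where the fine-tuned
# change is judged most important. Importance is approximated here by
# |theta_ft - theta_pre|; the paper's salience/sensitivity fusion and
# compensation mechanism are not reproduced.
import torch


def apply_model_patch(pretrained_state, finetuned_state, keep_ratio=0.1):
    """Keep the top `keep_ratio` most-changed fine-tuned parameters;
    revert all other parameters to their pre-trained values.

    Both arguments are state dicts with identical keys and shapes.
    """
    # 1. Score every scalar parameter by how much fine-tuning moved it.
    scores = torch.cat([
        (finetuned_state[k] - pretrained_state[k]).abs().flatten()
        for k in pretrained_state
    ])

    # 2. Global threshold that keeps roughly `keep_ratio` of all parameters.
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = torch.topk(scores, k, largest=True).values.min()

    # 3. Build the patched model: fine-tuned values where the change exceeds
    #    the threshold (a sparse binary mask), pre-trained values elsewhere.
    patched = {}
    for name in pretrained_state:
        delta = (finetuned_state[name] - pretrained_state[name]).abs()
        mask = (delta >= threshold).to(pretrained_state[name].dtype)
        patched[name] = (mask * finetuned_state[name]
                         + (1 - mask) * pretrained_state[name])
    return patched


# Hypothetical usage with any pair of pre-/post-fine-tuning checkpoints:
# patched = apply_model_patch(base_model.state_dict(),
#                             tuned_model.state_dict(), keep_ratio=0.1)
# base_model.load_state_dict(patched)
```

Because only a small fraction of parameters deviate from the pre-trained checkpoint, such a patch can be stored and applied cheaply, which is part of what makes the approach compatible with parameter-efficient methods like LoRA.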