19 Feb 2024 | Didi Zhu, Zhongyi Sun, Zexi Li, Tao Shen, Ke Yan, Shouhong Ding, Kun Kuang, Chao Wu
This paper addresses the critical challenge of catastrophic forgetting in multi-modal large language models (MLLMs), where fine-tuning on new tasks significantly degrades performance on original tasks. The authors introduce Model Tailor, a parameter-efficient post-training method that preserves most pre-trained parameters while retaining only a small subset (≤ 10%) of fine-tuned parameters, enhancing performance on new tasks without compromising the original ones. Model Tailor identifies this subset, the "model patch," via a sparse mask derived from a fusion of salience and sensitivity analyses. A compensation mechanism, called "patch decoration," is then applied to the selected parameters to mitigate performance loss on new tasks. Extensive experiments with InstructBLIP and LLaVA-1.5 on image captioning and visual question answering tasks demonstrate that Model Tailor adapts effectively to new tasks while preserving pre-trained capabilities. The method also extends to multi-task scenarios, further enhancing its practical utility.
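To make the patch-and-graft idea concrete, here is a minimal PyTorch sketch of selecting a sparse mask over the fine-tuning deltas and grafting only the masked entries onto the frozen pre-trained weights. The magnitude-based salience proxy (|ΔW|), the squared-delta sensitivity placeholder, the fusion weight `alpha`, and the function name `model_tailor_patch` are all illustrative assumptions; the paper's actual scoring and its "patch decoration" compensation differ.

```python
import torch

def model_tailor_patch(pretrained, finetuned, ratio=0.10, alpha=0.5):
    """Graft a sparse 'model patch' of fine-tuned weights onto the frozen
    pre-trained weights. Scoring here is a hypothetical stand-in for the
    paper's salience/sensitivity fusion."""
    patched = {}
    for name, w_pre in pretrained.items():
        delta = finetuned[name] - w_pre           # fine-tuning update per entry
        salience = delta.abs()                    # assumed salience proxy: |dW|
        sensitivity = delta.pow(2)                # placeholder sensitivity proxy
        score = alpha * salience + (1 - alpha) * sensitivity
        k = max(1, int(ratio * score.numel()))    # keep <= `ratio` of entries
        thresh = score.flatten().topk(k).values.min()
        mask = (score >= thresh).to(delta.dtype)  # sparse mask = the model patch
        # The paper's "patch decoration" would additionally compensate the
        # kept entries for the discarded deltas; omitted in this sketch.
        patched[name] = w_pre + mask * delta
    return patched
```

Applied per parameter tensor over two state dicts, this keeps the deviation from the pre-trained checkpoint sparse, which is what allows original-task behavior to be largely preserved while still absorbing the new-task update.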