Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters

3 Jun 2024 | Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu and You He
This paper proposes a parameter-efficient continual learning framework for vision-language models (VLMs) that addresses long-term forgetting in incremental learning. The framework dynamically expands a pre-trained CLIP model by integrating Mixture-of-Experts (MoE) adapters in response to new tasks. To preserve the zero-shot recognition capability of VLMs, a Distribution Discriminative Auto-Selector (DDAS) is introduced that automatically routes in-distribution inputs to the MoE adapters and out-of-distribution inputs to the original CLIP. The MoE adapters enable efficient adaptation and inter-task collaboration, while the DDAS provides effective predictions for seen data and zero-shot transfer for unseen data within a unified framework.

The method is implemented on a frozen CLIP model, with the MoE adapters built on CLIP's parallel encoders; only the adapters are updated as new tasks arrive.

Extensive experiments across various settings show that the method consistently outperforms previous state-of-the-art approaches while reducing the parameter-training burden by 60%. Evaluated on Multi-domain Task Incremental Learning (MTIL) and Class Incremental Learning (CIL), it achieves superior classification accuracy and training efficiency, shows exceptional resistance to catastrophic forgetting, and outperforms previous methods by 3.6%, 7.0%, and 4.2% in a 5-shot setting.

The method is also computationally efficient, reducing training parameters by roughly 60%, GPU burden by 15%, and iteration time by 60% compared with the state-of-the-art ZSCL method. Overall, the proposed framework is parameter-efficient, scalable, and effective for continual learning with vision-language models.
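To make the adapter design described above concrete, the following is a minimal PyTorch-style sketch of a Mixture-of-Experts adapter attached in parallel to a frozen CLIP transformer block. It illustrates the general technique rather than the authors' released implementation; the class names (`LoRAExpert`, `MoEAdapter`, `FrozenBlockWithAdapter`) and hyperparameters (`rank`, `num_experts`, `top_k`) are assumptions made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One low-rank adapter expert: project down to a small rank, then back up."""
    def __init__(self, dim: int, rank: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # zero-init so the adapter starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(F.gelu(self.down(x)))


class MoEAdapter(nn.Module):
    """A router picks the top-k experts per token; outputs are mixed by routing weights."""
    def __init__(self, dim: int, num_experts: int = 4, rank: int = 16, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(LoRAExpert(dim, rank) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.router(x)                              # (..., num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)       # keep only the top-k experts
        weights = weights.softmax(dim=-1)
        expert_out = [expert(x) for expert in self.experts]  # dense for clarity; real MoE is sparse
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = (idx[..., slot] == e).unsqueeze(-1)   # tokens routed to expert e in this slot
                out = out + mask * weights[..., slot:slot + 1] * expert_out[e]
        return out


class FrozenBlockWithAdapter(nn.Module):
    """Wrap a frozen CLIP transformer block and add the adapter output as a residual."""
    def __init__(self, block: nn.Module, dim: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False                          # CLIP weights stay frozen
        self.adapter = MoEAdapter(dim)                       # only these parameters are trained

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x) + self.adapter(x)
```

Because the frozen block's output is left untouched and the adapter only contributes a residual correction, zero-initializing the up-projection keeps the model identical to the original CLIP before any continual-learning updates; only the small router and expert matrices are trained per task.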
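For the Distribution Discriminative Auto-Selector, the summary above only states that it routes in-distribution inputs to the MoE adapters and out-of-distribution inputs to the original CLIP. The sketch below shows one way such a selector could work at inference time: each input is scored against per-task distribution models (here, tiny autoencoders over frozen CLIP features), and the sample falls back to zero-shot CLIP when no task matches. The class names, the autoencoder-based score, and the `threshold` value are illustrative assumptions, not details taken from the paper's released code.

```python
import torch
import torch.nn as nn


class TaskAutoEncoder(nn.Module):
    """Tiny per-task autoencoder over frozen CLIP image features.

    A low reconstruction error suggests the input comes from this task's
    distribution; a high error for every task suggests unseen (OOD) data.
    """
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.encode = nn.Linear(dim, hidden)
        self.decode = nn.Linear(hidden, dim)

    def score(self, feat: torch.Tensor) -> torch.Tensor:
        recon = self.decode(torch.relu(self.encode(feat)))
        return (recon - feat).pow(2).mean(dim=-1)             # per-sample reconstruction error


class DistributionAutoSelector(nn.Module):
    """Route each sample to the adapted model (seen data) or zero-shot CLIP (unseen data)."""
    def __init__(self, task_autoencoders: nn.ModuleList, threshold: float = 0.05):
        super().__init__()
        self.task_autoencoders = task_autoencoders
        self.threshold = threshold  # tuned on held-out data in practice

    @torch.no_grad()
    def forward(self, feat, images, adapted_model, zero_shot_clip):
        # Lowest reconstruction error across all tasks seen so far.
        scores = torch.stack([ae.score(feat) for ae in self.task_autoencoders], dim=-1)
        best, _ = scores.min(dim=-1)
        in_dist = best < self.threshold                        # per-sample routing decision

        # Both models are assumed to output logits over the same label set.
        logits = zero_shot_clip(images)                        # default: original CLIP, zero-shot
        if in_dist.any():
            logits[in_dist] = adapted_model(images[in_dist])   # seen data: MoE-adapted CLIP
        return logits
```

In this sketch, `feat` would be the frozen CLIP image feature for each sample, and one `TaskAutoEncoder` is trained as each task arrives; how the actual DDAS computes its distribution score may differ from this simplified version.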