Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion

19 Jul 2024 | Linlan Huang, Xusheng Cao, Haori Lu, and Xialei Liu
This paper proposes Adaptive Representation Adjustment and Parameter Fusion (RAPF), a class-incremental learning method that reduces forgetting in vision-language models such as CLIP. During training, RAPF measures how strongly newly arriving classes influence old ones and uses textual features to adjust the representations of the affected old classes. After training on each incremental task, a decomposed parameter fusion strategy merges the fine-tuned adapter parameters with those from previous tasks, further mitigating forgetting in the adapter module.

The main contributions are: using textual features to enhance classification ability in class-incremental learning; a simple but effective decomposed parameter fusion method for the linear-layer adapter of pre-trained models; and state-of-the-art results on several conventional benchmarks. Evaluated on CIFAR100, ImageNet100, ImageNet-R, and CUB200, RAPF shows significant accuracy improvements over existing methods, demonstrating that leveraging textual information and parameter fusion effectively reduces forgetting while maintaining model stability.
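To make the two components concrete, here is a minimal PyTorch sketch of the representation-adjustment idea. The paper's exact formulas are not reproduced in this summary, so everything below is an assumption: the function name `adjust_old_features`, the threshold `tau`, and the step size `alpha` are hypothetical, and the rule (shifting old-class features along the text-feature direction that separates them from the most similar new class) is only one plausible reading of "using textual features to adjust the representations of old classes".

```python
import torch
import torch.nn.functional as F

def adjust_old_features(old_feats, old_text, new_text, tau=0.9, alpha=0.5):
    """Hypothetical sketch of RAPF-style representation adjustment.

    old_feats: (N, D) stored image features of one old class
    old_text:  (D,)   CLIP text feature of that old class
    new_text:  (K, D) CLIP text features of the new classes
    """
    old_text = F.normalize(old_text, dim=-1)
    new_text = F.normalize(new_text, dim=-1)

    # Measure the influence of each new class on the old class
    # via text-feature similarity, and take the strongest one.
    sim = new_text @ old_text                     # (K,)
    influence, idx = sim.max(dim=0)

    if influence < tau:                           # weakly affected: leave as-is
        return old_feats

    # Push the old-class features along the direction that separates
    # the old text feature from the nearest new class's text feature.
    direction = F.normalize(old_text - new_text[idx], dim=-1)
    adjusted = old_feats + alpha * influence * direction
    return F.normalize(adjusted, dim=-1)
```

The fusion step can be sketched in the same spirit. The paper's actual decomposition is not detailed in this summary, so the SVD-based rule below, the rank cutoff, and the damping factor `beta` are assumptions standing in for the authors' decomposed parameter fusion:

```python
import torch

def fuse_adapter_weights(w_old, w_new, beta=0.5):
    """Hypothetical sketch of decomposed parameter fusion for a linear adapter.

    w_old, w_new: (out_dim, in_dim) adapter weights before and after
    fine-tuning on the current task.
    """
    delta = w_new - w_old                         # task-specific update
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)

    # Keep the dominant directions of the update at full strength and
    # damp the remaining ones (beta < 1) -- an assumed stand-in for the
    # paper's fusion rule.
    k = max(1, s.numel() // 4)                    # assumed rank threshold
    s_fused = torch.cat([s[:k], beta * s[k:]])
    return w_old + u @ torch.diag(s_fused) @ vh
```

Fusing in a decomposed space rather than naively averaging `w_old` and `w_new` lets the merged adapter keep the strongest new-task directions while limiting drift in the remaining subspace, which matches the summary's claim that fusion stabilizes the adapter across tasks.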