Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning

14 Jul 2024 | Xinyuan Gao, Songlin Dong, Yuhang He, Qiang Wang, Yihong Gong
This paper proposes Continual Adapter (C-ADA), a novel approach to Rehearsal-Free Continual Learning (RFCL). C-ADA introduces a parameter-extensible Continual Adapter Layer (CAL) and a Scaling & Shifting (S&S) module in parallel with the pre-trained model. The CAL lets the model flexibly extend task-specific weights to learn new knowledge while freezing old weights to preserve prior knowledge, thereby avoiding the errors introduced by key-query matching. The S&S module reduces the domain gap between the pre-training dataset and downstream datasets by transferring the feature space. Additionally, an orthogonal loss mitigates interference between old and new knowledge, strengthening the model's retention of previous tasks. C-ADA achieves significantly better performance and training speed than state-of-the-art methods, and experiments on domain-incremental learning show that it also surpasses the SOTA there, demonstrating its generality across settings. The key contributions are: a simple yet effective C-ADA approach, a novel CAL module, an S&S module that reduces the domain gap, and an orthogonal loss that preserves old knowledge. The method is parameter-efficient and privacy-preserving (no rehearsal buffer of past data), making it suitable for real-world applications.
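The mechanics above lend themselves to a short sketch. The following is a minimal PyTorch illustration, not the authors' implementation: the class names, the per-task rank, the zero-initialized up-projection, and the exact form of the orthogonality penalty (applied here to the down-projections of the newest task against all frozen ones) are assumptions made for exposition.

import torch
import torch.nn as nn

class ScaleShift(nn.Module):
    """S&S module: a learnable per-channel scale and shift that transfers
    the pre-trained feature space toward the downstream domain."""
    def __init__(self, dim: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale + self.shift

class ContinualAdapterLayer(nn.Module):
    """CAL sketch: a low-rank adapter whose down/up projections grow by a
    fixed number of rows/columns per task; earlier tasks' weights are frozen."""
    def __init__(self, dim: int, rank_per_task: int):
        super().__init__()
        self.dim = dim
        self.rank_per_task = rank_per_task
        self.down = nn.ParameterList()  # each entry: (rank_per_task, dim)
        self.up = nn.ParameterList()    # each entry: (dim, rank_per_task)
        self.act = nn.ReLU()

    def extend_for_new_task(self):
        """Freeze all existing weights, then append fresh trainable ones."""
        for p in list(self.down) + list(self.up):
            p.requires_grad_(False)
        self.down.append(nn.Parameter(0.01 * torch.randn(self.rank_per_task, self.dim)))
        # Zero-init the up-projection so the new slice starts as an identity
        # residual and cannot disturb old behavior at the start of a task.
        self.up.append(nn.Parameter(torch.zeros(self.dim, self.rank_per_task)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate old (frozen) and new (trainable) weights: old knowledge
        # keeps contributing to the forward pass, but only the newest slice
        # receives gradient updates.
        W_down = torch.cat(list(self.down), dim=0)  # (R_total, dim)
        W_up = torch.cat(list(self.up), dim=1)      # (dim, R_total)
        return x + self.act(x @ W_down.t()) @ W_up.t()

    def orthogonal_loss(self) -> torch.Tensor:
        """Penalize overlap between the newest task's down-projection and all
        frozen earlier ones, reducing old/new knowledge interference."""
        if len(self.down) < 2:
            return torch.zeros((), device=self.down[0].device)
        old = torch.cat(list(self.down)[:-1], dim=0)  # frozen rows
        new = self.down[-1]                           # trainable rows
        return (new @ old.t()).pow(2).sum()

Per-task usage under these assumptions: call extend_for_new_task() once at the start of each task, train only the new slice plus the S&S parameters, and add the orthogonality penalty (weight 0.1 here is an illustrative hyperparameter) to the task loss:

dim = 768                                  # e.g., frozen ViT feature width
cal = ContinualAdapterLayer(dim, rank_per_task=8)
ss = ScaleShift(dim)
cal.extend_for_new_task()
x = torch.randn(4, dim)                    # stand-in for pre-trained features
out = cal(ss(x))
loss = out.pow(2).mean() + 0.1 * cal.orthogonal_loss()  # dummy task loss + penalty
loss.backward()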