Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning

14 Jul 2024 | Xinyuan Gao, Songlin Dong, Yuhang He, Qiang Wang, and Yihong Gong
The paper introduces Continual Adapter (C-ADA), a novel approach for Rehearsal-Free Continual Learning (RFCL), which learns new knowledge without forgetting old knowledge and without storing any previous samples or prototypes. C-ADA consists of a parameter-extensible Continual Adapter Layer (CAL) and a Scaling and Shifting (S&S) module placed in parallel with the pre-trained model. The CAL flexibly extends task-specific weights to learn new knowledge while freezing the weights of earlier tasks to preserve prior knowledge, thereby avoiding the selection errors that key-query matching introduces in prompt-based methods. The S&S module reduces the domain gap between the pre-training dataset and downstream datasets by transferring the feature space. In addition, an orthogonal loss mitigates interference between old and new knowledge, improving both performance and training speed. C-ADA achieves significant improvements over existing methods in both accuracy and efficiency, and also surpasses the state of the art in the domain-incremental learning setting. These results across varied continual learning scenarios demonstrate the generality and robustness of the approach.
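To make the mechanism concrete, below is a minimal PyTorch-style sketch of a parameter-extensible adapter layer, an elementwise scale-and-shift module, and an orthogonality penalty between new and old adapter weights. The class names, bottleneck size, loss weight, and the exact form of each module are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Sketch of C-ADA-style components (assumed shapes and names, not the paper's code).
import torch
import torch.nn as nn


class ScaleShift(nn.Module):
    """Assumed S&S module: learnable elementwise scale and shift of features."""

    def __init__(self, dim: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale + self.shift


class ContinualAdapterLayer(nn.Module):
    """Adapter layer run in parallel with a frozen pre-trained block.

    For each new task it appends a small down-/up-projection pair and
    freezes the projections learned for earlier tasks.
    """

    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.dim = dim
        self.bottleneck = bottleneck
        self.down_projs = nn.ModuleList()  # one entry per task
        self.up_projs = nn.ModuleList()

    def extend(self):
        """Add trainable weights for a new task; freeze all earlier weights."""
        for p in self.parameters():
            p.requires_grad_(False)
        self.down_projs.append(nn.Linear(self.dim, self.bottleneck, bias=False))
        self.up_projs.append(nn.Linear(self.bottleneck, self.dim, bias=False))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the contributions of all task-specific adapters (old ones are frozen).
        out = torch.zeros_like(x)
        for down, up in zip(self.down_projs, self.up_projs):
            out = out + up(torch.relu(down(x)))
        return out


def orthogonal_loss(layer: ContinualAdapterLayer) -> torch.Tensor:
    """Penalize overlap between the newest down-projection and the older ones."""
    if len(layer.down_projs) < 2:
        return torch.tensor(0.0)
    new_w = layer.down_projs[-1].weight                       # (bottleneck, dim)
    old_w = torch.cat([p.weight for p in layer.down_projs[:-1]], dim=0)
    return (new_w @ old_w.t()).pow(2).mean()


# Usage: extend the adapter before each task, then add the penalty to the task loss.
adapter = ContinualAdapterLayer(dim=768)
adapter.extend()                                              # task 1
adapter.extend()                                              # task 2
x = torch.randn(4, 768)
features = ScaleShift(768)(x) + adapter(x)
loss = features.pow(2).mean() + 0.1 * orthogonal_loss(adapter)  # 0.1 is an assumed weight
```

The sketch only illustrates the structural idea: old adapter weights stay frozen while new ones are trained, and the orthogonality term discourages the new projections from interfering with the subspace already used by earlier tasks.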