Rehearsal-Free Modular and Compositional Continual Learning for Language Models

31 Mar 2024 | Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich Schütze
This paper introduces MoCL, a rehearsal-free, modular, and compositional continual learning framework for language models that addresses catastrophic forgetting and enhances knowledge transfer. MoCL avoids storing additional data and instead facilitates knowledge transfer via module composition. It allocates task-specific parameters using parameter-efficient fine-tuning (PEFT) modules: during training, MoCL continually adds new task-specific modules to the language model, and once training on a task is complete, the corresponding PEFT parameters are frozen to prevent catastrophic forgetting. While learning a new task, MoCL enables knowledge transfer across tasks by composing existing and new modules based on task matching weights.

Experiments on various benchmarks show that MoCL outperforms state-of-the-art methods in task-incremental learning settings, where task identities are available during testing, and that it transfers knowledge effectively from previous tasks to new ones. MoCL's task matching strategy also enables module composition during testing, which addresses the more challenging class-incremental setting where task identities are not provided at test time. The code base for MoCL is available online.

MoCL falls into the category of parameter isolation-based continual learning: it allocates task-specific parameters to avoid knowledge interference. In contrast to related work, it additionally encourages knowledge transfer by considering the relatedness across tasks. MoCL is evaluated on near-domain and far-domain continual learning benchmarks, showing superior performance compared to existing methods. The results indicate that MoCL is effective in transferring knowledge across tasks, even when the tasks are dissimilar.

The framework is applicable to a wide range of tasks, but further experiments with other types of NLP tasks, especially generative tasks, are left for future work. MoCL's task matching weight distribution is sparse, suggesting that the number of task modules could be reduced; this is another interesting direction for future work. The paper also discusses the limitations of the work, including the scope of the evaluation and the computational and storage costs of continually initializing a new PEFT module for each task. The authors thank the anonymous reviewers for their constructive feedback and acknowledge support from the Deutsche Forschungsgemeinschaft.
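To make the module-composition idea more concrete, below is a minimal PyTorch sketch of how per-task PEFT modules (represented here as prefix vectors) could be allocated, frozen after training, and composed via task matching weights. The class and method names (MoCLComposer, add_task), the use of prefix tuning, and the cosine-similarity matching are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoCLComposer(nn.Module):
    """Sketch of MoCL-style module composition (assumed design, not the official code).

    Each task gets its own PEFT module (a prefix of trainable vectors) and a
    learned task key. New inputs are matched against all task keys, and the
    resulting weights are used to compose a single prefix from all modules.
    """

    def __init__(self, hidden_dim: int, prefix_len: int):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.prefix_len = prefix_len
        self.prefixes = nn.ParameterList()   # one PEFT module per task
        self.task_keys = nn.ParameterList()  # one matching vector per task

    def add_task(self) -> None:
        """Freeze all existing modules and allocate a trainable one for the new task."""
        for p in self.prefixes:
            p.requires_grad_(False)
        for k in self.task_keys:
            k.requires_grad_(False)
        self.prefixes.append(nn.Parameter(torch.randn(self.prefix_len, self.hidden_dim) * 0.02))
        self.task_keys.append(nn.Parameter(torch.randn(self.hidden_dim) * 0.02))

    def forward(self, x_repr: torch.Tensor) -> torch.Tensor:
        """Compose a prefix for a batch of inputs.

        x_repr: pooled input representations of shape (batch, hidden_dim).
        Returns a composed prefix of shape (batch, prefix_len, hidden_dim).
        """
        keys = torch.stack(list(self.task_keys))                       # (n_tasks, hidden)
        scores = F.cosine_similarity(x_repr.unsqueeze(1),
                                     keys.unsqueeze(0), dim=-1)        # (batch, n_tasks)
        weights = F.softmax(scores, dim=-1)                            # task matching weights
        prefixes = torch.stack(list(self.prefixes))                    # (n_tasks, prefix_len, hidden)
        return torch.einsum("bt,tph->bph", weights, prefixes)
```

In this sketch, the task matching weights are a softmax over cosine similarities between a pooled input representation and per-task key vectors; the composed prefix would then be prepended to the inputs of the frozen language model. Because old prefixes and keys are frozen once their task is finished, only the newest module is updated, which mirrors the rehearsal-free, parameter-isolation behavior described above.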