SCALABLE LANGUAGE MODEL WITH GENERALIZED CONTINUAL LEARNING

11 Apr 2024 | Bohao Peng, Zhuotao Tian, Shu Liu, Mingchang Yang, Jiaya Jia
This paper introduces the Scalable Language Model (SLM) to address the limitations of existing continual learning methods, which often rely on experience replay, optimization constraints, or inference-time task IDs. SLM comprises two key components: Joint Adaptive Re-parameterization (JARe) and Dynamic Task-related Knowledge Retrieval (DTKR). JARe dynamically adjusts the model's weights to align with specific downstream tasks, while DTKR retrieves task-relevant knowledge from a vector space according to the task distribution of the input. Together, these components enable efficient and scalable knowledge expansion and management, achieving state-of-the-art performance with multiple backbones, including BERT, T5, and LLaMA-2. The method substantially reduces forgetting with minimal performance degradation, and it extends continual learning across diverse task types and domains, demonstrating strong generalization. The paper also provides detailed experimental results and analysis highlighting SLM's effectiveness and practical applicability in real-world scenarios.
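The overall mechanism can be pictured as a lookup-and-adapt step at inference: retrieve the task whose stored representation best matches the input, then re-parameterize the frozen base weights with that task's learned adjustment. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the per-task prototype vectors, the cosine-similarity retrieval rule, and the low-rank per-task weight updates are all illustrative stand-ins for DTKR and JARe, and every name in the code is hypothetical.

```
import torch
import torch.nn.functional as F

class TaskAdaptedLinear(torch.nn.Module):
    """Hypothetical sketch: DTKR-style retrieval + JARe-style re-parameterization."""

    def __init__(self, in_dim: int, out_dim: int, num_tasks: int, rank: int = 8):
        super().__init__()
        # Frozen pre-trained weight (stand-in for the base language model layer).
        self.base = torch.nn.Linear(in_dim, out_dim)
        for p in self.base.parameters():
            p.requires_grad_(False)
        # One low-rank adapter (B @ A) per task: the task-specific weight adjustment.
        self.A = torch.nn.Parameter(torch.randn(num_tasks, rank, in_dim) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(num_tasks, out_dim, rank))
        # One prototype embedding per task, used only for retrieval.
        self.prototypes = torch.nn.Parameter(torch.randn(num_tasks, in_dim))

    def retrieve_task(self, x: torch.Tensor) -> torch.Tensor:
        # DTKR stand-in: pick the task whose prototype is most similar
        # (cosine similarity) to the mean input embedding. No task ID needed.
        query = F.normalize(x.mean(dim=1), dim=-1)      # (batch, in_dim)
        protos = F.normalize(self.prototypes, dim=-1)   # (tasks, in_dim)
        return (query @ protos.T).argmax(dim=-1)        # (batch,)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, in_dim)
        task_ids = self.retrieve_task(x)
        A_t = self.A[task_ids]                          # (batch, rank, in_dim)
        B_t = self.B[task_ids]                          # (batch, out_dim, rank)
        # JARe stand-in: add the retrieved task's low-rank update to the
        # frozen base projection.
        h = torch.einsum("bsi,bri->bsr", x, A_t)
        update = torch.einsum("bsr,bor->bso", h, B_t)
        return self.base(x) + update
```

The point of the sketch is the division of labor the abstract describes: retrieval (here, a nearest-prototype lookup) replaces the inference-time task IDs that prior methods require, and the per-task adjustment (here, a low-rank update) lets the knowledge store grow by adding adapters rather than retraining or replaying old data.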