Continual Learning of Large Language Models: A Comprehensive Survey

25 Nov 2024 | HAIZHOU SHI, ZIHAO XU, HENGYI WANG, WEIYI QIN, WENYUAN WANG, YIBIN WANG, ZIFENG WANG, SAYNA EBRAHIMI, and HAO WANG
This survey provides a comprehensive overview of current research on continual learning (CL) for large language models (LLMs). Adapting pre-trained LLMs to evolving data distributions remains a significant challenge: when trained on new tasks, these models often suffer catastrophic forgetting of previously acquired knowledge.

The survey organizes CL for LLMs along two directions of continuity. Vertical continuity (vertical continual learning) concerns the continual adaptation of an LLM from general capabilities toward specific domains and downstream tasks, while horizontal continuity (horizontal continual learning) concerns continual adaptation across time and across domains. Within this framework, the survey outlines three key stages of LLM learning in the context of modern CL: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT).

The survey also reviews evaluation protocols and data sources for continual learning with LLMs and discusses open questions in the field. It highlights that continually pre-training, adapting, and fine-tuning LLMs remains underexplored, emphasizing the need for accessible, practical evaluation benchmarks and for methods that counter forgetting and enable knowledge transfer as LLM learning paradigms evolve. The core challenges it examines include distributional shift, task heterogeneity, and inaccessible upstream training data. The survey concludes with a discussion of future research directions for continual learning in LLMs.
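The survey covers many families of forgetting-mitigation methods rather than prescribing one. As a concrete illustration of the problem setting, the minimal sketch below shows replay-based continual fine-tuning, one widely used technique: a bounded buffer of earlier-task examples is interleaved with each new task's batches. The model name ("gpt2"), buffer size, and replay ratio are illustrative assumptions, not the survey's recommendations.

```python
# Minimal sketch of replay-based continual fine-tuning, one common way to
# mitigate catastrophic forgetting. All names and hyperparameters here
# ("gpt2", BUFFER_SIZE, REPLAY_RATIO) are illustrative assumptions.
import random

import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")   # placeholder base LLM
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token              # gpt2 has no pad token
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

BUFFER_SIZE = 1000      # max examples retained from earlier tasks
REPLAY_RATIO = 0.5      # fraction of each batch drawn from the buffer
replay_buffer: list[str] = []
seen = 0                # total examples offered to the buffer so far

def add_to_buffer(text: str) -> None:
    """Reservoir sampling: keep a bounded uniform sample of past-task data."""
    global seen
    seen += 1
    if len(replay_buffer) < BUFFER_SIZE:
        replay_buffer.append(text)
    else:
        j = random.randrange(seen)
        if j < BUFFER_SIZE:
            replay_buffer[j] = text

def train_on_task(task_texts: list[str], epochs: int = 1, batch_size: int = 4):
    model.train()
    for _ in range(epochs):
        for batch in DataLoader(task_texts, batch_size=batch_size, shuffle=True):
            batch = list(batch)
            # Interleave current-task examples with replayed old-task ones.
            n_replay = min(int(len(batch) * REPLAY_RATIO), len(replay_buffer))
            if n_replay > 0:
                batch += random.sample(replay_buffer, n_replay)
            enc = tokenizer(batch, return_tensors="pt", padding=True,
                            truncation=True, max_length=256)
            labels = enc["input_ids"].clone()
            labels[enc["attention_mask"] == 0] = -100  # ignore pad positions
            loss = model(**enc, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    # After finishing a task, fold its data into the replay buffer.
    for text in task_texts:
        add_to_buffer(text)
```

Regularization-based alternatives (e.g., EWC) or parameter-efficient ones (e.g., per-task LoRA adapters) follow the same sequential-task loop but swap out the replay buffer for a different forgetting-mitigation mechanism.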
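On the evaluation side, continual-learning protocols commonly report variants of two standard metrics from the CL literature (e.g., Lopez-Paz and Ranzato, 2017); the formulation below is the conventional one, not necessarily the survey's exact protocol. Let $R_{i,j}$ denote performance on task $j$ after training on tasks $1$ through $i$, with $T$ tasks in total:

```latex
% Average performance after the final task, and backward transfer.
% Negative BWT quantifies catastrophic forgetting.
\mathrm{ACC} = \frac{1}{T}\sum_{j=1}^{T} R_{T,j},
\qquad
\mathrm{BWT} = \frac{1}{T-1}\sum_{j=1}^{T-1}\bigl(R_{T,j} - R_{j,j}\bigr)
```

A large negative BWT means later training overwrote earlier capabilities, which is precisely the failure mode the surveyed methods aim to reduce.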