Towards Lifelong Learning of Large Language Models: A Survey

June 2024 | JUNHAO ZHENG, SHENGJIE QIU, CHENGMING SHI, QIANLI MA
This survey provides a comprehensive overview of lifelong learning methods for large language models (LLMs), categorizing strategies into two main groups: Internal Knowledge and External Knowledge. Internal Knowledge involves integrating new knowledge into the model's parameters through full or partial training, while External Knowledge incorporates new information as external resources without updating the model's parameters. The survey introduces a novel taxonomy to categorize existing literature into 12 scenarios, identifies common techniques across all scenarios, and highlights emerging methods such as model expansion and data selection. It also discusses challenges and future directions in lifelong learning for LLMs.

The survey covers various aspects of lifelong learning, including problem formulation, evaluation metrics, common techniques, benchmarks, and datasets. It examines existing techniques for continual pretraining, continual finetuning, and external-knowledge-based lifelong learning. Continual pretraining methods include vertical domain pretraining, language domain pretraining, and temporal domain pretraining. Techniques such as experience replay, knowledge distillation, parameter-efficient finetuning, model expansion, and re-warming are discussed.

Continual finetuning methods include text classification, named entity recognition, relation extraction, and machine translation, with strategies like distillation-based, replay-based, regularization-based, and architecture-based approaches. The survey also addresses challenges in lifelong learning, such as catastrophic forgetting and temporal adaptation, and highlights the importance of efficient and cost-effective strategies. It discusses the role of architecture-based methods in adapting model structures to integrate new tasks while preserving previous knowledge.
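Experience replay, one of the common techniques listed above, mitigates catastrophic forgetting by mixing a small buffer of earlier-task samples into each new-task training batch. The sketch below is a minimal, framework-agnostic illustration; the function names, parameters, and buffer policy are illustrative assumptions, not details from the survey:

```python
import random

def replay_batches(new_task_data, replay_buffer, batch_size=8, replay_ratio=0.25):
    """Yield batches that mix new-task samples with samples replayed
    from earlier tasks, so old tasks stay represented during training."""
    n_replay = max(1, int(batch_size * replay_ratio))
    n_new = batch_size - n_replay
    for i in range(0, len(new_task_data), n_new):
        batch = list(new_task_data[i:i + n_new])
        if replay_buffer:
            batch += random.sample(replay_buffer, min(n_replay, len(replay_buffer)))
        random.shuffle(batch)
        yield batch

def update_buffer(replay_buffer, task_data, capacity=100):
    """After finishing a task, keep a random capped subset of all
    samples seen so far as the replay buffer for future tasks."""
    combined = list(replay_buffer) + list(task_data)
    return random.sample(combined, min(capacity, len(combined)))
```

In practice the buffer holds raw training examples (or their representations), and the replay ratio trades off plasticity on the new task against stability on old ones.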
The survey emphasizes the need for innovative approaches to mitigate forgetting, improve temporal generalization, and develop efficient, adaptive architectures for sustained model performance. Overall, the survey provides a detailed analysis of lifelong learning methods for LLMs, contributing to the ongoing development of more robust and versatile models capable of adapting to evolving digital landscapes.
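Among the distillation-based strategies the survey mentions, a standard formulation keeps the updated (student) model's predictions close to the previous (teacher) model's by penalizing the KL divergence between their temperature-softened output distributions. A minimal sketch, assuming Hinton-style distillation with the usual T² scaling (all names are illustrative):

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In a lifelong learning setting, this term is typically added to the new-task loss so that updating on new data does not pull the model's outputs on old inputs too far from the frozen teacher's.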