A Survey on Knowledge Distillation of Large Language Models


8 Mar 2024 | Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Tianyi Zhou
This survey examines the role of knowledge distillation (KD) in transferring advanced capabilities from proprietary LLMs such as GPT-4 to open-source models such as LLaMA and Mistral. Beyond capability transfer, KD is central to model compression and to self-improvement, where a model teaches itself with its own generated data. The survey is organized around three pillars: algorithm, skill, and verticalization, covering KD mechanisms, the enhancement of specific skills, and practical domain applications.

A recurring theme is the interplay between data augmentation (DA) and KD: DA generates context-rich, skill-specific training data from which student models learn. The survey reviews knowledge-elicitation techniques, including labeling, expansion, data curation, feature extraction, and feedback mechanisms, and shows how they are applied to distill skills such as context following, alignment with user intent, and specialization on NLP tasks. It also covers domain-specific vertical distillation, demonstrating how KD improves performance in specialized fields such as healthcare, law, and science.

Finally, the survey identifies open challenges and future research directions, advocates ethical and legal compliance in KD, and argues that distillation can help democratize access to advanced AI. It is structured into sections on a KD overview, algorithms, skill distillation, vertical distillation, and open problems, offering guidance for researchers and practitioners in the field.
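To make the "labeling" technique concrete, the sketch below shows one common black-box KD recipe: a proprietary teacher (here queried through the OpenAI Python client) annotates a pool of unlabeled prompts, and the resulting prompt-response pairs become supervised fine-tuning data for an open-source student. This is a minimal illustration under stated assumptions, not the survey's reference implementation; the seed prompts, file names, model identifier, and decoding settings are placeholders.

```python
# Minimal sketch of black-box KD via teacher labeling (illustrative only).
# Assumes the `openai` Python package (>= 1.0) and an OPENAI_API_KEY in the
# environment; prompts, file names, and decoding settings are placeholders.
import json
from openai import OpenAI

client = OpenAI()

# Unlabeled seed prompts; in practice these come from data curation/expansion.
seed_prompts = [
    "Explain the difference between supervised and self-supervised learning.",
    "Summarize the main idea of knowledge distillation in two sentences.",
]

def label_with_teacher(prompt: str, model: str = "gpt-4") -> str:
    """Query the proprietary teacher for a response to distill from."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return response.choices[0].message.content

# Build the distillation dataset: (instruction, teacher output) pairs.
with open("distill_data.jsonl", "w") as f:
    for prompt in seed_prompts:
        record = {"instruction": prompt, "output": label_with_teacher(prompt)}
        f.write(json.dumps(record) + "\n")

# The student (e.g., a LLaMA or Mistral checkpoint) is then fine-tuned on
# distill_data.jsonl with standard supervised fine-tuning, i.e. next-token
# cross-entropy on the teacher's outputs.
```

White-box variants go further and match the student's token distributions to the teacher's (for example with a KL-divergence objective), but that requires access to teacher logits, which proprietary models typically do not expose.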