Jianping Gou, Baosheng Yu, Stephen J. Maybank, Dacheng Tao
This paper provides a comprehensive survey of knowledge distillation, a technique used to compress large deep neural networks into smaller, more efficient models. The survey covers various aspects of knowledge distillation, including different types of knowledge, training schemes, teacher-student architectures, distillation algorithms, performance comparisons, and applications. Knowledge distillation effectively learns a small student model from a large teacher model, addressing the challenges of deploying deep models on devices with limited resources. The paper discusses the theoretical and empirical understanding of knowledge distillation, highlighting its role in model compression and its extensions to other tasks such as label smoothing, data augmentation, and data privacy. It also explores the relationship between the teacher and student models, the importance of choosing appropriate teacher and student structures, and the use of adversarial learning and multi-teacher distillation to enhance performance. The survey concludes with a discussion on future research directions and challenges in knowledge distillation.
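To make the core idea concrete, the following is a minimal sketch of the classic logit-based (vanilla) knowledge distillation objective: the student is trained to match the teacher's softened output distribution via a KL-divergence term, combined with the usual cross-entropy loss on ground-truth labels. The toy models, the temperature T, and the weight alpha below are illustrative assumptions, not settings prescribed by the survey.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: teacher and student distributions at temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # KL term is scaled by T^2 so its gradient magnitude stays comparable
    # to the hard-label term as the temperature changes.
    kd_term = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Standard supervised loss on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Toy usage: a larger "teacher" and a smaller "student" MLP on random data
# (hypothetical architectures, chosen only to illustrate the capacity gap).
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))
x = torch.randn(8, 32)
labels = torch.randint(0, 10, (8,))
with torch.no_grad():
    teacher_logits = teacher(x)  # teacher is fixed during distillation
loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()
```

This sketch corresponds only to response-based distillation in an offline teacher-student setup; the survey also covers feature-based and relation-based knowledge, as well as online and self-distillation schemes, which require different loss formulations.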