27 Mar 2015 | Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta & Yoshua Bengio
The paper introduces FitNets, a framework for training thin, deep neural networks by leveraging intermediate-level hints from a wider, shallower teacher network. The approach extends knowledge distillation (KD) so that the student network learns intermediate representations that are predictive of the teacher's hidden layers. This makes it possible to train deeper models with fewer parameters, improving both generalization and computational efficiency. The authors demonstrate that FitNets can outperform their teacher networks while requiring significantly fewer parameters, as shown on benchmark datasets including CIFAR-10, CIFAR-100, SVHN, and AFLW. The results highlight the effectiveness of using hints from the teacher to guide training, allowing deeper and thinner models to outperform their wider and shallower counterparts. A sketch of the two training objectives appears below.
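The sketch below illustrates the two losses involved, assuming PyTorch: an L2 "hint" loss between a regressed student feature map and the teacher's hint layer, followed by a standard KD loss on softened logits. The regressor choice (a 1x1 convolution), the names HintRegressor, hint_loss, and kd_loss, and the temperature/alpha values are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of FitNets-style hint training (assumes PyTorch).
# Shapes, hyperparameters, and the 1x1-conv regressor are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HintRegressor(nn.Module):
    """Maps the student's guided-layer features to the teacher's hint-layer size."""
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # A 1x1 convolution is one simple choice of regressor; the paper uses a
        # convolutional regressor to keep the parameter count low.
        self.regressor = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_features: torch.Tensor) -> torch.Tensor:
        return self.regressor(student_features)

def hint_loss(student_features, teacher_features, regressor):
    """Stage 1: L2 loss between regressed student features and the teacher hint."""
    return F.mse_loss(regressor(student_features), teacher_features)

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.9):
    """Stage 2: knowledge distillation (softened teacher targets + hard labels)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In this reading, stage 1 trains the student up to its guided layer using hint_loss, and stage 2 trains the full student with kd_loss; the exact weighting and annealing schedule follow the paper rather than the fixed alpha used here.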