Similarity-Preserving Knowledge Distillation

1 Aug 2019 | Frederick Tung and Greg Mori
The paper introduces a novel form of knowledge distillation called similarity-preserving knowledge distillation, which aims to preserve pairwise activation similarities in the student network's representation space. Unlike traditional methods that focus on mimicking the teacher's representation space, this approach guides the student to produce similar activations for input pairs that elicit similar activations in the teacher. The distillation loss is defined on the pairwise similarity matrices computed from the activations of the student and teacher networks. Experiments on three public datasets (CIFAR-10, Describable Textures, and CINIC-10) demonstrate the effectiveness of this method, showing improved training outcomes and complementing traditional distillation techniques. The approach is particularly useful in scenarios with limited training data, domain shifts, and resource constraints, making it a versatile tool for applications such as model compression, privileged learning, and adversarial defense.
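
To make the loss concrete, here is a minimal PyTorch-style sketch of a pairwise similarity-preserving loss as described above. The function and variable names are illustrative, not the authors' code; details such as which layer pairs to distill and how the loss is weighted against the standard cross-entropy term are omitted.

```python
import torch
import torch.nn.functional as F

def similarity_preserving_loss(teacher_feat: torch.Tensor,
                               student_feat: torch.Tensor) -> torch.Tensor:
    """Sketch of a pairwise similarity-preserving distillation loss.

    teacher_feat, student_feat: activation maps of shape (B, C, H, W)
    taken from corresponding layers of the teacher and student for the
    same mini-batch of B inputs.
    """
    b = teacher_feat.size(0)

    # Flatten each sample's activations and build the B x B similarity
    # matrix from inner products between samples in the batch.
    t = teacher_feat.reshape(b, -1)
    g_t = t @ t.t()
    g_t = F.normalize(g_t, p=2, dim=1)  # row-wise L2 normalization

    s = student_feat.reshape(b, -1)
    g_s = s @ s.t()
    g_s = F.normalize(g_s, p=2, dim=1)

    # Penalize differences between the teacher and student similarity
    # matrices, averaged over the B*B entries.
    return ((g_t - g_s) ** 2).sum() / (b * b)
```

In training, this term would typically be added to the student's usual task loss (e.g., cross-entropy) with a balancing weight, so the student learns the task while matching the teacher's batch-level similarity structure.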