1 May 2019 | Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho
Relational Knowledge Distillation (RKD) is a novel approach to knowledge distillation that transfers mutual relations between data examples rather than individual outputs. Unlike conventional knowledge distillation (KD), which focuses on matching outputs between teacher and student models, RKD emphasizes the structural relationships in the data embedding space. The method introduces two types of distillation losses: distance-wise and angle-wise, which penalize differences in relational structures. These losses are designed to capture higher-order properties of data examples, making RKD more effective in transferring knowledge, especially in metric learning tasks where students can outperform their teachers.
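Concretely, with $t_i$ and $s_i$ denoting the teacher's and student's embeddings of example $x_i$, $l_\delta$ the Huber loss, and $\mu$ the mean pairwise distance within a mini-batch, the two losses take roughly the following form (notation sketched from the description above):

\[
\psi_D(t_i, t_j) = \frac{1}{\mu}\,\lVert t_i - t_j \rVert_2,
\qquad
\mathcal{L}_{\mathrm{RKD\text{-}D}} = \sum_{(x_i,\, x_j)} l_\delta\!\big(\psi_D(t_i, t_j),\; \psi_D(s_i, s_j)\big)
\]

\[
\psi_A(t_i, t_j, t_k) = \cos \angle\, t_i t_j t_k = \langle \mathbf{e}^{ij}, \mathbf{e}^{kj} \rangle,
\quad
\mathbf{e}^{ij} = \frac{t_i - t_j}{\lVert t_i - t_j \rVert_2},
\qquad
\mathcal{L}_{\mathrm{RKD\text{-}A}} = \sum_{(x_i,\, x_j,\, x_k)} l_\delta\!\big(\psi_A(t_i, t_j, t_k),\; \psi_A(s_i, s_j, s_k)\big)
\]

These relational terms are added, with scalar weights, to the task-specific loss to form the full training objective.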
RKD is implemented by training the student model to match the relational structure of the teacher's embeddings. Because relations rather than individual outputs are matched, RKD does not require the teacher and student to share the same output dimensionality, and it can be combined with other distillation or task losses to enhance performance. Experiments on metric learning, image classification, and few-shot learning show that RKD significantly improves student models; on benchmark datasets such as CUB-200-2011 and Cars 196, it outperforms existing KD methods and achieves state-of-the-art results in metric learning.
The method is also effective in self-distillation, where a student with the same architecture as the teacher is trained on the teacher's relational structure and improves over it; in metric learning, even smaller students can outperform larger teachers. Additionally, RKD tends to specialize models to the target domain, improving in-domain performance at the cost of some generalization to other domains.
RKD is implemented with the distance-wise and angle-wise losses, which penalize differences between teacher and student in pairwise Euclidean distances and in the angles formed by triplets of examples. These losses are applied to the final embedding outputs of the teacher and student models, and the student is trained with a weighted combination of the task-specific loss and the distillation losses, making the method flexible and straightforward to combine with existing objectives.
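As an illustration, a minimal PyTorch sketch of the two losses and the combined objective could look like the following; the function names, the use of smooth_l1_loss as the Huber penalty, and the weights lambda_d and lambda_a are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def rkd_distance_loss(teacher, student):
    """Distance-wise loss: match pairwise distances, each normalized by its mean."""
    with torch.no_grad():
        t_d = torch.cdist(teacher, teacher, p=2)      # (N, N) teacher distance matrix
        t_d = t_d / t_d[t_d > 0].mean()               # normalize by mean non-zero distance
    s_d = torch.cdist(student, student, p=2)
    s_d = s_d / s_d[s_d > 0].mean()
    return F.smooth_l1_loss(s_d, t_d)                 # Huber-style penalty on the relations


def rkd_angle_loss(teacher, student):
    """Angle-wise loss: match cosines of angles formed by embedding triplets."""
    def triplet_angles(e):
        diff = e.unsqueeze(0) - e.unsqueeze(1)        # diff[j, i] = e_i - e_j, shape (N, N, D)
        diff = F.normalize(diff, p=2, dim=2)          # unit direction vectors
        return torch.bmm(diff, diff.transpose(1, 2))  # (N, N, N): cosine of angle at vertex j
    with torch.no_grad():
        t_a = triplet_angles(teacher)
    s_a = triplet_angles(student)
    return F.smooth_l1_loss(s_a, t_a)


def rkd_objective(task_loss, teacher_emb, student_emb, lambda_d=1.0, lambda_a=2.0):
    """Task loss (e.g. a triplet loss) plus the two relational terms; weights are illustrative."""
    return (task_loss
            + lambda_d * rkd_distance_loss(teacher_emb, student_emb)
            + lambda_a * rkd_angle_loss(teacher_emb, student_emb))
```

Here teacher_emb and student_emb would be the embeddings of the same mini-batch produced by the frozen teacher and the student being trained, and task_loss would be, for example, a triplet loss in a metric-learning setting.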
Overall, RKD offers a more comprehensive and effective approach to knowledge distillation by transferring relational structure rather than individual outputs. It is particularly beneficial in metric learning, where students can outperform their teachers, and its tendency to specialize models to the target domain can be an advantage when in-domain performance is the goal. Built on the simple distance-wise and angle-wise losses, the method proves effective across multiple tasks and datasets.