9 Mar 2024 | Cuong Pham, Van-Anh Nguyen, Trung Le, Dinh Phung, Gustavo Carneiro
This paper proposes a novel frequency attention module (FAM) for knowledge distillation that operates in the frequency domain to encourage the student model to mimic the teacher's features. The FAM consists of a learnable global filter that adjusts the frequencies of the student's features based on the teacher's, enabling the student to capture both fine detail and higher-level structure; a local branch complements this with spatial-domain information. The module is integrated into two popular distillation mechanisms: layer-to-layer feature-based distillation and knowledge-review-based distillation.
Because the frequency domain captures global information and geometric structure that are difficult to extract with purely spatial techniques, the FAM strengthens the student's ability to mimic the teacher's features. Experiments on image classification (CIFAR-100, ImageNet) and object detection (MS COCO) show consistent improvements over existing knowledge distillation methods in both tasks, at a computational cost that remains manageable for practical applications, making the FAM a promising new way to transfer knowledge from teacher to student models.
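To make the global-filtering idea concrete, here is a minimal sketch of a frequency-domain attention step: the student's feature map is transformed with an FFT, modulated elementwise by a learnable filter, transformed back, and combined with a simple spatial (local) branch. This is an illustrative assumption-laden sketch using NumPy, not the authors' exact FAM implementation; the function name, the scalar local branch, and the filter shape are all hypothetical simplifications.

```python
import numpy as np

def frequency_attention(student_feat, global_filter, local_weight):
    """Illustrative sketch of frequency-domain feature modulation.

    student_feat:  (H, W) real-valued feature map from the student
    global_filter: (H, W) learnable (complex) filter applied in the
                   frequency domain; in the paper this would be adjusted
                   based on the teacher's features
    local_weight:  scalar weight for a placeholder spatial (local) branch
    """
    # Global branch: modulate the student's spectrum with the filter,
    # then return to the spatial domain
    spectrum = np.fft.fft2(student_feat)
    filtered = np.fft.ifft2(spectrum * global_filter).real
    # Local branch: stand-in for the spatial-domain component
    local = local_weight * student_feat
    return filtered + local
```

With an all-ones filter and a zero local weight, the module reduces to the identity, which makes the role of the learned filter easy to see: distillation would train `global_filter` so the modulated student features better match the teacher's.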