Frequency Attention for Knowledge Distillation

9 Mar 2024 | Cuong Pham, Van-Anh Nguyen, Trung Le, Dinh Phung, Gustavo Carneiro, Thanh-Toan Do
The paper "Frequency Attention for Knowledge Distillation" introduces a novel approach to knowledge distillation by leveraging the frequency domain to capture global and detailed information from a teacher model. The authors propose a Frequency Attention Module (FAM) that operates in the frequency domain, using a learnable global filter to adjust the frequencies of the student's features, thereby encouraging the student to mimic the teacher's features more effectively. This module is integrated into two enhanced knowledge distillation models: layer-to-layer feature-based distillation and knowledge review distillation. Extensive experiments on various datasets, including CIFAR-100, ImageNet, and MS COCO, demonstrate that the proposed method outperforms other state-of-the-art knowledge distillation methods in both image classification and object detection tasks. The FAM module's effectiveness is further validated through ablation studies, showing that both the global and local branches, as well as the high pass filter, contribute significantly to the model's performance.The paper "Frequency Attention for Knowledge Distillation" introduces a novel approach to knowledge distillation by leveraging the frequency domain to capture global and detailed information from a teacher model. The authors propose a Frequency Attention Module (FAM) that operates in the frequency domain, using a learnable global filter to adjust the frequencies of the student's features, thereby encouraging the student to mimic the teacher's features more effectively. This module is integrated into two enhanced knowledge distillation models: layer-to-layer feature-based distillation and knowledge review distillation. Extensive experiments on various datasets, including CIFAR-100, ImageNet, and MS COCO, demonstrate that the proposed method outperforms other state-of-the-art knowledge distillation methods in both image classification and object detection tasks. The FAM module's effectiveness is further validated through ablation studies, showing that both the global and local branches, as well as the high pass filter, contribute significantly to the model's performance.