15 Apr 2024 | Yuxuan Jiang, Chen Feng, Fan Zhang, and David Bull
This paper proposes a novel Multi-Teacher Knowledge Distillation (MTKD) framework for image super-resolution. The framework leverages multiple teacher models with different architectures to enhance the learning of a compact student network, and introduces a wavelet-based loss function that optimizes training by accounting for differences in both the spatial and frequency domains. The MTKD framework combines the outputs of the teacher models through a Knowledge Aggregation module, which generates an enhanced representation of the high-resolution image; this representation is then used to guide the student model during distillation. Evaluated on three popular network architectures against five commonly used knowledge distillation methods, the proposed approach achieves significant improvements in super-resolution performance, up to 0.46 dB (PSNR), over state-of-the-art knowledge distillation approaches across different network structures, in terms of both quantitative and qualitative performance. Ablation studies further confirm the effectiveness of the proposed wavelet-based loss function and the Knowledge Aggregation module. The MTKD approach is applicable to various image super-resolution tasks and has the potential to be extended to other low-level computer vision tasks.
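The idea of a loss that compares images in both the spatial and frequency domains can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the paper's exact formulation: a one-level Haar transform stands in for the wavelet decomposition, and the function names and the `alpha` weighting are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar transform of a single-channel image.

    Returns the (LL, LH, HL, HH) subbands, each half the input size.
    Assumes even height and width.
    """
    tl, tr = x[0::2, 0::2], x[0::2, 1::2]  # top-left / top-right pixels
    bl, br = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left / bottom-right
    ll = (tl + tr + bl + br) / 4  # low-frequency approximation
    lh = (tl - tr + bl - br) / 4  # horizontal detail
    hl = (tl + tr - bl - br) / 4  # vertical detail
    hh = (tl - tr - bl + br) / 4  # diagonal detail
    return ll, lh, hl, hh

def wavelet_loss(student, target, alpha=0.5):
    """Spatial L1 plus L1 over Haar subbands (alpha is a hypothetical weight)."""
    spatial = np.abs(student - target).mean()
    freq = sum(np.abs(s - t).mean()
               for s, t in zip(haar_dwt2(student), haar_dwt2(target)))
    return spatial + alpha * freq
```

In a distillation setting, `target` would be the aggregated teacher output (or the ground-truth high-resolution image) and `student` the compact network's prediction; the frequency-domain terms penalise errors in edge and texture subbands that a purely spatial loss can under-weight.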