The paper "HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution" introduces a novel approach to enhance image super-resolution (SR) using hierarchical transformers. The authors address the limitations of popular transformer-based SR methods, which often use fixed small windows with quadratic computational complexity, leading to limited receptive fields and poor performance in capturing long-range dependencies and multi-scale features. To overcome these issues, they propose a general strategy to convert transformer-based SR networks into hierarchical transformers (HiT-SR). This approach involves replacing fixed small windows with expanding hierarchical windows to aggregate features at different scales and establish long-range dependencies. Additionally, they introduce a spatial-channel correlation (SCC) method to efficiently gather spatial and channel information from large hierarchical windows, achieving linear computational complexity to window sizes.
The key contributions of the paper are:
1. A general strategy to convert transformer-based SR networks to hierarchical transformers (HiT-SR), enhancing SR performance by exploiting multi-scale features and long-range dependencies.
2. The design of a spatial-channel correlation (SCC) method that efficiently leverages spatial and channel features with computational complexity linear in the window size, enabling the use of large hierarchical windows (see the sketch after this list).
3. The conversion of the popular SR methods SwinIR-Light, SwinIR-NG, and SRFormer-Light into HiT-SR versions (HiT-SIR, HiT-SNG, and HiT-SRF), achieving state-of-the-art SR results with fewer parameters, fewer FLOPs, and faster speeds ($\sim 7 \times$).
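The sketch below illustrates the complexity argument behind contribution 2 with a toy channel-correlation attention; it is a simplified stand-in for SCC under assumed shapes, not the paper's exact formulation. Correlating channels (a C x C map) instead of tokens (an N x N map) keeps the cost O(N * C^2), i.e. linear in the number of tokens N inside a window:

```python
import torch
import torch.nn.functional as F

def channel_correlation_attention(tokens):
    """Toy channel-correlation attention over one window.

    `tokens` has shape (B, N, C) with N = window_h * window_w.
    The attention map is C x C rather than N x N, so the two matrix
    multiplications cost O(N * C^2): linear in the window size N.
    """
    q = F.normalize(tokens, dim=1)             # normalize along the token axis
    k = F.normalize(tokens, dim=1)
    corr = q.transpose(1, 2) @ k               # (B, C, C) channel correlation
    attn = corr.softmax(dim=-1)
    return tokens @ attn.transpose(1, 2)       # (B, N, C), linear in N

# Usage: attend within one 16x16 window of 60-channel features.
window_tokens = torch.randn(2, 16 * 16, 60)
print(channel_correlation_attention(window_tokens).shape)  # torch.Size([2, 256, 60])
```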
The paper includes extensive experiments to validate the effectiveness and efficiency of HiT-SR, demonstrating significant improvements in SR performance, computational efficiency, and convergence speed compared to existing methods. The results are evaluated on several benchmark datasets, showing superior performance in terms of PSNR and SSIM metrics.
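For reference, a generic PSNR computation of the kind used in such benchmark evaluations might look like the sketch below; this is standalone illustrative code, not from the paper, and the `shave` border and [0, 1] value range are assumptions:

```python
import torch

def psnr(sr, hr, shave=4):
    """PSNR (dB) between a super-resolved image and its ground truth.

    Both inputs are float tensors in [0, 1] with shape (C, H, W); a border
    of `shave` pixels is cropped, as is common practice in SR benchmarks.
    """
    sr = sr[..., shave:-shave, shave:-shave]
    hr = hr[..., shave:-shave, shave:-shave]
    mse = torch.mean((sr - hr) ** 2)
    return 10 * torch.log10(1.0 / mse)

hr = torch.rand(3, 128, 128)
sr = (hr + 0.01 * torch.randn_like(hr)).clamp(0, 1)
print(f"PSNR: {psnr(sr, hr):.2f} dB")
```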