HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
This paper proposes HiT-SR, a hierarchical transformer approach for efficient image super-resolution (SR). Existing transformer-based SR methods typically compute self-attention within small, fixed windows. Because the cost of that attention grows quadratically with window size, both the receptive field and the ability to aggregate multi-scale features are limited. HiT-SR replaces the fixed windows with expanding hierarchical windows, which aggregate features at multiple scales and establish long-range dependencies. To keep the larger windows affordable, it introduces a spatial-channel correlation (SCC) method whose complexity is linear in window size and which gathers both spatial and channel information from the hierarchical windows. The design works at two levels: at the block level, hierarchical windows collect multi-scale features; at the layer level, SCC aggregates them efficiently, using dual feature extraction followed by spatial-channel self-correlation. HiT-SR is presented as a general strategy for converting transformer-based SR methods into hierarchical transformers, and it is applied to SwinIR-Light, SwinIR-NG, and SRFormer-Light. Extensive experiments on benchmark datasets show that the converted models improve SR performance, with significant gains in PSNR and SSIM, while using fewer parameters and FLOPs and running up to 7× faster. The method also outperforms existing approaches in convergence, and in challenging scenes it produces finer details and sharper textures. Ablation studies and comparisons with other methods validate the approach, which achieves state-of-the-art results in efficient image super-resolution.
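
The contrast between quadratic and linear cost in window size can be made concrete with a small NumPy sketch. This is not the authors' SCC implementation: it partitions a feature map into windows of growing side length and applies a generic channel-wise correlation, whose map is C x C and therefore costs an amount linear in the number of tokens N per window, next to the multiply-accumulate count of standard spatial attention, whose N x N map costs an amount quadratic in N. The window sides (4, 8, 16), the feature-map shape, and all function names are assumptions chosen for illustration, and the learned projections that a real layer would apply are omitted.

```python
import numpy as np


def window_partition(x, side):
    """Split an (H, W, C) feature map into non-overlapping side x side windows.

    Returns an array of shape (num_windows, side * side, C).
    """
    h, w, c = x.shape
    if h % side or w % side:
        raise ValueError("feature map must be divisible by the window side")
    x = x.reshape(h // side, side, w // side, side, c)
    x = x.transpose(0, 2, 1, 3, 4)  # bring the two window-grid axes to the front
    return x.reshape(-1, side * side, c)


def softmax(a, axis=-1):
    # Numerically stable softmax.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def channel_correlation(q, v):
    """Generic channel-wise correlation for one window; q and v are (N, C).

    Illustrative stand-in only, not the SCC operator from the paper.
    """
    n = q.shape[0]
    corr = softmax(q.T @ v / n, axis=-1)  # (C, C), independent of N in size
    return v @ corr.T  # (N, C)


def spatial_attention_macs(n, c):
    # (N x C)(C x N) to build the attention map, (N x N)(N x C) to apply it.
    return 2 * n * n * c


def channel_correlation_macs(n, c):
    # (C x N)(N x C) to build the correlation map, (N x C)(C x C) to apply it.
    return 2 * n * c * c


rng = np.random.default_rng(0)
features = rng.standard_normal((16, 16, 8))  # hypothetical (H, W, C) feature map
for side in (4, 8, 16):  # hypothetical expanding window sides
    windows = window_partition(features, side)
    out = channel_correlation(windows[0], windows[0])
    n, c = windows.shape[1:]
    print(side, windows.shape, out.shape,
          spatial_attention_macs(n, c), channel_correlation_macs(n, c))
```

Doubling the number of tokens in a window quadruples the per-window count for spatial attention but only doubles it for the channel-wise correlation, which is the kind of scaling that makes large windows practical.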