DRCT: Saving Image Super-Resolution away from Information Bottleneck


15 Apr 2024 | Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou
Institute of Data Science, National Cheng Kung University
cchsu@gs.ncku.edu.tw, zuw408421476@gmail.com, nelly910421@gmail.com

Abstract: Recent years have seen significant success in low-level vision tasks using Vision Transformer-based approaches. Unlike CNN-based models, Transformers are better at capturing long-range dependencies, enabling image reconstruction from non-local information. In super-resolution, Swin-Transformer-based models have become mainstream thanks to their ability to model global spatial information and their shifted-window attention mechanism. However, feature-map intensity is abruptly suppressed toward the end of the network, indicating an information bottleneck. To address this, the authors propose the Dense-residual-connected Transformer (DRCT), which mitigates spatial information loss and stabilizes information flow through dense-residual connections, thereby unleashing the model's potential. Experimental results show that DRCT surpasses state-of-the-art methods on benchmark datasets and performs well in the NTIRE-2024 Image Super-Resolution (x4) Challenge.

Introduction: Single Image Super-Resolution (SISR) aims to reconstruct a high-quality image from its low-resolution counterpart. CNN-based strategies long dominated the super-resolution domain, leveraging techniques such as residual learning and recursive learning, but CNNs are limited in capturing long-range dependencies. Transformer-based SISR networks such as IPT and EDT were introduced to improve SISR performance, and SwinIR marked a significant advance by incorporating the Swin Transformer into SISR. However, as network depth increases, feature-map intensity decreases, indicating spatial information loss. The authors propose DRCT to address this issue by enlarging receptive fields and adding dense connections within residual blocks to mitigate the information bottleneck.

Problem Statement: The information bottleneck principle says that information about the input is progressively lost as data passes through successive layers, which can also lead to vanishing gradients. In SISR, the goal is to maximize the mutual information between the HR and SR images; a formal statement is given below. Spatial information loss is a common issue in super-resolution, producing non-smooth feature-map intensity and information bottlenecks. Dense connections in Swin-Transformer-based models can stabilize information flow and enhance performance.

Motivation: Dense-residual group auxiliary supervision and dense connections combined with the shifting-window mechanism are introduced to stabilize information flow and enlarge receptive fields. This design allows DRCT to achieve outstanding performance with a simpler model architecture.

Methodology: DRCT comprises three components: shallow feature extraction, deep feature extraction, and image reconstruction. Shallow feature extraction applies a convolutional layer to the low-resolution input; deep feature extraction stacks residual deep-feature-extraction groups built from Swin-Dense-Residual-Connected Blocks (SDRCB), which enlarge receptive fields and capture long-range dependencies; image reconstruction aggregates the shallow and deep features to produce the super-resolved output. A simplified sketch of this pipeline follows the formulation below.
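To make the Problem Statement concrete, the information-bottleneck argument can be written in standard mutual-information notation. The symbols below (X for the network input, Z_l for the l-th feature map, I_HR and I_LR for the high- and low-resolution images, f_theta for the network) are our own shorthand for illustration, not necessarily the paper's notation.

```latex
% Successive layers form a Markov chain, so by the data-processing
% inequality the information a feature map retains about the input
% can only shrink with depth:
X \rightarrow Z_1 \rightarrow \cdots \rightarrow Z_L,
\qquad
I(X; Z_1) \;\ge\; I(X; Z_2) \;\ge\; \cdots \;\ge\; I(X; Z_L).

% The SISR objective can then be read as choosing parameters \theta
% that preserve as much information about the HR target as possible
% in the super-resolved output f_\theta(I_{LR}):
\hat{\theta} \;=\; \arg\max_{\theta} \; I\bigl(I_{HR};\, f_{\theta}(I_{LR})\bigr).
```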
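The following is a minimal PyTorch sketch of the three-stage pipeline and the dense-residual connections described above. It is an illustration under stated assumptions, not the authors' implementation: the Swin Transformer layer is replaced by a convolutional stub, and all module names and hyperparameters (growth, stages, res_scale) are hypothetical.

```python
# Simplified, hypothetical sketch of the DRCT pipeline: shallow conv ->
# dense-residual deep feature extraction -> pixel-shuffle reconstruction.
import torch
import torch.nn as nn

class SwinLayerStub(nn.Module):
    """Placeholder for a Swin Transformer layer (shifted-window attention)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
        )
    def forward(self, x):
        return self.body(x)

class SDRCB(nn.Module):
    """Swin-Dense-Residual-Connected Block (simplified).
    Each stage sees the concatenation of the block input and all previous
    stage outputs (dense connections); the block output is a scaled
    residual added back to the input."""
    def __init__(self, channels=64, growth=32, stages=4, res_scale=0.2):
        super().__init__()
        self.stages = nn.ModuleList()
        for i in range(stages):
            in_ch = channels + i * growth  # grows with each dense connection
            self.stages.append(nn.Sequential(
                nn.Conv2d(in_ch, growth, 1),  # 1x1 transition to growth channels
                SwinLayerStub(growth),
            ))
        self.fuse = nn.Conv2d(channels + stages * growth, channels, 1)
        self.res_scale = res_scale
    def forward(self, x):
        feats = [x]
        for stage in self.stages:
            feats.append(stage(torch.cat(feats, dim=1)))
        return x + self.res_scale * self.fuse(torch.cat(feats, dim=1))

class DRCTSketch(nn.Module):
    def __init__(self, channels=64, blocks=6, scale=4):
        super().__init__()
        self.shallow = nn.Conv2d(3, channels, 3, padding=1)  # shallow features
        self.deep = nn.Sequential(*[SDRCB(channels) for _ in range(blocks)])
        self.upsample = nn.Sequential(                        # reconstruction
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )
    def forward(self, lr):
        s = self.shallow(lr)
        d = self.deep(s)
        return self.upsample(s + d)  # aggregate shallow + deep features

if __name__ == "__main__":
    sr = DRCTSketch()(torch.randn(1, 3, 64, 64))
    print(sr.shape)  # torch.Size([1, 3, 256, 256])
```

The design point the sketch captures is that every stage of an SDRCB receives the concatenation of the block input and all earlier stage outputs, so shallow information has a direct path to the end of the block, while the scaled residual keeps the feature-map intensity from collapsing toward the end of the network.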
Same-task Progressive Training Strategy