1 March 2024 | Wenjian Zhang, Zheng Tan, Qunbo Lv, Jiaao Li, Baoyu Zhu and Yangyang Liu
This paper proposes EHNet, an efficient hybrid CNN-Transformer approach for remote sensing image super-resolution that combines a lightweight convolution module with an improved Swin Transformer in a UNet-like architecture. The encoder uses a novel Lightweight Feature Extraction Block (LFEB) that employs depthwise convolution and a Cross Stage Partial structure to extract rich features at low computational cost. The decoder incorporates a sequence-based upsample block (SUB) that focuses on semantic information through a multi-layer perceptron (MLP) layer, improving feature expression, detail recovery, and reconstruction accuracy. With only 2.64 million parameters, EHNet balances reconstruction quality against computational demands. Experiments on the UCMerced and AID datasets show state-of-the-art performance, with PSNR values of 28.02 dB and 29.44 dB, respectively, and results that surpass existing methods in PSNR, SSIM, and visual quality. Evaluations on both natural and remote sensing image super-resolution tasks further demonstrate the model's effectiveness across scenarios, making its architecture well suited to applications that require high-quality image reconstruction.
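To make the two building blocks concrete, below is a minimal PyTorch sketch of what an LFEB (depthwise convolution inside a CSP-style channel split) and a SUB (an MLP operating on the token sequence before a pixel-shuffle rearrangement) could look like. This is not the authors' reference implementation: the kernel sizes, channel split ratio, MLP expansion ratio, and upsampling factor are illustrative assumptions; only the high-level structure follows the description above.

```python
# Hypothetical sketch of LFEB and SUB; structural details are assumptions,
# not the published EHNet code.
import torch
import torch.nn as nn


class LFEB(nn.Module):
    """Lightweight Feature Extraction Block: depthwise + pointwise convolution
    applied to half the channels, CSP-style, then re-fused with the rest."""

    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, kernel_size=3, padding=1, groups=half),  # depthwise
            nn.Conv2d(half, half, kernel_size=1),                          # pointwise
            nn.GELU(),
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = x.chunk(2, dim=1)          # CSP split: only `a` goes through the branch
        return self.fuse(torch.cat([self.branch(a), b], dim=1)) + x


class SUB(nn.Module):
    """Sequence-based Upsample Block: spatial positions are treated as tokens,
    an MLP expands each token, and PixelShuffle rearranges the result into a
    higher-resolution feature map."""

    def __init__(self, channels: int, scale: int = 2):
        super().__init__()
        hidden = channels * 2             # assumed MLP expansion ratio
        self.mlp = nn.Sequential(
            nn.LayerNorm(channels),
            nn.Linear(channels, hidden),
            nn.GELU(),
            nn.Linear(hidden, channels * scale * scale),
        )
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        tokens = self.mlp(tokens)                        # (B, H*W, C*scale^2)
        feat = tokens.transpose(1, 2).reshape(b, -1, h, w)
        return self.shuffle(feat)                        # (B, C, scale*H, scale*W)


if __name__ == "__main__":
    x = torch.randn(1, 64, 16, 16)
    y = SUB(64, scale=2)(LFEB(64)(x))
    print(y.shape)  # torch.Size([1, 64, 32, 32])
```

The sketch illustrates why the design stays lightweight: the depthwise/CSP split keeps the convolutional cost low in the encoder, while the SUB's MLP works on per-token features rather than full feature maps before the inexpensive pixel-shuffle step.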