Diffusion Enhancement for Cloud Removal in Ultra-Resolution Remote Sensing Imagery

25 Jan 2024 | Jialu Sui, Yiyang Ma, Wenhan Yang, Member, IEEE, Xiaokang Zhang, Member, IEEE, Man-On Pun, Senior Member, IEEE, and Jiaying Liu, Senior Member, IEEE
This paper proposes a diffusion-based framework, Diffusion Enhancement (DE), for cloud removal in ultra-resolution remote sensing imagery. Clouds significantly degrade the quality and usability of optical remote sensing (RS) images, and existing deep-learning (DL)-based cloud removal (CR) techniques struggle to faithfully reconstruct both the visual appearance and the detailed semantic content of the underlying scene. To support work on this problem, the authors also introduce a new ultra-resolution benchmark, CUHK Cloud Removal (CUHK-CR), with 0.5 m spatial resolution, rich textural detail, and diverse cloud coverage, providing a robust foundation for designing and assessing CR models. The benchmark comprises 668 images of thin clouds and 559 images of thick clouds with multispectral information; the data and code can be downloaded from GitHub.

The DE framework performs progressive texture detail recovery, which mitigates training difficulty and improves inference accuracy. It merges global visual information with progressive diffusion-based recovery, strengthening the model's ability to capture the data distribution, and it exploits a reference visual prior during inference to predict detailed information. A Weight Allocation (WA) network computes adaptive weighting coefficients for fusing the reference visual prior with the intermediate denoised images produced by the diffusion model, dynamically adjusting the weighting matrix according to image features and noise strength. As a result, the reference visual prior dominates coarse-grained content reconstruction in the initial denoising steps, while the diffusion model contributes rich detail in the subsequent stages. A coarse-to-fine training strategy stabilizes training, accelerates convergence, and reduces computational complexity.

Extensive experiments on the newly established CUHK-CR benchmark and on existing datasets such as RICE show that DE outperforms existing DL-based methods in both perceptual quality and signal fidelity, as measured by PSNR, SSIM, and LPIPS, generating fine textures that closely match the ground truth. An ablation study confirms that the WA network and reference-visual-prior refinement each yield significant improvements in PSNR, SSIM, and LPIPS, and that the coarse-to-fine training strategy accelerates convergence while obtaining superior results within a limited number of iterations. A computational complexity analysis of the framework is also provided.
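The summary does not include the authors' code, but the WA-style fusion it describes — a pixel-wise convex combination of the reference visual prior and the diffusion model's intermediate denoised estimate, with weights shifting from prior to diffusion output over the denoising steps — can be sketched as follows. All names and array shapes here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def fuse_with_prior(x_denoised, reference_prior, weights):
    """Pixel-wise convex combination of the diffusion model's intermediate
    denoised estimate and the reference visual prior.

    Weights near 1 favor the prior (coarse content, early steps);
    weights near 0 favor the diffusion output (fine detail, late steps).
    All arrays share the same (H, W, C) shape; weights are clipped to [0, 1].
    """
    weights = np.clip(weights, 0.0, 1.0)
    return weights * reference_prior + (1.0 - weights) * x_denoised

# Toy example with constant images and constant weight maps.
h, w, c = 4, 4, 3
prior = np.full((h, w, c), 0.8)      # stand-in for the reference visual prior
denoised = np.full((h, w, c), 0.2)   # stand-in for the intermediate estimate
early = fuse_with_prior(denoised, prior, np.full((h, w, c), 0.9))
late = fuse_with_prior(denoised, prior, np.full((h, w, c), 0.1))
print(early[0, 0, 0], late[0, 0, 0])  # 0.74 (prior-dominated) vs. 0.26
```

In the actual method the weight map is predicted by the WA network from image features and noise strength rather than fixed per step; this sketch only illustrates the fusion arithmetic.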
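PSNR, one of the fidelity metrics reported above, has a simple closed form: for a mean squared error MSE and peak value L, PSNR = 10 · log10(L² / MSE) in decibels. A minimal reference implementation (standard formula, not code from the paper):

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to reference."""
    mse = np.mean((reference.astype(np.float64)
                   - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((data_range ** 2) / mse)

clean = np.zeros((8, 8))
noisy = clean + 0.1          # uniform error of 0.1 -> MSE = 0.01
print(psnr(clean, noisy))    # 20.0 dB
```

SSIM and LPIPS are structural and learned perceptual metrics, respectively, and are typically computed with library implementations (e.g. scikit-image for SSIM) rather than by hand.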