Transfer CLIP for Generalizable Image Denoising


22 Mar 2024 | Jun Cheng, Dong Liang, Shan Tan
This paper proposes a method for generalizable image denoising that leverages the frozen ResNet image encoder from the CLIP model, which has demonstrated strong generalization in open-world image recognition and segmentation. The authors find that certain dense features extracted from this frozen encoder are distortion-invariant and content-related, two properties that are highly desirable for generalizable denoising.

Building on these properties, they design an asymmetrical encoder-decoder denoising network: the noisy image and its multi-scale dense features from the frozen CLIP ResNet encoder are fed into a learnable image decoder, while the encoder itself is never updated. They further propose a progressive feature augmentation strategy that mitigates feature overfitting and improves the robustness of the learnable decoder.

Extensive experiments on diverse out-of-distribution (OOD) noise, including synthetic noise, real-world sRGB noise, and low-dose CT image noise, show that the method generalizes to OOD noise better than existing denoisers. The method is simple and effective, and it can be applied to a variety of image denoising tasks. The key contributions are: (1) identifying the distortion-invariant and content-related properties of the frozen CLIP ResNet encoder's dense features; (2) an asymmetrical encoder-decoder denoising network built on those features; and (3) a progressive feature augmentation strategy that improves the robustness of the denoising model.
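The asymmetric design can be sketched in a few lines. This is a minimal illustration, not the paper's actual architecture: the strided-slice `encode` function is a stand-in for CLIP's frozen ResNet feature maps, and the scalar mixing weights stand in for the paper's learnable decoder layers.

```python
import numpy as np

def encode(img, scales=(1, 2, 4)):
    """Stand-in for CLIP's frozen ResNet encoder: it emits dense features at
    several scales and is never updated during training. Strided slicing is
    only an illustrative stub for the real pretrained feature maps."""
    return [img[::s, ::s] for s in scales]

class LearnableDecoder:
    """The only trainable part of the asymmetric design: it fuses the noisy
    image with the frozen multi-scale features. Scalar per-scale mixing
    weights are a toy substitute for real decoder layers."""
    def __init__(self, n_scales):
        self.weights = np.full(n_scales + 1, 1.0 / (n_scales + 1))  # trainable

    def __call__(self, noisy, feats):
        h, w = noisy.shape
        out = self.weights[0] * noisy
        for wt, f in zip(self.weights[1:], feats):
            # Nearest-neighbour upsample each scale back to image resolution.
            up = np.kron(f, np.ones((h // f.shape[0], w // f.shape[1])))
            out = out + wt * up
        return out
```

During training, gradients would flow only into `LearnableDecoder.weights`; the encoder stays frozen, which is what makes the design asymmetric.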
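Progressive feature augmentation can likewise be sketched as a random perturbation of the frozen features during training, so the decoder cannot overfit to their exact values. The linear, depth-increasing schedule below is an illustrative assumption; the paper's exact augmentation rule may differ.

```python
import numpy as np

def progressive_feature_augmentation(feats, base_alpha=0.1, rng=None):
    """Perturb multi-scale features during training to discourage the
    learnable decoder from overfitting to exact encoder activations.
    Deeper scales get progressively stronger perturbation (assumed schedule)."""
    if rng is None:
        rng = np.random.default_rng()
    out = []
    for level, f in enumerate(feats):
        alpha = min(base_alpha * (level + 1), 1.0)   # stronger at deeper scales
        noise = rng.standard_normal(f.shape) * (f.std() + 1e-8)
        out.append((1.0 - alpha) * f + alpha * noise)  # convex mix keeps scale
    return out
```

At test time the augmentation would be disabled and the clean frozen features passed to the decoder unchanged.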