This paper explores the potential of leveraging the CLIP model for generalizable image denoising, focusing in particular on its ability to handle out-of-distribution (OOD) noise. The authors observe that the dense features extracted from the frozen ResNet encoder of CLIP are distortion-invariant yet content-related, two properties crucial for robust denoising. Building on this observation, they propose an asymmetrical encoder-decoder denoising network that feeds these dense features, together with the noisy image and its multi-scale features, into a learnable image decoder. To mitigate feature overfitting and further improve robustness, they introduce a progressive feature augmentation strategy. Extensive experiments on various OOD noises, including synthetic noise, real-world sRGB noise, and low-dose CT image noise, demonstrate the superior generalization of their method, CLIPdenoising.
The paper also discusses the limitations and future directions, including the robustness of the CLIP RN50 encoder to noise and the potential of other self-supervised representations.
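As a rough illustration of the progressive feature augmentation idea mentioned above, the sketch below perturbs a list of multi-scale feature maps with random noise whose magnitude grows for deeper levels. The function name, the Gaussian perturbation form, and the `base_strength`/`growth` schedule are assumptions for illustration only, not the paper's actual implementation.

```python
import numpy as np

def progressive_feature_augmentation(features, base_strength=0.05, growth=2.0, rng=None):
    """Perturb multi-scale feature maps with noise that grows with depth.

    features: list of np.ndarray ordered shallow -> deep.
    base_strength, growth: hypothetical schedule parameters (assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    augmented = []
    for level, f in enumerate(features):
        # Noise std scales with the feature map's own std and the level depth,
        # so deeper (coarser) features receive progressively stronger perturbations.
        strength = base_strength * (growth ** level)
        noise = rng.normal(0.0, 1.0, size=f.shape) * f.std() * strength
        augmented.append(f + noise)
    return augmented

# Toy multi-scale features: three levels with halving spatial resolution,
# mimicking the shapes a ResNet-style encoder might produce.
feats = [np.random.rand(64, 32, 32), np.random.rand(128, 16, 16), np.random.rand(256, 8, 8)]
aug = progressive_feature_augmentation(feats, rng=np.random.default_rng(0))
```

In this sketch the perturbation strength doubles at each level; any monotonically increasing schedule would serve the same purpose of discouraging the decoder from overfitting to the exact training-time feature statistics.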