Efficient Diffusion Model for Image Restoration by Residual Shifting

2024-11-23 | Zongsheng Yue, Jianyi Wang, Chen Change Loy
This paper proposes an efficient diffusion model for image restoration (IR) that sharply reduces the number of diffusion steps required during inference. Rather than diffusing from a high-quality (HQ) image all the way to pure Gaussian noise, the method constructs a Markov chain that transitions between HQ and low-quality (LQ) images by shifting the residual between them. This makes each transition more efficient and admits a flexible noise schedule that controls both the shifting speed and the noise strength during the diffusion process. With only four sampling steps, the method matches or surpasses current state-of-the-art results on four classical IR tasks: image super-resolution, image inpainting, blind face restoration, and image deblurring. Code and models are publicly available at https://github.com/zsyOAOA/ResShift.

The key contributions are: (1) an efficient diffusion model tailored to IR that builds a short Markov chain between HQ and LQ images, enabling fast reverse sampling during inference; (2) a highly flexible noise schedule that precisely controls the transition properties, including the shifting speed and the noise level; and (3) the replacement of self-attention layers with Swin Transformer blocks, improving the model's ability to handle images of varying resolutions. The framework is general and has been thoroughly validated on the four challenging IR tasks above, showing clear gains in both model design and empirical performance compared with previous diffusion-based IR work.
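As a rough illustration of the residual-shifting idea (a sketch, not the authors' implementation), the forward marginal can be written as x_t = x_0 + eta_t * (y - x_0) + kappa * sqrt(eta_t) * noise, where eta_t grows from near 0 to near 1 so that the chain starts at the HQ image x_0 and ends near the LQ image y. The schedule shape, endpoint values, and function names below are illustrative assumptions:

```python
import numpy as np

def noise_schedule(T, eta_1=0.001, eta_T=0.999, p=0.3):
    """Hypothetical monotone schedule: eta_t rises from eta_1 to eta_T.

    The exponent p (an assumed hyperparameter) warps how quickly the
    residual is shifted toward the LQ image across the T steps.
    """
    ratio = np.sqrt(eta_T) / np.sqrt(eta_1)
    sqrt_eta = np.sqrt(eta_1) * ratio ** ((np.arange(T) / (T - 1)) ** p)
    return sqrt_eta ** 2

def forward_step(x0, y, t, etas, kappa=2.0, rng=None):
    """Sample from the marginal q(x_t | x_0, y).

    Shifts x0 by a fraction eta_t of the residual e0 = y - x0, then adds
    Gaussian noise with standard deviation kappa * sqrt(eta_t).
    """
    rng = np.random.default_rng() if rng is None else rng
    e0 = y - x0
    return x0 + etas[t] * e0 + kappa * np.sqrt(etas[t]) * rng.standard_normal(x0.shape)
```

Because eta_T is close to 1, the terminal state x_T is approximately the LQ image plus noise, which is why the chain can be made much shorter than one that must reach pure Gaussian noise.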
The model incorporates a perceptual loss into the optimization process and substitutes the self-attention layer with a shifted window-based self-attention mechanism from the Swin Transformer, which further reduces the number of diffusion steps and enhances the model's adaptability to arbitrary resolutions during inference.
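To make the "fast reverse sampling" claim concrete, here is a hedged sketch of a residual-shifting reverse chain: starting from a noisy version of the LQ image, each step mixes the current state with the network's HQ estimate according to the schedule. The posterior mean/variance used here follow my reading of the residual-shifting construction and may differ in detail from the paper; `predict_x0` stands in for the trained network and is a hypothetical callable:

```python
import numpy as np

def reverse_sample(y, etas, predict_x0, kappa=2.0, rng=None):
    """Sketch of the reverse Markov chain (assumed form, not the reference code).

    y          -- LQ image (array)
    etas       -- monotone schedule eta_1..eta_T with eta_T close to 1
    predict_x0 -- callable (x_t, y, t) -> HQ estimate, standing in for the network
    """
    rng = np.random.default_rng() if rng is None else rng
    T = len(etas)
    # Since eta_T ~ 1, the terminal state is roughly y plus Gaussian noise.
    x = y + kappa * np.sqrt(etas[-1]) * rng.standard_normal(y.shape)
    for t in range(T - 1, 0, -1):
        x0_hat = predict_x0(x, y, t)       # network's current HQ estimate
        alpha = etas[t] - etas[t - 1]      # per-step shift in the schedule
        mean = (etas[t - 1] / etas[t]) * x + (alpha / etas[t]) * x0_hat
        std = kappa * np.sqrt(etas[t - 1] / etas[t] * alpha)
        x = mean + std * rng.standard_normal(x.shape)
    return x
```

With a perfect predictor and no noise (kappa = 0), each step shrinks the remaining residual by the factor eta_{t-1}/eta_t, so the chain telescopes to roughly eta_1 of the original residual in only T steps, which is why a handful of steps can suffice.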