14 Jun 2024 | Rongyuan Wu, Lingchen Sun, Zhiyuan Ma, Lei Zhang
The paper introduces OSEDiff, a one-step effective diffusion network for real-world image super-resolution (Real-ISR). Unlike traditional multi-step diffusion models, OSEDiff directly uses the given low-quality (LQ) image as the starting point for diffusion, eliminating the uncertainty introduced by random noise. The pre-trained text-to-image (T2I) model, specifically Stable Diffusion (SD), is fine-tuned with trainable LoRA layers to adapt to complex real-world image degradations. Variational score distillation (VSD) is applied in the latent space to ensure that the model's predicted scores align with those of multi-step pre-trained models, enabling OSEDiff to efficiently generate high-quality (HQ) images in just one diffusion step. Experiments demonstrate that OSEDiff achieves comparable or superior Real-ISR results in terms of both objective metrics and subjective evaluations compared to previous multi-step diffusion-based methods, while significantly reducing the number of inference steps and computational cost.The paper introduces OSEDiff, a one-step effective diffusion network for real-world image super-resolution (Real-ISR). Unlike traditional multi-step diffusion models, OSEDiff directly uses the given low-quality (LQ) image as the starting point for diffusion, eliminating the uncertainty introduced by random noise. The pre-trained text-to-image (T2I) model, specifically Stable Diffusion (SD), is fine-tuned with trainable LoRA layers to adapt to complex real-world image degradations. Variational score distillation (VSD) is applied in the latent space to ensure that the model's predicted scores align with those of multi-step pre-trained models, enabling OSEDiff to efficiently generate high-quality (HQ) images in just one diffusion step. Experiments demonstrate that OSEDiff achieves comparable or superior Real-ISR results in terms of both objective metrics and subjective evaluations compared to previous multi-step diffusion-based methods, while significantly reducing the number of inference steps and computational cost.