Upsample Guidance: Scale Up Diffusion Models without Training


2 Apr 2024 | Juno Hwang, Yong-Hyun Park, Junghyo Jo
This paper introduces upsample guidance, a technique that enables high-resolution image generation without additional training or external models. The method adapts pre-trained diffusion models to generate higher-resolution images by adding a single term to the sampling process. It is applicable to a wide range of models, including pixel-space, latent-space, and video diffusion models, and proper selection of the guidance scale improves image quality, fidelity, and prompt alignment.

The core idea behind upsample guidance is signal-to-noise ratio (SNR) matching under downsampling: by adjusting the time and power of the noisy input, the method keeps the denoising signal consistent across resolutions. It can be applied to any diffusion model, including latent diffusion models (LDMs), and is compatible with techniques such as SDEdit, ControlNet, LoRA, and IP-Adapter. It also enables generation at resolutions the model was never trained on, such as 64² images from a CIFAR-10 model.

The paper demonstrates the effectiveness of upsample guidance across various image generation models and applications, including spatial and temporal upsampling in video generation. An ablation study shows that both the time and power adjustments are essential for image quality. An analysis of the guidance scale finds that, for LDMs, the scale should be reduced near t=0 to avoid artifacts. Because the computational overhead of upsample guidance is minimal, it is an efficient route to high-resolution generation. The paper concludes that upsample guidance is a versatile and effective technique for generating high-fidelity images at high resolutions without requiring additional training or external models.
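To make the SNR-matching idea concrete, here is a minimal NumPy sketch of how a single guidance term could be added to a noise prediction. It is an illustration under stated assumptions, not the paper's exact formulation: `model(x, t)` is a hypothetical ε-prediction network that accepts any input resolution, `alpha_bars` is the cumulative noise schedule, average-pooling is used as the downsampler, and the factor-of-`factor` power correction on the downsampled input is a simplified stand-in for the paper's power adjustment.

```python
import numpy as np

def snr(alpha_bar):
    # signal-to-noise ratio of a noisy sample at cumulative alpha `alpha_bar`
    return alpha_bar / (1.0 - alpha_bar)

def matched_time(alpha_bars, t, factor=2):
    # SNR matching: average-pool downsampling by `factor` raises the SNR
    # by roughly factor**2, so pick the time tau whose SNR matches that.
    target = factor**2 * snr(alpha_bars[t])
    return int(np.argmin(np.abs(snr(alpha_bars) - target)))

def downsample(x, factor=2):
    # average-pool downsampling (one possible choice of downsampler)
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(x, factor=2):
    # nearest-neighbor upsampling back to the high resolution
    return np.kron(x, np.ones((factor, factor)))

def guided_eps(model, x_t, t, alpha_bars, w=0.3, factor=2):
    # high-resolution prediction from the pre-trained model
    eps_hi = model(x_t, t)
    # evaluate the same model on the downsampled input at the SNR-matched time;
    # the factor-of-`factor` rescale is a crude power adjustment (assumption)
    tau = matched_time(alpha_bars, t, factor)
    eps_lo = model(factor * downsample(x_t, factor), tau)
    # the single added guidance term: steer the low-frequency component of
    # eps_hi toward the low-resolution prediction, scaled by `w`
    correction = upsample(eps_lo, factor) - upsample(downsample(eps_hi, factor), factor)
    return eps_hi + w * correction
```

In a real sampler this guided ε would simply replace the model's raw prediction at each denoising step, so the extra cost is one additional model evaluation at the lower resolution.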