Upsample Guidance: Scale Up Diffusion Models without Training

2 Apr 2024 | Juno Hwang, Yong-Hyun Park, Junghyo Jo
Diffusion models have shown superior performance in various generative tasks, but they struggle to generate high-resolution samples. Previous solutions modify the architecture, require additional training, or partition the sampling process, all of which demand extra work and do not fully exploit pre-trained models. This paper introduces *upsample guidance*, a technique that adapts pre-trained diffusion models to generate higher-resolution images by adding a single term to the sampling process, without any additional training or external models. Upsample guidance can be applied to a wide range of models, including pixel-space, latent-space, and video diffusion models. Proper selection of the guidance scale improves image quality, fidelity, and prompt alignment. The method is shown to be effective across different models and applications, including both spatial and temporal upsampling in video generation. Experiments also highlight the importance of the time and power adjustments used in upsample guidance, and a quantitative analysis of the guidance scale is provided to help choose optimal settings. Overall, upsample guidance is a training-free technique that enables high-fidelity image generation at resolutions the model was not originally trained on, with minimal computational overhead.
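
The summary describes the method only at a high level: one extra term added to each sampling step, weighted by a guidance scale, with time and power adjustments. The sketch below is a schematic interpretation of such a guided denoising step in PyTorch, not the paper's exact formulation; the callable `eps_model`, the adjusted timestep `t_low`, the scale `w`, and the specific downsampling/upsampling operators are illustrative assumptions.

```python
import torch.nn.functional as F

def upsample_guided_eps(eps_model, x_t, t, t_low, scale=2, w=1.0):
    """Schematic upsample-guidance-style noise prediction.

    Illustrative only: the exact guidance term and the paper's time and
    power adjustments are not reproduced here; t_low stands in for the
    adjusted timestep used at the model's native resolution.
    """
    # Standard noise prediction at the (higher) target resolution.
    eps_hi = eps_model(x_t, t)

    # Run the same pre-trained model at its native resolution:
    # downsample the noisy sample, predict, then upsample the prediction.
    x_low = F.avg_pool2d(x_t, kernel_size=scale)
    eps_low = eps_model(x_low, t_low)
    eps_low_up = F.interpolate(eps_low, scale_factor=scale, mode="nearest")

    # Low-frequency component of the high-resolution prediction,
    # obtained with the same down/up operators, for comparison.
    eps_hi_low = F.interpolate(F.avg_pool2d(eps_hi, kernel_size=scale),
                               scale_factor=scale, mode="nearest")

    # Single added guidance term, weighted by the guidance scale w.
    return eps_hi + w * (eps_low_up - eps_hi_low)
```

With `w = 0` this reduces to ordinary sampling with the pre-trained model, which matches the paper's claim that the technique is a drop-in addition to the sampling process rather than a new model or training procedure.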