ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

4 Mar 2024 | Jiaxiang Cheng, Pan Xie, Xin Xia, Jiashi Li, Jie Wu, Lean Fu, Yuxi Ren, Huixia Li, Xuefeng Xiao
**Abstract:** Recent advancements in text-to-image models, such as Stable Diffusion, and personalization techniques like DreamBooth and LoRA, have enabled the generation of high-quality and imaginative images. However, these models often struggle to generate images at resolutions outside their trained domain. To address this limitation, we propose ResAdapter, a domain-consistent adapter designed for diffusion models to generate images with unrestricted resolutions and aspect ratios. Unlike other multi-resolution generation methods that require complex post-processing operations, ResAdapter directly generates images with dynamic resolutions. By learning resolution priors, ResAdapter can be integrated into diffusion models to generate images with flexible resolutions and aspect ratios while preserving their original style domain. Comprehensive experiments demonstrate that ResAdapter, with only 0.5M parameters, can process images with flexible resolutions for various diffusion models. Additionally, ResAdapter is compatible with other modules (e.g., ControlNet, IP-Adapter, and LCM-LoRA) and can be integrated into multi-resolution models (e.g., ElasticDiffusion) to efficiently generate higher-resolution images.

**Introduction:** Diffusion models, particularly Stable Diffusion and SDXL, have become powerful tools for generating high-resolution images. However, image quality often degrades when they generate at resolutions outside their trained domain. Existing solutions fall into two groups: post-processing methods, which add complex extra steps, and fine-tuning on a broader range of resolutions, which shifts the style domain of the model. ResAdapter addresses both issues by enabling diffusion models to generate images with unrestricted resolutions and aspect ratios without altering their original style domain.

**Method:** ResAdapter consists of two main components: ResCLoRA and ResENorm. ResCLoRA is inserted into the convolution layers of the UNet's blocks to learn resolution priors, while ResENorm is used for resolution extrapolation to improve the quality of high-resolution images. A mixed-resolution training strategy lets a single ResAdapter generate images at multiple resolutions and prevents catastrophic forgetting.

**Experiments:** ResAdapter is evaluated on various personalized models and compared with other multi-resolution generation models. Results show that ResAdapter generates high-quality images at unrestricted resolutions and aspect ratios, outperforming existing methods in image quality and efficiency. Extended experiments demonstrate that ResAdapter is compatible with other modules and can reduce inference time in multi-resolution models.

**Conclusion:** ResAdapter is a plug-and-play, domain-consistent adapter for diffusion models that enables the generation of images with unrestricted resolutions and aspect ratios while preserving the original style domain. Its lightweight nature and compatibility with other modules make it a valuable tool for advanced image generation tasks.
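
The Method summary describes a LoRA-style component inserted into the UNet's convolution layers. As a rough illustration of why an adapter of this kind can stay small, the sketch below counts weights for a generic low-rank update on a single convolution layer. It assumes the common convolutional LoRA layout (a k×k "down" convolution into `rank` channels followed by a 1×1 "up" convolution), which may differ from the authors' exact design. The layer sizes, the rank, and the names `ConvShape`, `full_conv_params`, and `lora_conv_params` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvShape:
    """Shape of a 2D convolution layer (square kernel assumed)."""
    in_channels: int
    out_channels: int
    kernel_size: int


def full_conv_params(shape: ConvShape) -> int:
    """Number of weights in the full convolution kernel (bias ignored)."""
    return shape.out_channels * shape.in_channels * shape.kernel_size ** 2


def lora_conv_params(shape: ConvShape, rank: int) -> int:
    """Number of weights in a rank-`rank` low-rank update for that layer.

    Assumed layout: a k x k "down" conv (in_channels -> rank) followed by
    a 1 x 1 "up" conv (rank -> out_channels).
    """
    down = rank * shape.in_channels * shape.kernel_size ** 2
    up = shape.out_channels * rank
    return down + up


# Hypothetical layer and rank, chosen only for illustration.
layer = ConvShape(in_channels=320, out_channels=320, kernel_size=3)
rank = 4

full = full_conv_params(layer)
lora = lora_conv_params(layer, rank)
print(f"full conv weights: {full}")
print(f"low-rank weights:  {lora} ({lora / full:.2%} of the full layer)")
```

With these illustrative numbers the low-rank update holds 12,800 weights against 921,600 in the full kernel, about 1.39% of the layer. The total for a real adapter depends on which layers are adapted and at what rank, so this count should not be read as a reconstruction of the 0.5M figure quoted in the abstract.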