OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models


20 Jul 2024 | Zhe Kong, Yong Zhang, Tianyu Yang, Tao Wang, Kaihao Zhang, Bizhu Wu, Guanying Chen, Wei Liu, and Wenhan Luo
OMG (Occlusion-friendly Personalized Multi-concept Generation) is a novel framework designed to address the challenges of multi-concept personalization in text-to-image generation, with a particular focus on identity preservation and occlusion handling. The method employs a two-stage sampling process: the first stage generates an image with a coherent layout and extracts visual comprehension information; the second stage integrates multiple concepts while accounting for occlusions through a concept noise blending strategy.

Key contributions include:

1. **Two-Stage Framework**: The first stage generates an image with a coherent layout and visual comprehension information, while the second stage integrates multiple concepts using concept noise blending.
2. **Concept Noise Blending**: This strategy merges the noises predicted by different single-concept models during sampling, effectively addressing identity degradation and occlusion issues (see the sketch after this list).
3. **Computational Efficiency**: OMG can be combined with various off-the-shelf single-concept models (e.g., LoRA and InstantID) without additional tuning, making it computationally efficient.
4. **Performance**: Extensive experiments demonstrate superior performance in multi-concept personalization, with better identity alignment and more harmonious illumination than existing methods.

The method is evaluated on datasets encompassing various concepts and compared with state-of-the-art techniques, demonstrating its effectiveness in generating high-quality images with multiple concepts while preserving identity and handling occlusions.
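To make the concept noise blending idea concrete, here is a minimal sketch of the second-stage sampling loop. It assumes a diffusers-style scheduler interface (`scheduler.timesteps`, `scheduler.step(...).prev_sample`) and treats the base model and each single-concept model (e.g., a LoRA or InstantID branch) as callables that predict noise; all names are illustrative assumptions, not the authors' code.

```python
import torch

@torch.no_grad()
def omg_second_stage_sampling(base_model, concept_models, scheduler,
                              prompt_embeds, concept_masks, latents):
    """Sketch of OMG-style concept noise blending during sampling.

    concept_masks: list of (B, 1, H, W) per-concept spatial masks derived
    from the first-stage layout image; occluded pixels are excluded from a
    concept's mask so foreground objects stay in front of the subject.
    """
    for t in scheduler.timesteps:
        # The base model predicts noise for the global layout/background.
        eps = base_model(latents, t, prompt_embeds)
        # Each single-concept model predicts noise for its own subject;
        # blend it in only within that concept's (occlusion-aware) region.
        for model_k, mask_k in zip(concept_models, concept_masks):
            eps_k = model_k(latents, t, prompt_embeds)
            eps = mask_k * eps_k + (1.0 - mask_k) * eps
        # One denoising step with the blended noise prediction.
        latents = scheduler.step(eps, t, latents).prev_sample
    return latents
```

Because the blending happens on noise predictions inside a single sampling trajectory, the concepts share one layout and illumination, which is why no joint fine-tuning of the single-concept models is needed.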