NoiseCollage is a novel layout-aware text-to-image diffusion model that addresses the limitations of existing models by independently estimating noises for individual objects and then cropping and merging them into a single noise. This approach avoids mismatches between text and layout conditions, ensuring that objects are placed correctly in the generated images. The model uses a masked cross-attention mechanism to localize visual information around the specified regions, improving the accuracy of the generated images. Qualitative and quantitative evaluations show that NoiseCollage outperforms several state-of-the-art models in generating high-quality multi-object images that accurately reflect both text and layout conditions. Additionally, NoiseCollage can be integrated with ControlNet to use edges, sketches, and pose skeletons as additional conditions, further improving layout accuracy.
The code for NoiseCollage is available at <https://github.com/univ-esuty/noisecollage>.
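The crop-and-merge idea described above can be sketched in a few lines. The following is an illustrative NumPy sketch, not the paper's exact merging rule: `collage_noises`, its mask-weighted averaging, and the background fallback are all assumptions made here for clarity. Each object's independently estimated noise is kept only inside its layout mask, and the masked noises are combined into a single noise map.

```python
import numpy as np

def collage_noises(noises, masks):
    """Merge independently estimated per-object noises into one noise map.

    noises: list of (H, W, C) arrays, one noise estimate per object.
    masks:  list of (H, W) binary arrays marking each object's layout region.
    Overlapping regions are averaged; pixels claimed by no mask fall back to
    the first noise, treated here as the background estimate. This is a
    hypothetical sketch of the crop-and-merge step, not the official code.
    """
    h, w, c = noises[0].shape
    merged = np.zeros((h, w, c))
    weight = np.zeros((h, w, 1))
    for noise, mask in zip(noises, masks):
        m = mask[..., None].astype(float)
        merged += m * noise   # "crop": keep this noise only inside its region
        weight += m
    background = noises[0]
    # average where masks overlap; use the background noise where no mask hits
    return np.where(weight > 0, merged / np.maximum(weight, 1), background)
```

In an actual diffusion loop this merge would run at every denoising step, with each per-object noise produced by a U-Net call conditioned on that object's text prompt.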