MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis


27 Feb 2024 | Dewei Zhou1*, You Li1*, Fan Ma1, Xiaoting Zhang2, Yi Yang1†
The paper introduces Multi-Instance Generation Controller (MIGC), a novel approach to text-to-image synthesis that enables precise control over multiple instances in an image. MIGC addresses the challenges of Multi-Instance Generation (MIG) by decomposing the task into subtasks, each focusing on shading a single instance, and then combining the results. The key contributions of MIGC are:

1. **Dividing MIG into Subtasks**: MIGC splits the MIG task into multiple instance-shading subtasks, each handled by the Cross-Attention layers of Stable Diffusion, ensuring efficient and harmonious shading.
2. **Enhancing Shading Results**: An Enhancement Attention layer refines the shading produced by Cross-Attention, addressing issues such as instance merging and missing instances.
3. **Combining Shading Results**: A Layout Attention layer and a Shading Aggregation Controller combine the shaded instances, ensuring global alignment and precise control over position, attributes, and quantity.

The paper evaluates MIGC on the COCO-MIG benchmark, which requires strong control over position, attribute, and quantity, as well as on the COCO and DrawBench benchmarks. Experimental results show that MIGC significantly improves success rate and accuracy over existing methods while maintaining near-native inference speed. Ablation studies validate the effectiveness of each component of MIGC.
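To make the divide-and-conquer idea concrete, here is a minimal PyTorch sketch of per-instance "shading" via masked cross-attention followed by a simple aggregation step. It is an illustration of the general scheme only: the function names (`shade_instance`, `aggregate_shadings`) and the uniform per-pixel weighting are assumptions for this sketch, not the paper's actual Enhancement Attention, Layout Attention, or Shading Aggregation Controller.

```python
# Sketch of MIGC's divide-and-conquer scheme: each instance is shaded
# independently with cross-attention restricted to its bounding-box mask,
# and the per-instance results are then merged with the background.
# All names here are illustrative, not the paper's implementation.
import torch


def shade_instance(img_feats, text_emb, box_mask, d_head=64):
    """Cross-attention shading of one instance, restricted to its box mask.

    img_feats: (N, C) flattened image features (N = H*W spatial positions)
    text_emb:  (T, C) token embeddings of the instance's description
    box_mask:  (N,)   binary mask, 1 inside the instance's bounding box
    """
    scale = d_head ** -0.5
    attn = torch.softmax(img_feats @ text_emb.t() * scale, dim=-1)  # (N, T)
    shading = attn @ text_emb                                       # (N, C)
    # Zero out positions outside the box so each subtask only affects
    # its own region.
    return shading * box_mask.unsqueeze(-1)


def aggregate_shadings(shadings, box_masks, background):
    """Toy stand-in for the aggregation step: per-pixel averaging of
    instance shadings, falling back to the background elsewhere."""
    stacked = torch.stack(shadings)              # (K, N, C)
    masks = torch.stack(box_masks)               # (K, N)
    weights = masks / masks.sum(0).clamp(min=1)  # normalize overlapping boxes
    merged = (weights.unsqueeze(-1) * stacked).sum(0)
    covered = (masks.sum(0) > 0).float().unsqueeze(-1)
    return covered * merged + (1 - covered) * background


if __name__ == "__main__":
    # Random tensors stand in for real latent features and text embeddings.
    H = W = 8
    C = 64
    img_feats = torch.randn(H * W, C)
    text_a, text_b = torch.randn(5, C), torch.randn(5, C)
    mask_a = torch.zeros(H, W); mask_a[:4, :4] = 1
    mask_b = torch.zeros(H, W); mask_b[4:, 4:] = 1
    mask_a, mask_b = mask_a.flatten(), mask_b.flatten()

    s_a = shade_instance(img_feats, text_a, mask_a)
    s_b = shade_instance(img_feats, text_b, mask_b)
    out = aggregate_shadings([s_a, s_b], [mask_a, mask_b], background=img_feats)
    print(out.shape)  # torch.Size([64, 64])
```

In the actual method, the aggregation weights are predicted by the Shading Aggregation Controller rather than fixed as above, and a Layout Attention layer provides the global "shading template" that keeps instances consistent with the overall scene.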