2 Jul 2024 | Dewei Zhou, You Li, Fan Ma, Zongxin Yang, and Yi Yang
The paper introduces the Multi-Instance Generation (MIG) task: generating multiple instances within a single image, each placed at a predefined position and rendered with specific attributes. The main challenges in MIG are attribute leakage between instances, limited forms of instance description, and inconsistency across iterative generation. To address these issues, the authors propose the Multi-Instance Generation Controller (MIGC), which uses a divide-and-conquer strategy, decomposing the multi-instance task into single-instance subtasks with singular attributes. They then extend MIGC into MIGC++, which accepts more flexible instance descriptions (text and images) and supports both boxes and masks for positioning. They also introduce the Consistent-MIG algorithm, which keeps unmodified regions consistent during iterative generation. The methods are evaluated on the COCO-MIG and Multimodal-MIG benchmarks, where they show significant improvements over existing techniques in instance success ratio, mean intersection over union (mIoU), and average precision (AP). The paper also provides a detailed methodology, implementation details, and experimental results to support these claims.
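To make the position-related metrics concrete, the following is a minimal sketch of how a box-level success ratio and mean IoU could be computed for one generated image. It is an illustration only, not the paper's evaluation protocol: the function names, the 0.5 IoU threshold, and the convention that a missing detection contributes an IoU of 0 are all assumptions made here, and attribute checks are left out entirely.

```python
from typing import List, Optional, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def layout_metrics(
    targets: List[Box],
    detections: List[Optional[Box]],
    threshold: float = 0.5,  # assumed threshold, not taken from the paper
) -> Tuple[float, float]:
    """Return (success_ratio, mean_iou) over paired target/detected boxes.

    detections[i] is the box found for target i, or None if nothing was
    detected; a missing detection is scored as IoU 0 (an assumption).
    """
    if not targets:
        return 0.0, 0.0
    ious = [
        box_iou(t, d) if d is not None else 0.0
        for t, d in zip(targets, detections)
    ]
    success_ratio = sum(iou >= threshold for iou in ious) / len(ious)
    mean_iou = sum(ious) / len(ious)
    return success_ratio, mean_iou


# Two requested instances: one detected with partial overlap, one missing.
targets = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0, 0, 10, 5), None]
success_ratio, mean_iou = layout_metrics(targets, detections)
```

In this example the first detection covers half of its target box (IoU 0.5) and the second instance is missing, so half of the instances pass the assumed threshold and the mean IoU is pulled down by the missing one.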