5 Feb 2024 | Xudong Wang, Trevor Darrell, Sai Saketh Rambhatla, Rohit Girdhar, Ishan Misra
InstanceDiffusion is a novel approach that enhances text-to-image diffusion models by adding precise instance-level control. It supports flexible location specifications, including points, scribbles, bounding boxes, and instance masks, allowing for detailed control over the positions and attributes of instances in generated images. The model introduces three key components: UniFusion, which fuses instance-level conditions with the backbone features; ScaleU, which improves image fidelity by re-calibrating feature maps; and Multi-instance Sampler, which reduces information leakage across multiple instances. InstanceDiffusion significantly outperforms state-of-the-art models in various evaluation metrics, demonstrating superior performance in aligning with instance locations and attributes. The method also enables iterative image generation, allowing users to selectively insert objects into precise locations while preserving the integrity of previously generated objects.
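To make the idea of per-instance conditions concrete, the sketch below shows one way a caller might represent an instance as a caption plus a single location cue (point, box, scribble, or mask). It is an illustration only and not the authors' code or data format: the names `InstanceCondition`, `location_kind`, and `mask_to_box`, the normalized-coordinate convention, and the "exactly one cue per instance" rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed conventions (hypothetical, not taken from the paper):
# coordinates are normalized to [0, 1] with the origin at the top-left.
Point = Tuple[float, float]                 # (x, y)
Box = Tuple[float, float, float, float]     # (x_min, y_min, x_max, y_max)


@dataclass
class InstanceCondition:
    """One instance: a text description plus exactly one location cue."""
    caption: str
    point: Optional[Point] = None
    box: Optional[Box] = None
    scribble: Optional[List[Point]] = None   # ordered points along a stroke
    mask: Optional[List[List[int]]] = None   # binary grid, rows of 0/1

    def location_kind(self) -> str:
        """Return which cue was supplied, rejecting zero or multiple cues."""
        given = [name for name in ("point", "box", "scribble", "mask")
                 if getattr(self, name) is not None]
        if len(given) != 1:
            raise ValueError(f"expected exactly one location cue, got {given}")
        return given[0]


def mask_to_box(mask: List[List[int]]) -> Box:
    """Tightest normalized box around the nonzero cells of a binary mask."""
    height, width = len(mask), len(mask[0])
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask is empty")
    return (cols[0] / width, rows[0] / height,
            (cols[-1] + 1) / width, (rows[-1] + 1) / height)


# Example scene: two instances, each with its own caption and location cue.
instances = [
    InstanceCondition("a red apple", box=(0.1, 0.2, 0.4, 0.6)),
    InstanceCondition("a blue cup", point=(0.7, 0.5)),
]
kinds = [inst.location_kind() for inst in instances]
```

Keeping one caption per instance, rather than a single global prompt, mirrors the instance-level control described above; how such conditions are actually encoded and fused into the backbone is the role of UniFusion and is not modeled here.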