A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting

16 Jul 2024 | Wouter Van Gansbeke* and Bert De Brabandere
This paper introduces LDMSeg, a simple and powerful latent diffusion approach for panoptic segmentation and mask inpainting. Building upon Stable Diffusion, LDMSeg aims to simplify the complex and specialized modules typically required in panoptic segmentation networks. The approach consists of two main stages: (1) training a shallow autoencoder to project segmentation masks into latent space, and (2) training a diffusion model to allow image-conditioned sampling in latent space. This generative framework bypasses the need for specialized architectures, complex loss functions, and object detection modules, making it more computationally efficient and easier to use. The method is validated on the COCO and ADE20k datasets, demonstrating strong segmentation results and the ability to handle mask inpainting tasks. Additionally, LDMSeg can be extended to multi-task learning by introducing learnable task embeddings, making it versatile for various dense prediction tasks. The code and models are available for public use.
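
The two stages described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the strided one-hot "encoder", the linear noise schedule, the tensor shapes, and the channel-wise concatenation of image and mask latents are all assumptions chosen for illustration, and every function name is hypothetical. It only shows the data flow: a segmentation mask is mapped to a low-resolution latent, that latent is noised with the standard DDPM forward process, and the denoiser input is formed by conditioning on image latents.

```python
import numpy as np


def encode_mask(mask, num_ids, factor=4):
    """Hypothetical stand-in for the shallow autoencoder's encoder.

    One-hot encodes an integer mask of shape (H, W) and subsamples it by
    `factor`, returning a "latent" of shape (num_ids, H/factor, W/factor)
    with values in {-1, 1}. A trained encoder would replace this.
    """
    one_hot = np.eye(num_ids, dtype=np.float32)[mask]   # (H, W, num_ids)
    one_hot = one_hot.transpose(2, 0, 1)                # (num_ids, H, W)
    return one_hot[:, ::factor, ::factor] * 2.0 - 1.0


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for a linear beta schedule (an assumption)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(z0, t, alpha_bar, rng):
    """DDPM forward process: z_t = sqrt(a_t) * z0 + sqrt(1 - a_t) * eps."""
    a_t = float(alpha_bar[t])
    eps = rng.standard_normal(z0.shape).astype(np.float32)
    z_t = np.sqrt(a_t) * z0 + np.sqrt(1.0 - a_t) * eps
    return z_t, eps


def denoiser_input(image_latent, z_t):
    """Condition on the image by stacking latents along the channel axis
    (one possible conditioning scheme, assumed here)."""
    return np.concatenate([image_latent, z_t], axis=0)


rng = np.random.default_rng(0)
mask = rng.integers(0, 3, size=(32, 32))                  # toy mask with 3 segment ids
z0 = encode_mask(mask, num_ids=3)                         # stage 1: mask -> latent
image_latent = rng.standard_normal((4, 8, 8)).astype(np.float32)  # placeholder image latents
alpha_bar = linear_alpha_bar(1000)
z_t, eps = add_noise(z0, t=500, alpha_bar=alpha_bar, rng=rng)     # stage 2: noised latent
x = denoiser_input(image_latent, z_t)
print(x.shape)  # (7, 8, 8): 4 image channels + 3 mask-latent channels
```

In a trained system, a denoising network would take `x` and the timestep and be optimized to recover the noise `eps` (or the clean latent); at inference time, sampling would start from pure noise and the decoder of the autoencoder would map the final latent back to a mask.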