SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation


25 Mar 2024 | Aysim Toker, Marvin Eisenberger, Daniel Cremers, Laura Leal-Taixé
This paper addresses semantic segmentation of earth observation data, particularly satellite imagery, which is characterized by high visual similarity and large scale variations between object categories. The authors propose leveraging generative diffusion models to enlarge the training set for semantic segmentation: by approximating the joint distribution of images and semantic labels, the model generates novel training samples, which are then integrated with the original dataset to improve segmentation performance.

**Contributions:**
1. **Joint Data Distribution Approximation:** The authors learn the joint distribution \( p(\mathbf{x}, \mathbf{y}) \) of images \(\mathbf{x}\) and labels \(\mathbf{y}\) using a diffusion model \(\mathcal{G}\).
2. **Data Augmentation:** The generated image-mask pairs are used to augment the training set, improving downstream semantic segmentation.
3. **Quantitative Improvements:** The approach significantly improves segmentation accuracy on three satellite benchmarks compared to baselines trained only on the original data.

**Related Work:**
- **Diffusion Models:** These models are effective for image generation, producing samples of quality comparable to GANs while being less prone to mode collapse.
- **Semantic Segmentation:** Traditional methods require extensive manual annotation, which makes data augmentation crucial.
- **Synthesizing Training Data:** Previous work has explored generative models for synthesizing labeled data, but such approaches are often limited by the complexity and cost of training the generative model.

**Method:**
- **Problem Statement:** The task is semantic segmentation of earth observation data, given a dataset \(\mathcal{D}\) of satellite images and corresponding semantic maps.
- **Discrete Labels in Bit-Space:** Discrete labels are encoded as binary bits, which improves training stability and reduces dimensionality compared to one-hot encodings.
- **Synthesizing Satellite Segmentation Data:** The diffusion model \(\mathcal{G}\) generates novel training samples by jointly generating images and labels in bit space.
- **Image Super-Resolution:** A conditional variant of the diffusion model upsamples low-resolution samples to higher resolutions, recovering fine detail in satellite imagery.

**Experiments:**
- **Datasets:** The approach is evaluated on three earth observation benchmarks: iSAID, LoveDA, and OpenEarthMap.
- **Visual Sample Quality:** The generated images compare favorably to those from alternative generative approaches.
- **Generative Segmentation:** Training on the augmented set significantly improves segmentation accuracy over training on the original dataset alone.
- **Ablation Study:** The impact of specific design choices, such as the binary label encoding and the prediction type, is analyzed.

**Conclusion:** The proposed method leverages generative diffusion models to enlarge the training set for semantic segmentation, demonstrating significant improvements in both visual quality and segmentation accuracy, with broad implications for data synthesis and augmentation in earth observation.
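The bit-space label encoding described in the Method section can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the mapping of bits to \(\{-1, 1\}\) (so labels share the scale of normalized images, as in the common "analog bits" formulation) are assumptions.

```python
import numpy as np

def labels_to_bits(y, num_classes):
    """Encode integer class labels as 'analog bits' in {-1, 1}.

    y: integer label map of shape (H, W).
    Returns a float array of shape (H, W, B), B = ceil(log2(num_classes)),
    which is far lower-dimensional than a one-hot encoding.
    """
    num_bits = max(1, int(np.ceil(np.log2(num_classes))))
    # Extract each bit of the class index, least-significant bit first.
    bits = (y[..., None] >> np.arange(num_bits)) & 1
    # Map {0, 1} -> {-1, 1} so labels live on the same scale as images.
    return bits.astype(np.float32) * 2.0 - 1.0

def bits_to_labels(b):
    """Invert labels_to_bits by thresholding each bit channel at zero."""
    hard = (b > 0).astype(np.int64)
    return (hard << np.arange(hard.shape[-1])).sum(axis=-1)
```

Because decoding only thresholds at zero, the diffusion model can output continuous values and still yield a valid discrete label map.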
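To model the joint distribution \(p(\mathbf{x}, \mathbf{y})\), a single diffusion process can operate on the channel-wise concatenation of an image and its bit-encoded mask. Below is a hedged numpy sketch of the standard closed-form forward (noising) step applied to such a pair; the function name and the scalar \(\bar{\alpha}_t\) argument are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def joint_forward_noising(x, y_bits, alpha_bar_t, rng):
    """Apply the closed-form DDPM forward step to a joint image-label sample.

    x:           image array, shape (H, W, 3), values roughly in [-1, 1]
    y_bits:      bit-encoded label map, shape (H, W, B), values in {-1, 1}
    alpha_bar_t: cumulative noise-schedule coefficient for timestep t
    """
    # Treat image and label bits as one tensor so a single model
    # learns the joint distribution p(x, y).
    z0 = np.concatenate([x, y_bits], axis=-1)  # (H, W, 3 + B)
    noise = rng.standard_normal(z0.shape)
    # z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps
    zt = np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * noise
    return zt, noise
```

At sampling time, reversing this process on the concatenated tensor yields an image and its segmentation mask in a single pass.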
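Once image-mask pairs are generated, augmentation amounts to drawing each training batch from the union of real and synthetic data. A minimal sketch follows; the 50/50 default mixing ratio and the function name are assumptions, since the summary only states that generated samples are integrated with the original dataset.

```python
import numpy as np

def sample_mixed_batch(n_real, n_synth, batch_size, synth_frac=0.5, seed=0):
    """Return index arrays for a batch mixing real and synthetic samples.

    synth_frac controls what fraction of the batch is drawn from the
    generated image-mask pairs; the remainder comes from the original set.
    """
    rng = np.random.default_rng(seed)
    n_from_synth = int(round(batch_size * synth_frac))
    # Sample with replacement from each pool independently.
    synth_idx = rng.integers(0, n_synth, size=n_from_synth)
    real_idx = rng.integers(0, n_real, size=batch_size - n_from_synth)
    return real_idx, synth_idx
```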