Semantic Image Synthesis with Spatially-Adaptive Normalization


5 Nov 2019 | Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu
This paper introduces Spatially-Adaptive Normalization (SPADE), a method for semantic image synthesis that lets users control both the semantic content and the style of generated images. The key observation is that conventional normalization layers tend to wash away the semantic information carried by the input layout; SPADE counteracts this by modulating the normalized activations with a spatially-adaptive, learned transformation computed from the layout, so that semantic information propagates effectively through the network. Trained on several challenging datasets, including COCO-Stuff, ADE20K, and Cityscapes, the model outperforms state-of-the-art approaches in visual fidelity and alignment with the input layout, as measured by segmentation accuracy and Fréchet Inception Distance (FID). It also supports multi-modal and style-guided image synthesis, enabling controllable and diverse outputs.

Concretely, SPADE is a conditional normalization layer: activations are first normalized with a parameter-free normalization and then modulated with scale and bias parameters that are predicted, per spatial location, from the input segmentation map.
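To make the mechanism concrete, the following is a minimal PyTorch sketch of such a spatially-adaptive normalization layer. The hidden width, kernel sizes, and use of plain batch normalization are illustrative assumptions rather than the paper's exact configuration; the authors' implementation is in the repository linked below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """Sketch of a spatially-adaptive normalization layer (assumed sizes)."""

    def __init__(self, num_features, label_channels, hidden=128):
        super().__init__()
        # Parameter-free normalization: batch norm without learned affine terms.
        self.norm = nn.BatchNorm2d(num_features, affine=False)
        # Shared embedding of the segmentation map.
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Convolutions that predict spatially-varying modulation parameters.
        self.gamma = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(hidden, num_features, kernel_size=3, padding=1)

    def forward(self, x, segmap):
        # Resize the (one-hot) segmentation map to the activation resolution.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode='nearest')
        normalized = self.norm(x)
        h = self.shared(segmap)
        # Scale and shift each spatial location individually, so the layout
        # information survives normalization instead of being washed out.
        return normalized * (1 + self.gamma(h)) + self.beta(h)
```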
The generator stacks several ResNet blocks with upsampling layers, and the modulation parameters of all its normalization layers are learned through SPADE. Beyond synthesizing diverse scenes with high image fidelity, the model supports semantic manipulation and guided synthesis: the global appearance of the output image can be controlled with an external style image. An extensive ablation study demonstrates the effectiveness of the proposed normalization layer against several variants, showing that the SPADE generator works well across different configurations: the input fed to the generator, the convolutional kernel size acting on the segmentation map, the capacity of the network, and the choice of parameter-free normalization. Overall, the method offers a more effective way to synthesize photorealistic images from semantic layouts while controlling both their semantic and style aspects. The model is implemented in Python and is available at https://github.com/NVlabs/SPADE.
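As a rough illustration of how such a layer slots into the generator described above, here is a minimal sketch of a residual block in which every normalization is a SPADE layer, followed by a usage snippet that alternates blocks with upsampling. The channel counts, leaky-ReLU activations, and skip-connection handling are assumptions for illustration, not the paper's exact architecture.

```python
class SPADEResBlock(nn.Module):
    """Sketch of a generator ResNet block whose normalizations are SPADE."""

    def __init__(self, in_ch, out_ch, label_channels):
        super().__init__()
        mid = min(in_ch, out_ch)
        self.norm1 = SPADE(in_ch, label_channels)
        self.conv1 = nn.Conv2d(in_ch, mid, kernel_size=3, padding=1)
        self.norm2 = SPADE(mid, label_channels)
        self.conv2 = nn.Conv2d(mid, out_ch, kernel_size=3, padding=1)
        # Learned 1x1 skip projection when the channel count changes.
        self.skip = (nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
                     if in_ch != out_ch else nn.Identity())

    def forward(self, x, segmap):
        h = self.conv1(F.leaky_relu(self.norm1(x, segmap), 0.2))
        h = self.conv2(F.leaky_relu(self.norm2(h, segmap), 0.2))
        return h + self.skip(x)

# Usage sketch: grow a low-resolution feature map toward image resolution
# by alternating SPADE ResNet blocks with nearest-neighbor upsampling.
block = SPADEResBlock(1024, 512, label_channels=35)  # e.g. 35 label classes
x = torch.randn(1, 1024, 8, 8)                       # low-resolution features
seg = torch.randn(1, 35, 256, 256)                   # stand-in for a one-hot map
x = F.interpolate(block(x, seg), scale_factor=2.0)   # -> (1, 512, 16, 16)
```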