20 Aug 2018 | Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro
This paper presents a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). The approach improves on previous work by introducing a novel adversarial loss, new multi-scale generator and discriminator architectures, and additional features for interactive visual manipulation. It generates high-resolution images (2048 × 1024) with more natural textures and details than prior methods, and it lets users edit the appearance of individual objects in a scene, such as changing the color of a car or the texture of a road. Evaluated against state-of-the-art visual synthesis systems, the method shows significant improvements in both quantitative evaluations and human perception studies.

By incorporating instance-level object segmentation information, the method also supports interactive semantic manipulation, such as removing or adding objects and changing object categories. It can further generate diverse results from the same input, enabling interactive object editing. Experiments on several datasets, including Cityscapes, NYU Indoor RGBD, ADE20K, and Helen Face, demonstrate its effectiveness at generating realistic images and supporting interactive editing, and the results show that it outperforms existing methods in image quality, resolution, and realism. Given appropriate training input-output pairs, the approach produces diverse outputs and enables interactive image manipulation, making it applicable to domains where high-resolution results are in demand but pre-trained networks are not available.
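To make the multi-scale discriminator idea concrete, here is a minimal PyTorch sketch (my own illustration, not the authors' released code): the same PatchGAN-style discriminator is instantiated several times, and each copy judges the input at a coarser scale, so coarse scales critique global structure while fine scales critique texture. The class names, channel widths, and number of scales are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """A small PatchGAN-style discriminator that outputs patch-wise real/fake logits."""
    def __init__(self, in_channels):
        super().__init__()
        layers = []
        channels = [in_channels, 64, 128, 256, 512]  # illustrative widths
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
        layers += [nn.Conv2d(channels[-1], 1, 4, stride=1, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class MultiScaleDiscriminator(nn.Module):
    """Runs identical discriminators on progressively downsampled inputs."""
    def __init__(self, in_channels, num_scales=3):
        super().__init__()
        self.discriminators = nn.ModuleList(
            [PatchDiscriminator(in_channels) for _ in range(num_scales)])
        self.downsample = nn.AvgPool2d(3, stride=2, padding=1, count_include_pad=False)

    def forward(self, x):
        outputs = []
        for disc in self.discriminators:
            outputs.append(disc(x))   # logits at the current scale
            x = self.downsample(x)    # next discriminator sees a coarser view
        return outputs
```

In a conditional setup, `x` would typically be the label map concatenated with the real or generated image along the channel dimension, and the per-scale outputs would each contribute to the adversarial loss.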
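The instance-level segmentation information mentioned above can be exposed to the generator as a boundary map, since a plain semantic label map cannot distinguish adjacent objects of the same class. Below is a minimal NumPy sketch of that idea, assuming the instance map is an integer array of per-pixel instance IDs; the function name and the 4-neighbour comparison are my own illustration, not the authors' code.

```python
import numpy as np

def instance_boundary_map(instance_map: np.ndarray) -> np.ndarray:
    """Return a binary map that is 1 where a pixel's instance ID differs
    from any of its four neighbours (left/right/up/down)."""
    boundary = np.zeros_like(instance_map, dtype=np.uint8)
    boundary[:, 1:]  |= (instance_map[:, 1:]  != instance_map[:, :-1])
    boundary[:, :-1] |= (instance_map[:, :-1] != instance_map[:, 1:])
    boundary[1:, :]  |= (instance_map[1:, :]  != instance_map[:-1, :])
    boundary[:-1, :] |= (instance_map[:-1, :] != instance_map[1:, :])
    return boundary

# Example: two instances touching in a 2x2 map.
ids = np.array([[1, 1],
                [1, 2]])
print(instance_boundary_map(ids))  # -> [[0 1]
                                   #     [1 1]]
```

Such a boundary map can be concatenated with the semantic label map as an extra input channel, giving the network explicit cues about where one object ends and the next begins.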