Toward Multimodal Image-to-Image Translation

24 Oct 2018 | Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman
The paper "Toward Multimodal Image-to-Image Translation" by Jun-Yan Zhu et al. addresses the challenge of generating diverse and realistic outputs in image-to-image translation tasks, where a single input can correspond to multiple possible outputs. The authors propose a method that models a distribution of potential outputs using a low-dimensional latent vector, which is randomly sampled at test time. This approach aims to prevent mode collapse, a common issue in conditional generative models, by encouraging a bijective mapping between the latent space and the output space. The method is evaluated on various datasets, including edges to photos, Google Maps to satellite images, labels to images, and outdoor night to day images. The authors compare their method with several variants, including pix2pix+noise, cVAE-GAN, cLR-GAN, and their hybrid model BicycleGAN, using both qualitative and quantitative metrics such as perceptual realism and diversity. The results show that BicycleGAN produces more diverse and realistic outputs compared to other methods, demonstrating the effectiveness of their proposed approach.The paper "Toward Multimodal Image-to-Image Translation" by Jun-Yan Zhu et al. addresses the challenge of generating diverse and realistic outputs in image-to-image translation tasks, where a single input can correspond to multiple possible outputs. The authors propose a method that models a distribution of potential outputs using a low-dimensional latent vector, which is randomly sampled at test time. This approach aims to prevent mode collapse, a common issue in conditional generative models, by encouraging a bijective mapping between the latent space and the output space. The method is evaluated on various datasets, including edges to photos, Google Maps to satellite images, labels to images, and outdoor night to day images. The authors compare their method with several variants, including pix2pix+noise, cVAE-GAN, cLR-GAN, and their hybrid model BicycleGAN, using both qualitative and quantitative metrics such as perceptual realism and diversity. The results show that BicycleGAN produces more diverse and realistic outputs compared to other methods, demonstrating the effectiveness of their proposed approach.