Toward Multimodal Image-to-Image Translation | 24 Oct 2018 | Jun-Yan Zhu, Richard Zhang, Deepak Pathak, Trevor Darrell, Alexei A. Efros, Oliver Wang, Eli Shechtman
This paper proposes a method for multimodal image-to-image translation: modeling the distribution of plausible outputs in a conditional generative setting, rather than a single deterministic mapping. The method introduces a low-dimensional latent vector that captures the ambiguity in the input-to-output mapping; at test time the vector is sampled randomly, and a generator learns to map the input together with the latent code to an output. To prevent mode collapse and produce diverse results, the method encourages a bijective relationship between latent codes and outputs.

The proposed model, BicycleGAN, enforces this connection in both directions by combining two objectives: a cVAE-GAN cycle, which encodes the ground-truth output into the latent space and asks the generator to reconstruct the output from that code, and a cLR-GAN cycle, which samples a latent code, generates an output, and asks the encoder to recover the code (a sketch of both cycles follows below).

The authors explore several variants of this idea, varying the training objective, the network architecture, and the way the latent code is injected into the generator (concatenated to the input versus added to every intermediate layer), and compare the variants on both perceptual realism and diversity. Evaluated across several image-to-image translation tasks, BicycleGAN outperforms the baseline variants on both axes, producing results that are diverse as well as visually appealing. The paper also covers implementation details (architecture, training procedure, latent-code injection) and gives a comprehensive review of related work in generative modeling and conditional image generation.
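To make the two cycles concrete, here is a minimal PyTorch sketch of the combined objective, not the authors' released code: `bicycle_losses`, `gan_loss`, and the toy stand-in networks are illustrative names and shapes of my own, the loss weights are roughly the paper's defaults, and a least-squares GAN loss is used as in the pix2pix family the paper builds on.

```python
# Hedged sketch of the two BicycleGAN cycles (cVAE-GAN + cLR-GAN) in PyTorch.
# G, E, D, and bicycle_losses are illustrative placeholders, not the paper's API.
import torch
import torch.nn.functional as F

def gan_loss(logits, real: bool):
    # Least-squares GAN objective, as used in the pix2pix family of models.
    target = torch.ones_like(logits) if real else torch.zeros_like(logits)
    return F.mse_loss(logits, target)

def bicycle_losses(G, E, D, A, B, z_dim=8, lam_img=10.0, lam_z=0.5, lam_kl=0.01):
    # --- cVAE-GAN cycle: B -> z -> B_hat (the output must be explainable by a code).
    mu, logvar = E(B)
    z_enc = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
    B_vae = G(A, z_enc)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    loss_cvae = (gan_loss(D(B_vae), real=True)        # output should look real
                 + lam_img * F.l1_loss(B_vae, B)      # and reconstruct the ground truth
                 + lam_kl * kl)                       # while z stays close to N(0, I)

    # --- cLR-GAN cycle: z -> B_hat -> z_hat (the code must be recoverable from the output).
    z_rand = torch.randn(A.size(0), z_dim, device=A.device)
    B_lr = G(A, z_rand)
    mu_hat, _ = E(B_lr)
    loss_clr = gan_loss(D(B_lr), real=True) + lam_z * F.l1_loss(mu_hat, z_rand)
    return loss_cvae + loss_clr

# Toy stand-ins with the right call signatures, just to show the sketch executes;
# the real networks are convolutional (U-Net generator, ResNet encoder, PatchGAN).
z_dim = 8
G = lambda A, z: torch.tanh(A + z.view(-1, z_dim, 1, 1).sum(1, keepdim=True))
E = lambda img: (img.flatten(1)[:, :z_dim], img.flatten(1)[:, z_dim:2 * z_dim])
D = lambda img: img.mean(dim=(2, 3))
A, B = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
print(bicycle_losses(G, E, D, A, B))
```

The two cycles are complementary: cVAE-GAN alone can leave regions of latent space unused, while cLR-GAN alone does not force outputs to match ground truth, so combining them is what ties codes and outputs together in both directions.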
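On injecting the latent code, the simplest of the options the paper compares is to tile z spatially and concatenate it to the generator's input as extra channels. A minimal sketch of that option (the helper name `inject_z` is mine):

```python
# Illustrative helper (the name inject_z is not from the paper) for the
# "concatenate z to the input" injection strategy.
import torch

def inject_z(x: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    b, _, h, w = x.shape
    z_map = z.view(b, -1, 1, 1).expand(-1, z.size(1), h, w)  # tile z over space
    return torch.cat([x, z_map], dim=1)                      # (B, C + z_dim, H, W)

x = torch.randn(4, 3, 128, 128)  # input image batch
z = torch.randn(4, 8)            # 8-dimensional code, as in the paper's experiments
print(inject_z(x, z).shape)      # torch.Size([4, 11, 128, 128])
```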
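For evaluation, the paper scores diversity as the average perceptual (LPIPS) distance between pairs of outputs sampled for the same input, and judges realism with a human real-vs-fake study. A sketch of the diversity side, assuming the `lpips` pip package is installed; `diversity_score` is an illustrative name, and `G` is any generator with the (input, code) -> image signature used above:

```python
# Sketch of the diversity metric: mean pairwise LPIPS distance over samples
# drawn for a single input. Assumes the `lpips` package; names are illustrative.
import torch
import lpips

perc = lpips.LPIPS(net='alex')  # learned perceptual distance; expects images in [-1, 1]

def diversity_score(G, A, z_dim=8, n_samples=10):
    with torch.no_grad():
        outs = [G(A, torch.randn(A.size(0), z_dim)) for _ in range(n_samples)]
        dists = [perc(outs[i], outs[j]).mean()
                 for i in range(n_samples) for j in range(i + 1, n_samples)]
    return torch.stack(dists).mean()  # higher = more diverse outputs
```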