14 Aug 2018 | Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz
The paper introduces a framework for multimodal unsupervised image-to-image translation, addressing the challenge of generating diverse outputs from a given source-domain image without paired supervision. The proposed framework, called Multimodal Unsupervised Image-to-Image Translation (MUNIT), assumes that an image representation can be decomposed into a content code that is domain-invariant and a style code that captures domain-specific properties. To translate an image to another domain, MUNIT recombines its content code with a random style code sampled from the target domain's style space. The authors analyze the theoretical properties of the framework, establishing that it matches latent and joint distributions and enforces a weak form of cycle consistency. Extensive experiments demonstrate the effectiveness of MUNIT in generating diverse, high-quality outputs, outperforming existing unsupervised methods and achieving results comparable to state-of-the-art supervised approaches. The framework also allows example-guided image translation, in which the style of the translation output is controlled by a user-provided example image.
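The core translation step described above can be sketched as follows. This is a minimal toy illustration, not the actual MUNIT networks: the real encoders and decoders are convolutional networks trained with GAN and reconstruction losses, whereas here "images" are plain vectors and the content/style split is an assumed fixed partition. The Gaussian style prior does follow the paper; the function names `encode`, `sample_style`, `decode`, and `translate` are hypothetical.

```python
import random

def encode(image):
    """Toy stand-in for MUNIT's encoders: split a representation into a
    domain-invariant content code and a domain-specific style code.
    (Assumption: the split point is simply the vector's midpoint.)"""
    mid = len(image) // 2
    return image[:mid], image[mid:]

def sample_style(dim):
    """Sample a random style code from the target domain's style space,
    modeled as a standard Gaussian prior as in MUNIT."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]

def decode(content, style):
    """Toy stand-in for the target-domain decoder: recombine a content
    code with a style code into a target-domain output."""
    return content + style

def translate(image, style_dim=2):
    content, _ = encode(image)       # keep content, discard source style
    style = sample_style(style_dim)  # draw a fresh style from the target domain
    return decode(content, style)

# Repeated calls on the same input yield different outputs, since each
# translation uses a newly sampled style code: this is the multimodal part.
out1 = translate([1.0, 2.0, 0.5, 0.5])
out2 = translate([1.0, 2.0, 0.5, 0.5])
```

For example-guided translation, `sample_style` would be replaced by encoding the style code of a user-provided reference image from the target domain.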