Diverse Image-to-Image Translation via Disentangled Representations


2 Aug 2018 | Hsin-Ying Lee*, Hung-Yu Tseng*, Jia-Bin Huang, Maneesh Singh, Ming-Hsuan Yang
This paper proposes a disentangled representation framework for image-to-image translation without paired training data. The framework learns to generate diverse outputs by embedding images into two spaces: a domain-invariant content space and a domain-specific attribute space. The content space captures information shared across domains, while the attribute space models the variations within each domain. The model uses a content adversarial loss to ensure that content features do not carry domain-specific cues, and a latent regression loss to enforce an invertible mapping between latent attribute vectors and outputs. To handle unpaired data, a cross-cycle consistency loss is introduced, which enforces consistency between the original and reconstructed images through two stages of cyclic translation. At test time, the model generates diverse outputs by sampling random attribute vectors or by transferring attribute vectors from existing images, enabling both inter-domain and intra-domain attribute transfer. Qualitative and quantitative evaluations show that the model produces realistic and diverse translations, and it achieves competitive results against state-of-the-art methods on domain adaptation tasks such as MNIST-M and Cropped LineMod. The code and results are available at https://github.com/HsinYingLee/DRIT/.
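To make the cross-cycle consistency idea more concrete, the sketch below illustrates it in PyTorch: a shared content encoder and per-domain attribute encoders disentangle each image, attribute vectors are swapped across domains for a first translation, then swapped back in a second translation so the original inputs should be recovered. This is a minimal illustration under assumed module names, layer sizes, and a single shared content encoder; it is not the authors' released DRIT implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of the cross-cycle consistency idea; all architectural
# details here are illustrative assumptions, not the DRIT release.

class ContentEncoder(nn.Module):
    """Maps an image to a domain-invariant content feature map."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class AttributeEncoder(nn.Module):
    """Maps an image to a low-dimensional, domain-specific attribute vector."""
    def __init__(self, attr_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, attr_dim),
        )

    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    """Decodes a content map plus an attribute vector back into an image."""
    def __init__(self, attr_dim=8):
        super().__init__()
        self.fc = nn.Linear(attr_dim, 128)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, content, attr):
        # Inject the attribute vector by broadcasting it over the content map.
        a = self.fc(attr).unsqueeze(-1).unsqueeze(-1)
        return self.net(content + a)

def cross_cycle_loss(x_a, x_b, enc_c, enc_attr_a, enc_attr_b, gen_a, gen_b):
    """Swap attributes A<->B, translate, then swap back and reconstruct the inputs."""
    c_a, c_b = enc_c(x_a), enc_c(x_b)            # shared content space
    z_a, z_b = enc_attr_a(x_a), enc_attr_b(x_b)  # domain-specific attributes
    # First translation: exchange attribute vectors across domains.
    u = gen_b(c_a, z_b)  # content of x_a rendered in domain B
    v = gen_a(c_b, z_a)  # content of x_b rendered in domain A
    # Second translation: exchange again, which should recover the originals.
    x_a_rec = gen_a(enc_c(u), enc_attr_a(v))
    x_b_rec = gen_b(enc_c(v), enc_attr_b(u))
    return (x_a_rec - x_a).abs().mean() + (x_b_rec - x_b).abs().mean()

if __name__ == "__main__":
    enc_c = ContentEncoder()
    enc_a, enc_b = AttributeEncoder(), AttributeEncoder()
    gen_a, gen_b = Generator(), Generator()
    x_a, x_b = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
    print(cross_cycle_loss(x_a, x_b, enc_c, enc_a, enc_b, gen_a, gen_b).item())
```

In a full training setup this reconstruction term would be combined with the content adversarial loss, per-domain image adversarial losses, and the latent regression loss mentioned in the summary; the sketch only shows how the two-stage attribute swap provides a cycle-style supervision signal without paired data.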