20 Aug 2020 | Taesung Park, Alexei A. Efros, Richard Zhang, Jun-Yan Zhu
Contrastive Learning for Unpaired Image-to-Image Translation proposes a method to enhance content preservation in unpaired image-to-image translation by maximizing mutual information between corresponding input and output patches using contrastive learning. The method uses a multilayer, patch-based approach, with negatives drawn from within the input image itself rather than from the rest of the dataset. This enables one-sided translation, with no auxiliary inverse generator or cycle-consistency loss, while improving quality and reducing training time. The framework uses the InfoNCE contrastive loss to learn an embedding that pulls corresponding patches together while pushing them apart from non-corresponding ones (see the sketch below).

The approach is effective in both paired and unpaired settings, applies across domains, and extends to single-image training. Compared against existing techniques on several datasets, including horse-to-zebra, cat-to-dog, and Cityscapes, it achieves superior image quality and semantic segmentation scores, while being more efficient in memory and training time than two-sided methods, which makes it practical to deploy. Ablations highlight the importance of applying the loss at multiple layers of the encoder and the benefit of internal negatives over negatives drawn from other images. The method is also tested on high-resolution single-image translation tasks, where it generates realistic results.

The paper concludes that contrastive learning is a powerful tool for unpaired image-to-image translation, enabling the generation of high-quality images while preserving content.
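To make the patch-wise InfoNCE objective concrete, here is a minimal PyTorch sketch. It assumes the per-layer encoder features have already been sampled and flattened into one vector per patch; the function name, input shapes, and the key-detaching choice are illustrative assumptions, and the authors' actual implementation differs in details (e.g., an MLP projection head and a fixed number of sampled patches per layer).

```python
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_q: torch.Tensor, feat_k: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Patch-wise InfoNCE loss (a sketch, not the authors' exact code).

    feat_q: (N, C) features of N patches from the translated image G(x).
    feat_k: (N, C) features of the corresponding patches from the input x.
    For query patch i, the patch at index i in feat_k is the positive;
    the other N-1 patches of the *same* input image act as internal negatives.
    """
    feat_q = F.normalize(feat_q, dim=1)
    # Detach the keys so gradients flow only through the query branch
    # (a common choice in contrastive setups; assumed here).
    feat_k = F.normalize(feat_k, dim=1).detach()

    # (N, N) similarity matrix: logits[i, j] = q_i . k_j / tau
    logits = feat_q @ feat_k.t() / tau

    # The diagonal holds the positive pairs, so the target "class" for row i is i;
    # cross-entropy then implements the N-way InfoNCE classification.
    targets = torch.arange(feat_q.size(0), device=feat_q.device)
    return F.cross_entropy(logits, targets)
```

In the full method, a loss of this form would be summed over several encoder layers and sampled patch locations and added to the adversarial GAN loss; the temperature value 0.07 follows the one reported in the paper.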