20 Aug 2020 | Taesung Park, Alexei A. Efros, Richard Zhang, Jun-Yan Zhu
The paper "Contrastive Learning for Unpaired Image-to-Image Translation" proposes a method to improve the quality and efficiency of unpaired image-to-image translation tasks. The authors introduce a contrastive learning framework that maximizes mutual information between corresponding patches in the input and output images, encouraging the preservation of content while allowing for changes in appearance. Key design choices include using a multilayer, patch-based approach and drawing negatives from within the input image itself. The method, named Contrastive Unpaired Translation (CUT), outperforms existing one-sided translation methods and state-of-the-art models that rely on multiple auxiliary networks and loss functions. Experiments on various datasets, including Cat→Dog, Horse→Zebra, and Cityscapes, demonstrate the effectiveness of CUT in terms of image quality and correspondence discovery. The paper also explores single-image training and provides ablation studies to highlight the importance of internal negatives and multiple layers of the encoder.The paper "Contrastive Learning for Unpaired Image-to-Image Translation" proposes a method to improve the quality and efficiency of unpaired image-to-image translation tasks. The authors introduce a contrastive learning framework that maximizes mutual information between corresponding patches in the input and output images, encouraging the preservation of content while allowing for changes in appearance. Key design choices include using a multilayer, patch-based approach and drawing negatives from within the input image itself. The method, named Contrastive Unpaired Translation (CUT), outperforms existing one-sided translation methods and state-of-the-art models that rely on multiple auxiliary networks and loss functions. Experiments on various datasets, including Cat→Dog, Horse→Zebra, and Cityscapes, demonstrate the effectiveness of CUT in terms of image quality and correspondence discovery. The paper also explores single-image training and provides ablation studies to highlight the importance of internal negatives and multiple layers of the encoder.