One-Step Image Translation with Text-to-Image Models

18 Mar 2024 | Gaurav Parmar, Taesung Park, Srinivasa Narasimhan, Jun-Yan Zhu
This paper addresses the limitations of existing conditional diffusion models, particularly their slow inference speed and reliance on paired data for fine-tuning. To overcome these issues, the authors introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. They consolidate the separate modules of the vanilla latent diffusion model into a single end-to-end generator network with a small set of trainable weights, which preserves the input image structure while reducing overfitting. The proposed unpaired method, CycleGAN-Turbo, outperforms existing GAN-based and diffusion-based methods on various scene translation tasks, such as day-to-night conversion and adding or removing weather effects.
For paired settings, the companion method pix2pix-Turbo achieves results comparable to recent works like ControlNet, but with single-step inference. The work suggests that single-step diffusion models can serve as strong backbones for a range of GAN learning objectives. The code and models are available at <https://github.com/GaParmar/img2img-turbo>.
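To illustrate why single-step inference is faster than conventional diffusion sampling, the toy sketch below contrasts a multi-step sampler (many network forward passes) with a one-step translator (a single forward pass, as in CycleGAN-Turbo/pix2pix-Turbo). All function names here are hypothetical stand-ins, not the paper's actual code; `denoise` is a dummy in place of a real UNet.

```python
# Toy illustration (hypothetical names): the cost of sampling is dominated
# by the number of network forward passes.

def denoise(latent, t):
    """Stand-in for one generator/UNet forward pass; counts each call."""
    denoise.calls += 1
    return [0.5 * x for x in latent]  # dummy update, not real denoising

denoise.calls = 0

def multi_step_sample(latent, num_steps=50):
    """Conventional diffusion: iterate the denoiser over many timesteps."""
    for t in reversed(range(num_steps)):
        latent = denoise(latent, t)
    return latent

def one_step_translate(latent):
    """Single-step model: one forward pass from input to output."""
    return denoise(latent, t=0)

latent = [1.0, -1.0, 0.25]

multi_step_sample(latent)
calls_multi = denoise.calls   # 50 forward passes

denoise.calls = 0
one_step_translate(latent)
calls_single = denoise.calls  # 1 forward pass

print(calls_multi, calls_single)
```

The speedup of the single-step approach is roughly the ratio of these call counts (50x here), since each call corresponds to one full pass through the network.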