21 Apr 2021 | Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or
This paper introduces pixel2style2pixel (pSp), an image-to-image translation framework built on an encoder that maps real images directly into the extended latent space W+ of a pretrained StyleGAN generator. The pSp framework supports a wide range of image-to-image translation tasks, including StyleGAN inversion, multi-modal conditional image synthesis, facial frontalization, inpainting, and super-resolution. Unlike previous methods that rely on pixel-to-pixel correspondence, pSp operates globally in the style domain, enabling multi-modal synthesis and handling input images that do not lie in the StyleGAN domain. The encoder is based on a Feature Pyramid Network: style vectors are extracted from the different pyramid scales and fed into the corresponding inputs of the pretrained StyleGAN generator. Training combines a pixel-wise L2 loss, an LPIPS perceptual loss, an identity loss, and a regularization loss to encourage accurate, high-quality reconstruction. The authors show that pSp outperforms existing methods on tasks such as face frontalization, conditional image synthesis, and super-resolution, and that the framework extends to domains beyond facial images, demonstrating its versatility as a general image-to-image translation approach.
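To make the training objective concrete, here is a minimal PyTorch-style sketch of how the combined loss described above might be computed. The module names (`psp_encoder`, `stylegan_generator`, `lpips_fn`, `id_embed`, `w_avg`) and the loss weights are illustrative assumptions for this sketch, not the authors' released implementation.

```python
# Minimal sketch of the pSp training objective (illustrative, not the authors' code).
# Assumed hypothetical components:
#   psp_encoder        - FPN-based encoder producing 18 style vectors in W+ per image
#   stylegan_generator - pretrained, frozen StyleGAN generator taking W+ codes
#   lpips_fn           - LPIPS perceptual-distance function
#   id_embed           - face-recognition embedding network (e.g., ArcFace-like)
#   w_avg              - the generator's average latent code, used for regularization
import torch
import torch.nn.functional as F

def psp_loss(x, psp_encoder, stylegan_generator, lpips_fn, id_embed, w_avg,
             lambda_lpips=0.8, lambda_id=0.1, lambda_reg=0.005):
    # Encode the input image into style vectors of the extended latent space W+.
    w_plus = psp_encoder(x)                    # shape: (B, 18, 512)
    # Decode with the pretrained, frozen StyleGAN generator.
    x_hat = stylegan_generator(w_plus)

    # Pixel-wise L2 reconstruction loss.
    loss_l2 = F.mse_loss(x_hat, x)
    # LPIPS perceptual similarity loss.
    loss_lpips = lpips_fn(x_hat, x).mean()
    # Identity loss: cosine distance between face-recognition embeddings.
    loss_id = 1.0 - F.cosine_similarity(id_embed(x_hat), id_embed(x), dim=-1).mean()
    # Regularization: pull the predicted styles toward the average latent code.
    loss_reg = ((w_plus - w_avg) ** 2).mean()

    return loss_l2 + lambda_lpips * loss_lpips + lambda_id * loss_id + lambda_reg * loss_reg
```

The key design point this sketch reflects is that only the encoder is trained: the generator stays frozen, so the losses only shape how images are embedded into W+.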