StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

31 Mar 2021 | Or Patashnik*, Zongze Wu*, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski (* equal contribution)
StyleCLIP is a text-driven method for manipulating images generated by StyleGAN. It leverages pretrained CLIP models to enable intuitive, text-based image manipulation without manual per-attribute effort. The method introduces three techniques: (1) text-guided latent optimization, where a CLIP-based loss is used to modify an input latent vector in response to a text prompt (see the sketch below); (2) a latent residual mapper, trained for a specific text prompt, which infers a text-guided manipulation step for a given input latent; and (3) a method for mapping a text prompt to an input-agnostic (global) direction in StyleGAN's style space, with control over manipulation strength and degree of disentanglement. These techniques support a wide range of semantic manipulations on images of human faces, animals, cars, and churches, from abstract to specific and from extensive to fine-grained. Compared to other text-driven image manipulation techniques, StyleCLIP provides more flexible and controllable edits, especially for complex and specific attributes; compared to other StyleGAN manipulation methods, it achieves better disentanglement and finer control over manipulation strength. The approach is effective across a variety of domains and applies to a wide range of image manipulation tasks.
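As a concrete illustration of technique (1), the sketch below optimizes a StyleGAN latent under a CLIP similarity loss plus an L2 prior that keeps the result near the source latent. This is a minimal sketch, not the authors' code: the generator handle `G` (assumed to map a latent to an RGB image in [-1, 1]), the inverted source latent `w_source`, the prompt, the step count, and the loss weight are all illustrative assumptions, and the paper additionally uses an identity-preservation loss for faces.

```python
# Minimal sketch of CLIP-guided latent optimization. Assumed names: G is a
# pretrained StyleGAN generator returning images in [-1, 1]; w_source is the
# latent of the inverted input image; loss weight and step count are illustrative.
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Encode the driving text prompt once; it stays fixed during optimization.
with torch.no_grad():
    tokens = clip.tokenize(["a face with blue hair"]).to(device)
    text_feat = model.encode_text(tokens)

# CLIP's expected input normalization constants.
mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=device).view(1, 3, 1, 1)

w = w_source.clone().requires_grad_(True)
optimizer = torch.optim.Adam([w], lr=0.1)

for _ in range(200):
    img = G(w)                                    # synthesize image from current latent
    img = F.interpolate((img + 1) / 2, size=224, mode="bilinear")  # resize for CLIP
    image_feat = model.encode_image((img - mean) / std)
    clip_loss = 1 - F.cosine_similarity(image_feat, text_feat).mean()
    l2_loss = ((w - w_source) ** 2).mean()        # stay close to the source latent
    loss = clip_loss + 0.008 * l2_loss            # the paper also adds an identity loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The L2 term (and, in the paper, an identity loss based on a face-recognition network) is what keeps the optimization from drifting away from the source image. The other two techniques trade this per-image optimization for a trained mapper network or a precomputed global style-space direction, making edits much faster at inference time.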