18 Jul 2024 | Feihong He, Gang Li, Mengyuan Zhang, Leilei Yan, Lingyu Si, Fanzhang Li, Li Shen
FreeStyle is a text-guided style transfer method that leverages pre-trained large diffusion models. Unlike traditional methods that require iterative optimization or reference style images, FreeStyle achieves style transfer solely through text descriptions. The method employs a dual-stream encoder and a single-stream decoder architecture: the dual-stream encoder processes both the content image and the style text prompt, decoupling content and style information, and the decoder then modulates these features to achieve precise style transfer. Key contributions include a novel feature fusion module that balances content preservation and artistic consistency, and a simple, efficient framework that requires only minimal adjustment of scaling factors. Experimental results demonstrate high-quality synthesis and fidelity across a variety of content images and style text prompts, outperforming state-of-the-art methods on metrics such as CLIP Aesthetic Score, CLIP Score, and Preference. The method is also computationally efficient, reducing the need for extensive training iterations.
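The scaling-factor idea behind the feature fusion module can be sketched as a simple linear modulation of decoder features. The function name, the linear-combination rule, and the default values below are illustrative assumptions for intuition only, not the paper's exact formulation:

```python
import numpy as np

def fuse_features(content_feat: np.ndarray, style_feat: np.ndarray,
                  b: float = 2.0, s: float = 1.0) -> np.ndarray:
    """Toy stand-in for FreeStyle-style feature fusion (hypothetical).

    `content_feat` plays the role of the content-branch encoder features,
    `style_feat` the text-conditioned style-branch features. The scaling
    factors `b` (content preservation) and `s` (style strength) are the
    only knobs the user adjusts, mirroring the "minimal adjustment of
    scaling factors" described above.
    """
    # Rescale each branch and combine before handing off to the decoder.
    return b * content_feat + s * style_feat

# Raising `b` biases the output toward the content structure;
# raising `s` biases it toward the prompted style.
content = np.ones((4, 4))
style = np.full((4, 4), 0.5)
fused = fuse_features(content, style, b=2.0, s=1.0)
```

In this toy form, every element of `fused` is `2.0 * 1.0 + 1.0 * 0.5 = 2.5`; the actual module operates on intermediate U-Net feature maps rather than raw arrays.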