FreeStyle: Free Lunch for Text-guided Style Transfer using Diffusion Models

18 Jul 2024 | Feihong He, Gang Li, Mengyuan Zhang, Leilei Yan, Lingyu Si, Fanzhang Li, Li Shen
FreeStyle is a text-guided style transfer method built on pre-trained large diffusion models that requires no further optimization and no style reference images: the target style is specified purely through a text prompt. The method uses a dual-stream encoder and single-stream decoder architecture. The dual-stream encoder encodes the content image and the style text prompt separately, decoupling content from style, while the single-stream decoder modulates features from both streams to produce the stylized output.

Across a wide range of content images and style prompts, FreeStyle achieves high-quality synthesis and fidelity, outperforming state-of-the-art methods on metrics such as CLIP Aesthetic Score and Preference, while eliminating the thousands of optimization iterations that fine-tuning-based approaches require. Its key contributions are a simple, efficient framework that decouples content and style without any optimization; a feature fusion module that balances content preservation against artistic consistency; and comprehensive experiments demonstrating accurate style expression and high-quality content-style fusion. Because the approach is training-free, adapting it to a new style requires only adjusting a few hyperparameters. The method is robust across domains, including portraits and objects, producing clearly distinguishable styles and a natural fusion of style and content.
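To make the dual-stream layout concrete, the sketch below outlines one denoising step under assumed names: `encoder` and `decoder` stand in for the frozen encoder and decoder halves of a pre-trained diffusion U-Net, and the inline fusion is a placeholder for the paper's feature modulation module. None of these names correspond to a real diffusers API, and the conditioning details are simplified.

```python
import torch
import torch.nn as nn

class DualStreamStep(nn.Module):
    """Minimal sketch of a dual-stream encoder / single-stream decoder
    denoising step. `encoder` and `decoder` are placeholders for the
    frozen halves of a pre-trained diffusion U-Net (assumptions for
    illustration, not the actual FreeStyle implementation)."""

    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder  # shared frozen encoder, run once per stream
        self.decoder = decoder  # frozen single-stream decoder

    def forward(self, noisy_latent, content_latent, style_emb, null_emb, t,
                b: float = 1.1, s: float = 1.0):
        # Style stream: a noisy latent conditioned on the style text prompt.
        style_feats = self.encoder(noisy_latent, t, style_emb)
        # Content stream: the (re-noised) content-image latent, unconditioned.
        content_feats = self.encoder(content_latent, t, null_emb)
        # Placeholder fusion: weight content features by b and style features
        # by s at each resolution; the paper's actual module additionally
        # operates in the frequency domain (see the sketch further below).
        # The values of b and s here are illustrative, not the paper's.
        fused = [b * fc + s * fs for fc, fs in zip(content_feats, style_feats)]
        return self.decoder(fused, t, style_emb)
```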
At the core of the decoder is a feature modulation module that enhances style and content features through frequency-domain processing; this is what enables effective style transfer without any optimization. Extensive experiments validate the approach, with FreeStyle showing superior results in both qualitative and quantitative comparisons against other state-of-the-art methods. Ablation studies indicate low sensitivity to the hyperparameters and consistent style transfer across diverse styles, and noise-addition experiments confirm that the method genuinely disentangles content information from style information. Together, the training-free design and efficient performance make FreeStyle a promising solution for text-guided style transfer.
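The sketch below shows one plausible form of such frequency-domain modulation: scaling the low-frequency band of a feature map via a 2-D FFT, in the spirit of Fourier-domain feature filtering. The function name and the `scale` and `radius` values are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.fft as fft

def modulate_low_freq(x: torch.Tensor, scale: float, radius: int = 16) -> torch.Tensor:
    """Scale the low-frequency band of a (B, C, H, W) feature map in the
    2-D Fourier domain. A hypothetical stand-in for FreeStyle's
    frequency-domain feature modulation; `scale` and `radius` are
    illustrative hyperparameters."""
    # Move to the frequency domain and center the zero-frequency component.
    x_freq = fft.fftshift(fft.fft2(x.float()), dim=(-2, -1))
    _, _, H, W = x_freq.shape
    mask = torch.ones_like(x_freq.real)
    cy, cx = H // 2, W // 2
    # Attenuate (scale < 1) or amplify (scale > 1) frequencies near DC.
    mask[..., cy - radius:cy + radius, cx - radius:cx + radius] = scale
    # Back to the spatial domain, keeping the real part.
    out = fft.ifft2(fft.ifftshift(x_freq * mask, dim=(-2, -1))).real
    return out.to(x.dtype)

# Example: soften the low frequencies of a content-stream feature map.
feat = torch.randn(1, 320, 64, 64)
modulated = modulate_low_freq(feat, scale=0.9)
```

Scaling a narrow low-frequency band shifts global color and texture statistics while leaving high-frequency structure (edges, layout) largely intact, which matches the intuition of adjusting style strength without destroying content.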