Rethink Arbitrary Style Transfer with Transformer and Contrastive Learning


2024 | Zhanjie Zhang, Jiakai Sun, Guangyuan Li, Lei Zhao, Quanwei Zhang, Zehua Lan, Haolin Yin, Wei Xing, Huaizhong Lin, Zhiwen Zuo
This paper proposes a novel method for arbitrary style transfer that combines a transformer-based architecture with contrastive learning. The method introduces Style Consistency Instance Normalization (SCIN) to align content features with style features, and Instance-based Contrastive Learning (ICL) to improve stylization quality by learning the relationships between different styles. In addition, a Perception Encoder (PE) is introduced to capture style information more effectively, since VGG networks are not well suited to extracting style features. The resulting stylized images preserve the content structure and style texture with minimal artifacts. Evaluated against state-of-the-art methods, the approach shows superior performance in terms of content fidelity, global effects, and local patterns, producing more realistic and higher-quality stylized images than existing methods. Ablation studies validate the effectiveness of each proposed component, and the paper concludes that the method provides a unified network architecture for generating high-quality stylized images.
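To make the roles of SCIN and ICL concrete, the sketch below shows one plausible reading of these components; it is not the authors' released code. SCIN is approximated here as an AdaIN-style normalization whose scale and shift are predicted from a style code, and ICL is approximated with a standard InfoNCE contrastive loss over style embeddings. All module names, layer sizes, and tensor shapes are assumptions for illustration.

```python
# Illustrative sketch only; layer sizes and names are assumed, not taken from the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCINSketch(nn.Module):
    """Approximation of SCIN: normalize content features, then modulate them
    with a scale/shift predicted from a style code (AdaIN-style assumption)."""
    def __init__(self, channels: int, style_dim: int = 512):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Linear(style_dim, channels)  # predicted scale
        self.to_beta = nn.Linear(style_dim, channels)   # predicted shift

    def forward(self, content_feat: torch.Tensor, style_code: torch.Tensor) -> torch.Tensor:
        # content_feat: (B, C, H, W); style_code: (B, style_dim)
        gamma = self.to_gamma(style_code).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_beta(style_code).unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * self.norm(content_feat) + beta

def instance_contrastive_loss(anchor: torch.Tensor,
                              positive: torch.Tensor,
                              negatives: torch.Tensor,
                              tau: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss standing in for ICL: pull the stylized image's
    embedding toward its own style (positive) and away from other styles."""
    anchor = F.normalize(anchor, dim=-1)        # (B, D)
    positive = F.normalize(positive, dim=-1)    # (B, D)
    negatives = F.normalize(negatives, dim=-1)  # (B, K, D)
    pos_logit = (anchor * positive).sum(-1, keepdim=True) / tau        # (B, 1)
    neg_logits = torch.einsum("bd,bkd->bk", anchor, negatives) / tau   # (B, K)
    logits = torch.cat([pos_logit, neg_logits], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)
```

In this reading, the style code fed to the normalization and the embeddings fed to the contrastive loss would both come from the paper's Perception Encoder rather than a VGG network, reflecting the claim that VGG features are ill-suited for capturing style information.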