18 Mar 2024 | Yi Wu, Ziqiang Li, Heliang Zheng, Chaoyue Wang, and Bin Li
Infinite-ID is a method for identity-preserved personalization in text-to-image generation that addresses the trade-off between identity fidelity and semantic consistency. It introduces an ID-semantics decoupling paradigm that separates image and text information during both training and inference. Specifically, it uses identity-enhanced training, which adds an image cross-attention module to capture identity information while deactivating the original text cross-attention module. This ensures that the image stream accurately represents the identity in the reference image while mitigating interference from the textual input.
Additionally, a feature interaction mechanism combines a mixed attention module with an AdaIN-mean operation to merge the two streams, improving both identity fidelity and semantic consistency. Experiments on raw photo generation and style image generation show that Infinite-ID outperforms existing methods. The method is evaluated with metrics such as CLIP-T, CLIP-I, and MFaceNet, on which it shows strong identity fidelity and semantic consistency.
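The feature interaction step can be illustrated with a toy sketch. It is not the authors' implementation, and it rests on two assumptions that the summary does not spell out: that AdaIN-mean shifts the per-channel mean of the ID-stream features to match the text-stream features while leaving variance untouched, and that mixed attention lets text-stream queries attend over tokens from both streams. The function names (`adain_mean`, `mixed_attention`) are hypothetical, features are plain lists of token vectors, and learned projections are omitted.

```python
import math


def channel_mean(features):
    """Per-channel mean over tokens; `features` is a list of token vectors."""
    n = len(features)
    return [sum(tok[c] for tok in features) / n for c in range(len(features[0]))]


def adain_mean(id_feats, text_feats):
    """Assumed AdaIN-mean: shift ID features so their channel means match the text stream."""
    mu_id = channel_mean(id_feats)
    mu_text = channel_mean(text_feats)
    return [[x - mi + mt for x, mi, mt in zip(tok, mu_id, mu_text)] for tok in id_feats]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention without learned projections (toy version)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


def mixed_attention(text_feats, id_feats):
    """Assumed mixed attention: text queries attend over text and ID tokens together."""
    keys = text_feats + id_feats  # concatenate tokens from both streams
    return attention(text_feats, keys, keys)


# Two tokens with two channels per stream (made-up numbers for illustration).
text_feats = [[1.0, 0.0], [0.0, 1.0]]
id_feats = [[3.0, 2.0], [5.0, 4.0]]

aligned = adain_mean(id_feats, text_feats)     # [[-0.5, -0.5], [1.5, 1.5]]
merged = mixed_attention(text_feats, aligned)  # one output vector per text token
```

In this sketch the mean shift runs before the attention step so that the ID tokens sit in the same range as the text tokens when they are merged; the ordering in the actual method may differ.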