29 Jan 2024 | Qinghe Wang, Xu Jia, Xiaomin Li, Taiqing Li, Liqian Ma, Yunzhi Zhuge, Huchuan Lu
StableIdentity is a framework for identity-consistent recontextualization from a single face image. It integrates an identity prior and an editability prior so that the learned identity can be injected into diverse contexts: a face encoder with an identity prior encodes the input face, and the resulting representation is mapped into an editable space constructed from celebrity name embeddings. A masked two-phase diffusion loss enhances pixel-level perception of the input face while preserving generation diversity. Extensive experiments show that StableIdentity outperforms existing customization methods in identity preservation, editability, and generation quality, producing diverse customized images across contexts and artistic styles, and it can be flexibly combined with off-the-shelf modules such as ControlNet. Notably, it is the first method to directly inject an identity learned from a single image into video and 3D generation without fine-tuning, demonstrating strong zero-shot identity-driven video and 3D generation and offering a unified solution for customized generation across the image, video, and 3D domains.
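To make the described pipeline more concrete, the sketch below illustrates the two central ideas in PyTorch: (1) mapping a face-recognition feature into pseudo word embeddings that are aligned, AdaIN-style, with the statistics of celebrity-name embeddings, and (2) a masked two-phase diffusion loss. This is a minimal sketch under stated assumptions; the module names, dimensions, timestep threshold, and weighting are illustrative and are not taken from the authors' released implementation.

```python
# Illustrative sketch of a StableIdentity-style customization pipeline.
# All names, dimensions, and the loss split below are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class IdentityToPrompt(nn.Module):
    """Map a face-recognition feature (e.g., from an ArcFace-like encoder)
    into two pseudo word embeddings, then align them with the mean/std of
    celebrity-name embeddings so they land in an editable region of the
    text-embedding space."""

    def __init__(self, face_dim=512, token_dim=768,
                 celeb_mean=None, celeb_std=None):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(face_dim, 1024), nn.GELU(),
            nn.Linear(1024, 2 * token_dim),   # two pseudo-tokens per identity
        )
        # Statistics of celebrity-name word embeddings (assumed precomputed).
        self.register_buffer("celeb_mean",
                             celeb_mean if celeb_mean is not None
                             else torch.zeros(token_dim))
        self.register_buffer("celeb_std",
                             celeb_std if celeb_std is not None
                             else torch.ones(token_dim))

    def forward(self, face_feat):                       # (B, face_dim)
        tokens = self.mlp(face_feat).view(face_feat.size(0), 2, -1)
        # AdaIN-style projection into the celebrity-embedding distribution.
        mu = tokens.mean(dim=-1, keepdim=True)
        sigma = tokens.std(dim=-1, keepdim=True) + 1e-6
        return (tokens - mu) / sigma * self.celeb_std + self.celeb_mean


def masked_two_phase_loss(noise_pred, noise, x0_pred, x0, face_mask, t,
                          t_split=500):
    """Sketch of a masked two-phase diffusion loss (threshold assumed):
    at large timesteps use the standard noise-prediction loss over the
    whole image; at small timesteps restrict the loss to the face region
    of the predicted clean image to sharpen pixel-level identity detail."""
    per_pixel_noise = (noise_pred - noise) ** 2
    per_pixel_face = ((x0_pred - x0) ** 2) * face_mask
    phase2 = (t < t_split).view(-1, 1, 1, 1).float()
    per_pixel = (1 - phase2) * per_pixel_noise + phase2 * per_pixel_face
    return per_pixel.mean()


# Toy usage with random tensors: the pseudo-tokens would replace a
# placeholder token in the text prompt fed to the frozen diffusion model.
mapper = IdentityToPrompt()
face_feat = torch.randn(4, 512)
pseudo_tokens = mapper(face_feat)        # (4, 2, 768)
```

Because only the small mapping network (and, in the paper's setting, a few token embeddings) is learned while the diffusion backbone stays frozen, the resulting identity tokens can be dropped into other prompt-conditioned generators, which is consistent with the zero-shot transfer to video and 3D generation described above.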