25 Jun 2024 | Xuanhua He1,2, Quande Liu3†, Shengju Qian3, Xin Wang3, Tao Hu1,2, Ke Cao1,2, Keyu Yan1,2, Jie Zhang2†
ID-Animator is a novel framework designed to generate identity-specific human videos from a single reference facial image without additional training. The method leverages pre-trained text-to-video diffusion models and a lightweight face adapter to encode ID-relevant embeddings from learnable facial latent queries. To address the challenges of high training costs, dataset scarcity, and the influence of ID-irrelevant features, ID-Animator introduces an ID-oriented dataset construction pipeline that includes unified human attribute and action captioning techniques. A random reference training strategy is also devised to precisely capture ID-relevant embeddings with an ID-preserving loss, improving the fidelity and generalization of the model.

Extensive experiments demonstrate that ID-Animator outperforms previous models in generating personalized human videos, showing superior identity preservation and real-world applicability. The method is highly compatible with popular pre-trained T2V models and community backbone models, making it a robust solution for video generation tasks where identity preservation is crucial.
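To make the adapter idea concrete, the sketch below shows one plausible reading of "encoding ID-relevant embeddings from learnable facial latent queries": a small set of learnable query vectors cross-attends over face-image feature tokens, producing a fixed number of identity embeddings. All shapes, names, and the single-head attention form here are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical, simplified face adapter: learnable latent queries
# cross-attend to face-image features to yield ID-relevant embeddings.
rng = np.random.default_rng(0)
d = 64          # embedding dimension (assumed)
n_queries = 16  # number of learnable facial latent queries (assumed)
n_tokens = 257  # face-image feature tokens, e.g. from a CLIP-like encoder (assumed)

queries = rng.standard_normal((n_queries, d))    # learned during adapter training
face_feats = rng.standard_normal((n_tokens, d))  # features of the reference face image

# Single-head cross-attention: queries attend over the face features.
Wq = rng.standard_normal((d, d)) / np.sqrt(d)
Wk = rng.standard_normal((d, d)) / np.sqrt(d)
Wv = rng.standard_normal((d, d)) / np.sqrt(d)

q, k, v = queries @ Wq, face_feats @ Wk, face_feats @ Wv
attn = softmax(q @ k.T / np.sqrt(d))  # (n_queries, n_tokens), rows sum to 1
id_embeddings = attn @ v              # (n_queries, d) ID-relevant embeddings

print(id_embeddings.shape)
```

In a full system, these `id_embeddings` would be injected into the cross-attention layers of the frozen text-to-video backbone alongside the text embeddings, which is how adapter-style methods typically condition generation without retraining the base model.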