Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation

7 Apr 2024 | Renshuai Liu, Bowen Ma, Wei Zhang, Zhipeng Hu, Changjie Fan, Tangjie Lv, Yu Ding, Xuan Cheng
The paper introduces a multi-modal face generation framework that controls identity and expression simultaneously and supports fine-grained expression synthesis. The framework takes three inputs: a text prompt describing the background, a selfie photo, and a text giving fine-grained expression labels. It uses a diffusion model to perform face swapping and reenactment at the same time, addressing the challenge of separating identity from expression and controlling each precisely within a unified framework. Key innovations include balancing the identity and expression encoders, improved midpoint sampling, and explicit background conditioning. Extensive experiments compare the framework with state-of-the-art text-to-image, face swapping, and face reenactment methods and demonstrate its controllability and scalability. The framework can produce high-fidelity portraits that preserve both the target identity and the target expression.