X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention


You Xie, Hongyi Xu, Guoxian Song, Chao Wang, Yichun Shi, Linjie Luo | SIGGRAPH '24 Conference Papers, July 27-August 1, 2024, Denver, CO, USA
X-Portrait is a conditional diffusion model designed to generate expressive and temporally coherent portrait animations. Given a single portrait as the appearance reference, X-Portrait animates it with motion derived from a driving video, capturing both dynamic facial expressions and wide-range head movements. Its core leverages the generative prior of a pre-trained diffusion model as the rendering backbone, while achieving fine-grained control of head pose and expression through novel controlling signals within the ControlNet framework. Unlike conventional methods that rely on coarse explicit controls such as facial landmarks, X-Portrait's motion control module is trained to interpret dynamics directly from the raw driving RGB inputs. Motion accuracy is further enhanced with a patch-based local control module that attends to small-scale nuances such as eyeball positions. To mitigate identity leakage from the driving signals, the motion control modules are trained on scaling-augmented cross-identity images, ensuring maximum disentanglement from the appearance reference modules. Experimental results demonstrate the effectiveness of X-Portrait across a diverse range of facial portraits and expressive driving sequences, showing that it generates captivating portrait animations while preserving the identity of the reference subject.
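To make the described control flow concrete, below is a minimal PyTorch sketch, not the authors' implementation: every class, function, and value in it (MotionControlNet, PatchLocalControl, scale_augment, the patch boxes) is a hypothetical stand-in. It illustrates the three ideas the abstract names: a ControlNet-style branch that reads raw driving RGB instead of landmarks, a patch-based local branch for small-scale cues such as the eye regions, and a scaling augmentation applied to cross-identity driving frames.

```python
# Hypothetical sketch of an X-Portrait-style control pipeline; all names
# and shapes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionControlNet(nn.Module):
    """ControlNet-style motion branch conditioned directly on driving RGB
    frames rather than on explicit landmarks (hypothetical architecture)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.SiLU(),
        )
        # Zero-initialized output projection, following the ControlNet recipe,
        # so the pretrained diffusion backbone is unaffected at the start of
        # training and control is learned gradually.
        self.zero_proj = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.zero_proj.weight)
        nn.init.zeros_(self.zero_proj.bias)

    def forward(self, driving_rgb: torch.Tensor) -> torch.Tensor:
        # Residual control features to be injected into the denoising U-Net.
        return self.zero_proj(self.encoder(driving_rgb))


class PatchLocalControl(nn.Module):
    """Hypothetical local branch: encodes small crops of the driving frame
    (e.g. the eye regions) to capture small-scale nuances such as gaze."""

    def __init__(self, channels: int = 64, patch_size: int = 32):
        super().__init__()
        self.patch_size = patch_size
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.SiLU(),
        )

    def forward(self, driving_rgb, boxes):
        # boxes: (top, left) corners of square crops, e.g. around each eye.
        p = self.patch_size
        return [self.encoder(driving_rgb[:, :, t:t + p, l:l + p])
                for t, l in boxes]


def scale_augment(frame: torch.Tensor, lo: float = 1.0, hi: float = 1.3):
    """Random zoom-in (upscale, then center-crop back to the input size): a
    stand-in for the scaling augmentation applied to cross-identity driving
    frames so that identity cues cannot be copied pixel-to-pixel."""
    h, w = frame.shape[-2:]
    s = float(torch.empty(1).uniform_(lo, hi))
    zoomed = F.interpolate(frame, scale_factor=s, mode="bilinear",
                           align_corners=False)
    zh, zw = zoomed.shape[-2:]
    top, left = (zh - h) // 2, (zw - w) // 2
    return zoomed[:, :, top:top + h, left:left + w]


if __name__ == "__main__":
    driving = torch.randn(1, 3, 256, 256)          # one driving video frame
    motion_feats = MotionControlNet()(scale_augment(driving))
    eye_feats = PatchLocalControl()(driving, boxes=[(100, 70), (100, 150)])
    print(motion_feats.shape, [f.shape for f in eye_feats])
```

The zero-initialized projection is the standard ControlNet trick: control signals start as no-ops, so fine-tuning cannot destabilize the pretrained backbone, which matches the abstract's emphasis on preserving the generative prior.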