FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization

23 Apr 2024 | Shuai Tan, Bin Ji, and Ye Pan
FlowVQTalker is a system for generating high-quality emotional talking face videos that incorporate diverse facial dynamics and fine-grained expressions. The system combines normalizing flow and vector-quantization modeling to address two key insights: 1) the non-deterministic nature of facial dynamics in response to audio, and 2) the importance of emotion-aware textures and clear teeth in conveying emotional expressions. FlowVQTalker consists of two main components: a flow-based coefficient generator (FCG) and a vector-quantized image generator (VQIG).
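The FCG's core idea, sampling a latent from a simple Gaussian and passing it through an invertible mapping to obtain diverse motion coefficients, can be sketched minimally as follows. This uses a single affine transform as a stand-in for the learned flow; the dimensionality and parameters are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

# Minimal sketch of flow-based sampling of diverse motion coefficients.
# A single affine transform stands in for the FCG's learned normalizing
# flow; all parameters below are placeholders, not the paper's values.
rng = np.random.default_rng(1)
D = 6                                   # e.g. a 6-dim pose coefficient vector
log_scale = 0.1 * rng.normal(size=D)    # "learned" log-scales (placeholder)
shift = rng.normal(size=D)              # "learned" shifts (placeholder)

def sample_coefficients(n):
    """Draw n coefficient vectors: z ~ N(0, I), then apply the inverse flow."""
    z = rng.normal(size=(n, D))         # latent noise is what drives diversity
    return np.exp(log_scale) * z + shift

samples = sample_coefficients(3)        # three distinct motion hypotheses
```

Because each draw of `z` maps to a different coefficient vector, the same audio input can yield multiple plausible motions, which is exactly the one-to-many property the flow model is meant to capture.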
The FCG uses normalizing flow to model emotional expression and pose coefficients, enabling diverse and realistic facial dynamics. The VQIG leverages a codebook to generate high-quality, emotion-aware textures and clear teeth. The system is trained on datasets such as MEAD and HDTF, and extensive experiments demonstrate its effectiveness in generating realistic and expressive talking faces. FlowVQTalker outperforms existing methods in both quantitative and qualitative evaluations, achieving superior results in terms of lip synchronization, emotional expression, and video quality. The system is capable of generating diverse facial dynamics and supports emotion transfer through an emotion reference. The method is evaluated against state-of-the-art methods, showing its ability to generate high-quality, expressive talking faces with clear textures and realistic motion.
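The codebook lookup at the heart of vector quantization, as used by the VQIG to snap encoder features onto a discrete set of high-quality texture codes, can be sketched with a nearest-neighbor search. This is an illustrative sketch only: the function name, shapes, and random codebook are assumptions, and the real VQIG is a learned network.

```python
import numpy as np

def quantize(z, codebook):
    """Map each continuous feature vector in z to its nearest codebook entry.

    z: (N, D) encoder features; codebook: (K, D) learned embeddings.
    Returns the quantized features and the chosen indices.
    Illustrative sketch only -- names and shapes here are assumptions.
    """
    # Pairwise squared distances between each feature and each codebook entry.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # shape (N, K)
    idx = d.argmin(axis=1)              # index of the nearest entry per feature
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))      # K=8 entries of dimension D=4
# Two features lying close to entries 2 and 5, perturbed by small noise.
z = codebook[[2, 5]] + 0.01 * rng.normal(size=(2, 4))
zq, idx = quantize(z, codebook)
```

Quantizing onto a fixed set of codes is what lets a VQ-based generator reproduce sharp, learned texture patterns (e.g. clear teeth) rather than the blurry averages a purely continuous decoder tends to produce.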