FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization


23 Apr 2024 | Shuai Tan, Bin Ji, and Ye Pan*
The paper "FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization" addresses the challenge of generating lifelike, emotionally expressive talking faces. The authors propose FlowVQTalker, a method that combines normalizing flow with vector-quantization modeling. The key contributions are:

1. **Flow-based Coefficient Generator (FCG)**: This module uses normalizing flow to model the dynamics of facial emotions, encoding them into a multi-emotion-class latent space represented as a mixture distribution. Sampling from this space yields diverse yet synchronized facial expressions, accurate lip synchronization, and natural nonverbal cues.
2. **Vector-Quantized Image Generator (VQIG)**: This module treats the creation of expressive facial images as a code-query task, using a learned codebook to provide rich, high-definition textures and clear teeth, which strengthens the emotional perception of the generated faces.

The paper also discusses the limitations of existing methods, such as deterministic modeling of inherently non-deterministic nonverbal facial dynamics and a lack of expressive texture detail, and explains how FlowVQTalker addresses these issues. Extensive experiments show superior performance over state-of-the-art methods in both quantitative and qualitative evaluations.
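The FCG's core ideas can be sketched in miniature: sample a latent code from the mixture component of the chosen emotion class, then map it through an invertible transform to expression coefficients. The sketch below is a toy illustration under stated assumptions (the number of emotion classes, latent dimension, and the fixed affine map standing in for a trained flow network are all hypothetical, not the paper's actual architecture).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical multi-emotion-class mixture prior: one Gaussian component
# per emotion class (sizes are illustrative, not from the paper).
NUM_EMOTIONS, LATENT_DIM = 8, 4
means = rng.normal(0.0, 3.0, size=(NUM_EMOTIONS, LATENT_DIM))  # class centers

def sample_latent(emotion_id, n=1):
    """Draw n latent codes from the chosen emotion's mixture component."""
    return means[emotion_id] + rng.normal(size=(n, LATENT_DIM))

# A trained flow is an invertible network; a fixed affine map stands in
# for it here (scale s and shift t are assumptions, not learned values).
s, t = 0.5, 1.0

def flow_forward(z):
    """Map a latent code z to (toy) facial expression coefficients."""
    return s * z + t

def flow_inverse(x):
    """Exact inverse of flow_forward -- the defining property of a flow."""
    return (x - t) / s

z = sample_latent(emotion_id=2, n=5)          # diverse samples, same emotion
coeffs = flow_forward(z)
assert np.allclose(flow_inverse(coeffs), z)   # invertibility holds
```

Because each emotion class owns a mixture component, repeated sampling produces varied coefficient sequences that still carry the intended emotion, which is how the paper motivates non-deterministic nonverbal dynamics.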
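The VQIG's "code query" step is essentially vector quantization: each encoder feature is replaced by its nearest entry in a learned codebook before decoding the image. A minimal sketch of that lookup, assuming a random stand-in codebook (a real one would be learned from face textures) and illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical learned codebook: K entries of dimension D (VQ-VAE style).
# The values here are random placeholders, not trained texture codes.
K, D = 512, 64
codebook = rng.normal(size=(K, D))

def quantize(features):
    """Replace each feature vector with its nearest codebook entry --
    the 'code query' performed before decoding the output image."""
    # Squared Euclidean distance from every feature to every code entry.
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = d.argmin(axis=1)            # nearest code per feature
    return codebook[indices], indices

feats = rng.normal(size=(10, D))          # e.g. encoder outputs for 10 patches
quantized, idx = quantize(feats)
# Every quantized vector is exactly one codebook row:
assert all(np.allclose(quantized[i], codebook[idx[i]]) for i in range(10))
```

Because the decoder only ever sees codebook entries, blurry or out-of-distribution features are snapped to high-quality learned codes, which is how the paper explains the sharp textures and clear teeth in its outputs.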