Towards Variable and Coordinated Holistic Co-Speech Motion Generation


15 Apr 2024 | Yifei Liu, Qiong Cao, Yandong Wen, Huaiguang Jiang, Changxing Ding
This paper addresses the challenge of generating lifelike holistic co-speech motions for 3D avatars, focusing on variability and coordination. The authors propose ProbTalk, a unified probabilistic framework designed to jointly model facial, hand, and body movements during speech. ProbTalk builds on the variational autoencoder (VAE) architecture and incorporates three core designs: product quantization (PQ), a novel non-autoregressive model, and a secondary refinement stage. PQ enriches the representation of complex holistic motion, while the non-autoregressive model preserves essential structural information. The secondary stage refines the preliminary prediction, enhancing high-frequency details. Experimental results demonstrate that ProbTalk outperforms state-of-the-art methods in both qualitative and quantitative evaluations, particularly in terms of realism. The code and model will be released for research purposes.
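
To make the product-quantization idea concrete, the sketch below shows one common way to quantize a continuous motion latent by splitting it into sub-vectors, each matched against its own small codebook. This is a minimal illustration under assumed settings (the class name `ProductQuantizer` and the values of `latent_dim`, `num_groups`, and `codebook_size` are hypothetical), not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ProductQuantizer(nn.Module):
    """Split a latent vector into sub-vectors and quantize each one
    against its own codebook (product quantization)."""

    def __init__(self, latent_dim=256, num_groups=4, codebook_size=512):
        super().__init__()
        assert latent_dim % num_groups == 0
        self.num_groups = num_groups
        self.sub_dim = latent_dim // num_groups
        # One codebook of learnable codewords per sub-space.
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, self.sub_dim) for _ in range(num_groups)
        )

    def forward(self, z):
        # z: (batch, frames, latent_dim) continuous latents from a motion encoder.
        chunks = z.chunk(self.num_groups, dim=-1)
        quantized, indices = [], []
        for chunk, codebook in zip(chunks, self.codebooks):
            # Nearest codeword per sub-vector (L2 distance).
            dist = torch.cdist(chunk, codebook.weight.unsqueeze(0))
            idx = dist.argmin(dim=-1)
            q = codebook(idx)
            # Straight-through estimator so gradients flow back to the encoder.
            quantized.append(chunk + (q - chunk).detach())
            indices.append(idx)
        return torch.cat(quantized, dim=-1), torch.stack(indices, dim=-1)


# Example usage with assumed shapes: a batch of 2 sequences, 30 frames each.
pq = ProductQuantizer()
z = torch.randn(2, 30, 256)
z_q, codes = pq(z)   # z_q: (2, 30, 256), codes: (2, 30, 4)
```

The appeal of this factorized scheme is combinatorial capacity: with 4 codebooks of 512 entries each, the quantizer can represent 512^4 distinct composite codes, which is one plausible reading of how PQ "enriches the representation of complex holistic motion" compared with a single codebook of the same memory footprint.
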