Generative Human Motion Stylization in Latent Space

2024 | Chuan Guo*, Yuxuan Mu*, Xinxin Zuo, Peng Dai, Youliang Yan, Juwei Lu, Li Cheng
This paper introduces a generative framework for 3D human motion stylization that leverages the latent space of pretrained autoencoders for a more expressive and robust motion representation. Unlike existing methods that operate directly in pose space, the approach decomposes motion codes into deterministic content codes and probabilistic style codes, enabling diverse stylization results from a single motion code. The model is trained to reconstruct motion codes by recombining content and style codes, with an emphasis on disentangling the two representations. The framework supports both supervised and unsupervised settings: style cues can come from reference motions or style labels, or the model can stylize without any explicit style input by sampling style codes from a prior distribution. Evaluated on three motion datasets, the method demonstrates superior style reenactment, content preservation, and generalization across applications and settings. The framework is lightweight and efficient, achieving state-of-the-art performance while running roughly 14 times faster than the strongest prior method. Extensive experiments and visual comparisons validate its ability to produce diverse and novel stylizations, and it further supports stylized text-to-motion generation, showcasing its versatility. By preserving semantic content while capturing style characteristics, the approach offers a promising solution for realistic character animation in the film and game industries.
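To make the content/style decomposition concrete, here is a minimal PyTorch sketch of the idea described above: a motion code from a pretrained autoencoder is split into a deterministic content code and a Gaussian (probabilistic) style code, and the two are recombined to reconstruct the code. This is not the authors' released implementation; the module names (LatentStylizer, content_head, style_head), MLP heads, and dimensions are assumptions chosen purely for illustration.

```python
# Minimal sketch of latent content/style decomposition (illustrative only).
import torch
import torch.nn as nn

class LatentStylizer(nn.Module):
    def __init__(self, code_dim=512, content_dim=256, style_dim=64):
        super().__init__()
        # Deterministic content head: preserves the semantics of the motion code.
        self.content_head = nn.Sequential(
            nn.Linear(code_dim, content_dim), nn.ReLU(),
            nn.Linear(content_dim, content_dim),
        )
        # Probabilistic style head: predicts a Gaussian over style codes,
        # which enables diverse stylizations and sampling from a prior.
        self.style_head = nn.Linear(code_dim, 2 * style_dim)
        # Decoder: reconstructs the motion code from (content, style).
        self.decoder = nn.Sequential(
            nn.Linear(content_dim + style_dim, code_dim), nn.ReLU(),
            nn.Linear(code_dim, code_dim),
        )
        self.style_dim = style_dim

    def encode(self, motion_code):
        content = self.content_head(motion_code)
        mu, logvar = self.style_head(motion_code).chunk(2, dim=-1)
        return content, mu, logvar

    def forward(self, motion_code, style_source=None):
        content, mu, logvar = self.encode(motion_code)
        if style_source is None:
            # Label-free setting: draw a style code from the N(0, I) prior.
            style = torch.randn(motion_code.shape[0], self.style_dim,
                                device=motion_code.device)
        else:
            # Reference-based setting: reuse the style of another motion code.
            _, s_mu, s_logvar = self.encode(style_source)
            style = s_mu + torch.randn_like(s_mu) * (0.5 * s_logvar).exp()
        return self.decoder(torch.cat([content, style], dim=-1)), mu, logvar

# Usage: stylize a batch of (stand-in) pretrained-autoencoder motion codes
# with styles sampled from the prior, yielding diverse outputs per input.
codes = torch.randn(8, 512)
model = LatentStylizer()
stylized_codes, mu, logvar = model(codes)
```

Modeling the style code probabilistically (rather than as a single deterministic vector) is what allows both diverse stylizations of one motion and stylization without any style input, since new styles can simply be sampled from the prior.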