ParCo: Part-Coordinating Text-to-Motion Synthesis

23 Jul 2024 | Qiran Zou*, Shangyuan Yuan*, Shian Du, Yu Wang, Chang Liu, Yi Xu, Jie Chen, and Xiangyang Ji
The paper introduces ParCo (Part-Coordinating Text-to-Motion Synthesis), a part-based approach to text-to-motion synthesis that aims to generate motions that both match the textual description and are coordinated across the body. Existing part-based methods often struggle to coordinate the motions of different parts and to understand part concepts. ParCo discretizes whole-body motion into multiple part motions and uses VQ-VAEs to establish prior knowledge of each part. Multiple lightweight generators then synthesize the part motions, and a Part Coordination module lets the generators communicate so that their outputs stay coordinated. This design enables coordinated, fine-grained motion generation while maintaining computational efficiency.

Extensive experiments on the HumanML3D and KIT-ML datasets show superior performance on the R-Precision, MM-Dist, and FID metrics, indicating that ParCo generates realistic and accurate motions that align well with the semantic descriptions in the text. The method also adapts to different part separation schemes and has lower computational complexity, making it a promising solution for text-to-motion synthesis.
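To make the part-wise discretization step more concrete, the following is a minimal sketch of the general idea, not the authors' implementation. It splits each frame of a whole-body motion into parts and maps every part to the index of its nearest codebook entry, which is the quantization step at the core of a VQ-VAE. The part names, feature layout (`PART_SLICES`), codebooks, and function names are all hypothetical, and the learned encoder and decoder of a real VQ-VAE are omitted, so raw features are quantized directly.

```python
from typing import Dict, List, Sequence

# Hypothetical part layout: which feature indices of a frame belong to which
# part. The actual ParCo part split and feature sizes are not given here.
PART_SLICES: Dict[str, slice] = {
    "torso": slice(0, 2),
    "left_arm": slice(2, 4),
    "right_arm": slice(4, 6),
}


def split_into_parts(
    motion: Sequence[Sequence[float]],
    part_slices: Dict[str, slice] = PART_SLICES,
) -> Dict[str, List[Sequence[float]]]:
    """Split a whole-body motion (frames x features) into per-part motions."""
    return {
        name: [frame[s] for frame in motion]
        for name, s in part_slices.items()
    }


def nearest_code(
    vector: Sequence[float], codebook: Sequence[Sequence[float]]
) -> int:
    """Return the index of the codebook entry closest to `vector`
    under squared Euclidean distance."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def quantize_parts(
    motion: Sequence[Sequence[float]],
    codebooks: Dict[str, Sequence[Sequence[float]]],
    part_slices: Dict[str, slice] = PART_SLICES,
) -> Dict[str, List[int]]:
    """Turn a whole-body motion into one discrete token sequence per part,
    using a separate (here hand-written) codebook for each part."""
    parts = split_into_parts(motion, part_slices)
    return {
        name: [nearest_code(frame, codebooks[name]) for frame in frames]
        for name, frames in parts.items()
    }


if __name__ == "__main__":
    # Two frames, six features per frame (two per hypothetical part).
    motion = [
        [0.0, 0.1, 1.0, 1.0, 0.0, 0.0],
        [0.9, 1.0, 0.1, 0.0, 1.0, 0.9],
    ]
    # Toy codebooks with two entries per part.
    codebooks = {name: [[0.0, 0.0], [1.0, 1.0]] for name in PART_SLICES}
    print(quantize_parts(motion, codebooks))
```

In this simplified picture, each part ends up with its own token sequence. Per the summary, ParCo's lightweight generators would produce such part-level sequences from text, with the Part Coordination module exchanging information between them; that coordination step is not modeled in this sketch.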