This paper introduces the Large Motion Model (LMM), a unified multi-modal motion generation model that handles multiple motion generation tasks within a single model and achieves competitive performance across nine widely used benchmarks. LMM is built on a transformer-based diffusion backbone and incorporates ArtAttention, an attention mechanism that enables precise and robust control over individual body parts. The model is trained on MotionVerse, a comprehensive motion generation dataset spanning 10 tasks and 16 datasets with a total of 320k sequences and 100 million frames, all unified into a common motion representation so that the model can learn from diverse data sources and generalize across motion generation tasks. LMM further employs a pre-training strategy based on variable frame rates and varied masking schemes to better exploit knowledge from this heterogeneous training data. Experiments show that LMM attains state-of-the-art results across a range of tasks, demonstrating strong generalization, and because it can process multiple input modalities simultaneously, it can also accomplish tasks unseen during training.
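To make the body-part control behind ArtAttention concrete, the following PyTorch sketch shows one way attention can be restricted so that each body part's motion tokens attend only to tokens of the same part and to the conditioning signal. The part list, tensor layout, and masking scheme are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

# Assumed body-part split; the paper's actual partitioning may differ.
BODY_PARTS = ["head", "torso", "left_arm", "right_arm", "left_leg", "right_leg"]

class PartAwareAttention(nn.Module):
    """Attention block in which each motion token attends only to tokens of the
    same body part plus the condition tokens (e.g. text or music features)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, part_ids: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x:        (B, N, D) motion tokens, one per frame and body part
        # part_ids: (N,)      body-part index of each motion token
        # cond:     (B, C, D) condition tokens
        B, N, D = x.shape
        C = cond.shape[1]
        tokens = torch.cat([x, cond], dim=1)                # (B, N + C, D)

        # Motion tokens may see same-part tokens and all condition tokens;
        # condition tokens may see everything. In attn_mask, True = blocked.
        allowed = torch.ones(N + C, N + C, dtype=torch.bool)
        allowed[:N, :N] = part_ids[:, None] == part_ids[None, :]
        out, _ = self.attn(tokens, tokens, tokens, attn_mask=~allowed)
        return out[:, :N]                                   # keep motion tokens only

# Toy usage: 8 frames x 6 parts, 64-dim tokens, 4 condition tokens.
T, P, D = 8, len(BODY_PARTS), 64
part_ids = torch.arange(P).repeat(T)
y = PartAwareAttention(dim=D)(torch.randn(2, T * P, D), part_ids, torch.randn(2, 4, D))
```

Restricting the attention pattern this way is what allows a conditioning signal to steer one body part without perturbing the others.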
The paper also discusses the core challenges of multi-modal, multi-task motion generation: motion data come in non-uniform formats, tasks with different objectives are evaluated with different metrics, and action knowledge is difficult to transfer across tasks. LMM addresses these challenges through its unified motion representation, the ArtAttention mechanism, and the pre-training strategy over extensive motion datasets. The model is evaluated on tasks including text-to-motion, music-to-dance, and motion prediction, showing strong performance throughout, and ablation studies provide insights into training and scaling up large motion models for future research.
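The pre-training strategy with variable frame rates and masking described earlier can be pictured as an augmentation step applied to each training clip. The sketch below is a minimal, hypothetical version of such a step; the subsampling factors, mask ratios, and tensor layout are assumptions rather than the paper's actual settings.

```python
import torch

def augment_clip(motion: torch.Tensor,
                 stride_choices=(1, 2, 4),     # assumed frame-rate subsampling factors
                 frame_mask_ratio: float = 0.3,
                 part_mask_prob: float = 0.2):
    # motion: (T, P, D) clip with T frames, P body parts, D features per part.
    # Returns the resampled, partially masked clip and a boolean mask marking
    # the positions the model is asked to reconstruct (True = masked).

    # 1) Variable frame rate: keep every k-th frame for a randomly chosen k.
    k = stride_choices[torch.randint(len(stride_choices), (1,)).item()]
    clip = motion[::k]                                  # (T', P, D)
    num_frames, num_parts, _ = clip.shape

    # 2) Random temporal masking (frame in-filling objective).
    frame_mask = torch.rand(num_frames) < frame_mask_ratio

    # 3) Random spatial masking (body-part in-filling objective).
    part_mask = torch.rand(num_parts) < part_mask_prob

    mask = frame_mask[:, None] | part_mask[None, :]     # (T', P)
    masked_clip = clip.clone()
    masked_clip[mask] = 0.0                             # zero out masked positions
    return masked_clip, mask

# Toy usage: a 120-frame clip with 6 body parts and 64 features per part.
masked_clip, mask = augment_clip(torch.randn(120, 6, 64))
```

Training the diffusion model to recover the masked frames and parts under varying effective frame rates is one plausible way such a strategy lets a single model absorb datasets captured at different rates and with different body coverage.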