Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs


6 Oct 2024 | Qi Wu¹, Yubo Zhao¹, Yifan Wang¹, Xinhang Liu¹, Yu-Wing Tai², Chi-Keung Tang¹
Motion-Agent is a conversational framework for human motion generation built on large language models (LLMs). At its core is MotionLLM, a generative agent that bridges motion and text: a motion tokenizer encodes and quantizes motion sequences into discrete tokens, which a pre-trained LLM can then read and emit alongside ordinary text. With only 1–3% of the model's parameters fine-tuned, MotionLLM achieves performance comparable to diffusion models and other transformer-based methods.

By pairing MotionLLM with GPT-4, Motion-Agent generates complex motion sequences through multi-turn conversations, a capability that previous models have struggled to achieve. The framework supports a wide range of motion-language tasks, enabling bidirectional translation between text and motion: it achieves state-of-the-art results in motion captioning, producing semantically accurate and contextually appropriate descriptions, and it handles long, complex motion sequences through iterative refinement and multi-turn interaction.

In evaluations, Motion-Agent outperforms prior models on the diversity, accuracy, and smoothness of generated motions. Because the framework is built around a pluggable pre-trained LLM, it adapts readily to different backbone models, making it a versatile solution for motion-language applications.
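To make the "encode and quantize motions into discrete tokens" step concrete, the sketch below shows a vector-quantized motion tokenizer of the kind the summary describes. The VQ-VAE-style design, the codebook size, the 263-dimensional motion features (common in HumanML3D-based work), and all names are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class MotionTokenizer(nn.Module):
    """Minimal sketch: map a continuous motion sequence to discrete token ids."""

    def __init__(self, motion_dim=263, latent_dim=512, codebook_size=512):
        super().__init__()
        # 1D conv encoder: temporally downsample the motion into latent frames.
        self.encoder = nn.Sequential(
            nn.Conv1d(motion_dim, latent_dim, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(latent_dim, latent_dim, kernel_size=4, stride=2, padding=1),
        )
        # Codebook: each motion token is the index of its nearest code vector.
        self.codebook = nn.Embedding(codebook_size, latent_dim)

    def encode(self, motion):
        # motion: (batch, frames, motion_dim)
        z = self.encoder(motion.transpose(1, 2)).transpose(1, 2)  # (B, T', D)
        # Nearest-neighbor lookup in the codebook -> discrete token ids.
        codes = self.codebook.weight[None].expand(z.size(0), -1, -1)  # (B, K, D)
        return torch.cdist(z, codes).argmin(dim=-1)  # (B, T') integer tokens
```

Once motion is discretized this way, each token id can be mapped to a special vocabulary entry (e.g. `<motion_17>`), letting the LLM consume and produce motion exactly as it does words.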
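The "only 1–3% of parameters fine-tuned" figure suggests adapter-style tuning. The sketch below uses LoRA via the `peft` library as one plausible mechanism; the backbone model id, rank, target modules, and codebook size are assumptions for illustration, not the paper's confirmed configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Llama-2-7b-hf"  # illustrative backbone; the paper's may differ
tok = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Extend the vocabulary with discrete motion tokens so the LLM can read and
# emit them like ordinary words (codebook size of 512 is an assumption).
tok.add_tokens([f"<motion_{i}>" for i in range(512)])
model.resize_token_embeddings(len(tok))

# LoRA adapters keep the trainable-parameter count in the low single digits
# of the full model, in line with the 1-3% the summary reports. In practice
# the newly added embedding rows would also need training.
config = LoraConfig(r=16, lora_alpha=32,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()
```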
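Finally, the multi-turn agent behavior can be pictured as a planner-generator loop: GPT-4 decomposes a request into short motion captions, and MotionLLM realizes each one. The interfaces below (`planner_llm.chat`, `motion_llm.generate_motion_tokens`, `motion_decoder`) are hypothetical stand-ins sketched under that assumption, not the authors' actual API.

```python
import numpy as np

def motion_agent_turn(user_request, planner_llm, motion_llm, motion_decoder):
    """One conversational turn of a Motion-Agent-style loop (hedged sketch)."""
    # 1. The planner (e.g. GPT-4) decomposes the request into ordered captions.
    plan = planner_llm.chat(
        "Decompose this motion request into short, ordered motion "
        f"descriptions, one per line:\n{user_request}"
    )
    captions = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. MotionLLM translates each caption into discrete motion tokens.
    token_segments = [motion_llm.generate_motion_tokens(c) for c in captions]

    # 3. Decode each segment to continuous motion and stitch along time.
    #    (A real system would blend transitions; plain concatenation here.)
    clips = [motion_decoder(tokens) for tokens in token_segments]
    return np.concatenate(clips, axis=0)  # (total_frames, motion_dim)
```

Iterative refinement then amounts to feeding the user's follow-up instructions back through the same loop, letting the planner revise individual captions rather than regenerating the whole sequence.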