FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis


24 May 2024 | Ke Fan, Junshu Tang, Weijian Cao, Ran Yi, Moran Li, Jingyu Gong, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Lizhuang Ma
This paper addresses the challenge of generating human motion from text, with a particular focus on multi-person scenarios. Existing methods are restricted to single- or two-person settings and therefore lack universality. To overcome this, the authors propose FreeMotion, a unified framework that can generate motion for any number of individuals. The key innovation is to decompose the joint motion into conditional motion distributions, decoupling the process of conditional motion generation and allowing flexible, precise control over multi-person motion. The framework consists of two main modules: a generation module, which produces diverse single-person motions from text prompts, and an interaction module, which injects condition signals (the motions of other individuals) into the generation process. This decoupling also allows existing single-person motion-control methods to be integrated seamlessly, enabling precise spatial control over multi-person motion. Experiments demonstrate that FreeMotion outperforms state-of-the-art methods on both single- and multi-person motion generation, and that it supports flexible spatial control, with explicit and implicit guidance over the spatial locations of multiple humans.

**Contributions:**
1. A new paradigm for unified motion synthesis for single and multiple people.
2. Decoupled generation and interaction modules for conditional motion generation.
3. Precise control of multi-person motion through flexible spatial signals.

**Keywords:** Text-to-motion synthesis, Diffusion models

**Related Work:**
- Single-person motion synthesis: align-based models and condition-based models.
- Multi-person motion synthesis: motion graphs, momentum-based inverse kinematics, and diffusion-based models.
- Diffusion models: stochastic diffusion processes and their applications in motion synthesis.

**Preliminaries:**
- Diffusion model for motion synthesis: motion is generated by gradually denoising Gaussian noise (a standard formulation is sketched below).
- Motion interaction representation: a non-canonical representation that preserves the relative positions between individuals.
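To make the diffusion preliminary concrete, the following is the standard DDPM-style formulation widely used in motion diffusion models. This is an illustrative sketch: the variance schedule \(\beta_t\), the conditioning \(c\), and the choice to regress the clean motion \(x_0\) (rather than the noise) are generic assumptions, not necessarily FreeMotion's exact parameterization.

```latex
% Forward (noising) process with variance schedule \beta_t:
q(x_t \mid x_0) = \mathcal{N}\!\left(\sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,I\right),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s)

% Reverse process: a denoiser f_\theta recovers the motion from noise,
% conditioned on the text embedding c. A common training objective
% regresses the clean motion x_0 directly:
\mathcal{L}_{\mathrm{simple}} = \mathbb{E}_{x_0,\,t,\,\epsilon}
\left\|\, x_0 - f_\theta(x_t,\, t,\, c) \,\right\|_2^2
```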
**Method:**
- Overview: the joint multi-person motion is decomposed into conditional distributions, so each person's motion is generated conditioned on the motions of the others (see the sampling sketch after this outline).
- Number-free motion generation: the generation and interaction modules.
- Training process: two stages, first for single-person generation and then for conditional motion generation.
- Spatial control: explicit and implicit guidance for multi-human motion control (see the guidance sketch after this outline).
- Loss function: a reconstruction loss plus regularization losses.

**Experiments:**
- Datasets and metrics: the InterHuman dataset and standard evaluation metrics.
- Implementation details: hyperparameters and training settings.
- Quantitative results: comparison with state-of-the-art methods.
- Ablation studies: the impact of individual components on performance.
- Qualitative results: single-, two-, and three-person motion synthesis.

**Conclusion and Limitations:**
- FreeMotion achieves universal motion synthesis for any number of individuals.
- Limitations include potential mismatches between text and movements and limited interaction understanding in multi-person scenarios.
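The conditional decomposition in the Method overview can be illustrated with a sampling loop. The sketch below is a hypothetical rendering, not the authors' code: `generation_module`, `interaction_module`, and `text_embed` are stand-ins for the paper's components, and the person-by-person sampling order is one plausible reading of the decomposition (the paper's actual schedule may, for example, alternate denoising steps across persons).

```python
# Hypothetical sketch of number-free sampling via conditional decomposition.
# All names here (generation_module, interaction_module, text_embed) are
# illustrative stand-ins, not FreeMotion's actual API.
import torch

T_STEPS = 50             # diffusion steps (illustrative)
SEQ_LEN, DIM = 64, 262   # frames and per-frame feature size (assumed)

def text_embed(prompt: str) -> torch.Tensor:
    """Stand-in text encoder: a fixed pseudo-random embedding per prompt."""
    g = torch.Generator().manual_seed(abs(hash(prompt)) % (2**31))
    return torch.randn(512, generator=g)

def generation_module(x_t: torch.Tensor, t: int, text_feat: torch.Tensor) -> torch.Tensor:
    """Stand-in denoiser: predicts the clean single-person motion x0."""
    return x_t * (1.0 - t / T_STEPS)  # placeholder computation only

def interaction_module(x0_pred: torch.Tensor, others: list) -> torch.Tensor:
    """Stand-in interaction: injects the other persons' motions as a condition."""
    if not others:
        return x0_pred  # no partners yet: plain single-person generation
    context = torch.stack(others).mean(dim=0)
    return x0_pred + 0.1 * (context - x0_pred)  # placeholder coupling

def sample_person(text_feat: torch.Tensor, others: list) -> torch.Tensor:
    """Denoise one person's motion, conditioned on text and on `others`."""
    x = torch.randn(SEQ_LEN, DIM)  # start from Gaussian noise
    for t in range(T_STEPS, 0, -1):
        x0_pred = generation_module(x, t, text_feat)
        x0_pred = interaction_module(x0_pred, others)
        # Simplified re-noising step (a real sampler uses the DDPM posterior).
        x = x0_pred + torch.randn_like(x) * (t - 1) / T_STEPS
    return x

def sample_group(prompt: str, num_people: int) -> list:
    """Factorize the joint motion: person i conditions on persons 0..i-1."""
    feat = text_embed(prompt)
    motions: list = []
    for _ in range(num_people):
        motions.append(sample_person(feat, motions))
    return motions

motions = sample_group("two people shake hands while a third watches", 3)
print([tuple(m.shape) for m in motions])  # [(64, 262), (64, 262), (64, 262)]
```

Because the interaction module is the only place where other persons enter, the same denoiser serves one, two, or N people, which is what makes the framework number-free.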
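Explicit spatial control can likewise be sketched as guidance on the denoiser's prediction: take a gradient step that pulls the predicted motion's root trajectory toward a target path. The loss, the guidance weight, and the assumption that the first two features of each frame hold the root position are all illustrative choices; the paper's explicit and implicit guidance may be formulated differently.

```python
# Hypothetical illustration of explicit spatial guidance during sampling.
# spatial_loss, the guidance weight, and the feature layout are assumptions
# for this sketch, not FreeMotion's actual formulation.
import torch

def spatial_loss(x0_pred: torch.Tensor, target_xy: torch.Tensor) -> torch.Tensor:
    """Penalize deviation of the root trajectory from a target path."""
    root = x0_pred[:, :2]  # assumed layout: root position in channels 0-1
    return ((root - target_xy) ** 2).mean()

def guide(x0_pred: torch.Tensor, target_xy: torch.Tensor, weight: float = 0.5) -> torch.Tensor:
    """One guidance step: nudge the predicted clean motion toward the target."""
    x = x0_pred.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(spatial_loss(x, target_xy), x)
    return (x - weight * grad).detach()

# Usage: inside the denoising loop, after the modules predict x0:
#   x0_pred = guide(x0_pred, target_xy)
target_xy = torch.zeros(64, 2)      # e.g. keep the root near the origin
x0_pred = torch.randn(64, 262)
x0_guided = guide(x0_pred, target_xy)
assert spatial_loss(x0_guided, target_xy) < spatial_loss(x0_pred, target_xy)
```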