Understanding Seamless Human Motion Composition with Blended Positional Encodings

FlowMDM is a diffusion-based model that generates seamless human motion compositions from textual descriptions without postprocessing or redundant denoising steps, which makes it more efficient than diffusion-based approaches that rely on them. It introduces Blended Positional Encodings (BPE), which combine absolute and relative positional encodings to achieve both global motion coherence and smooth transitions. The model also employs Pose-Centric Cross-ATtention (PCCAT) to stay robust to varying text descriptions at inference time. FlowMDM produces realistic, smooth transitions, including for periodic motions such as walking, and introduces two new metrics, Peak Jerk (PJ) and Area Under the Jerk (AUJ), to evaluate transition quality. It outperforms existing methods on the Babel and HumanML3D datasets in terms of accuracy, realism, and smoothness. It also performs better when extrapolating human motion sequences and maintains motion coherence across transitions. Its ability to generate long, continuous motion sequences with seamless transitions makes it a significant advancement in human motion generation.
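
To make the two transition metrics more concrete, here is a minimal Python sketch of how jerk-based scores could be computed. It rests on assumptions that are not stated in this summary: jerk is taken as the third time derivative of joint positions, approximated with a third-order finite difference; PJ is taken as the largest jerk magnitude over all joints and transition frames; and AUJ is taken as the accumulated absolute deviation of the per-frame maximum jerk from a baseline value. The function names, the `fps` and `baseline` parameters, and the finite-difference scheme are illustrative choices, not the authors' implementation.

```python
from typing import List, Sequence

# One frame is a list of joints, each joint a list of 3 coordinates (x, y, z).
Frame = Sequence[Sequence[float]]


def jerk_magnitudes(motion: Sequence[Frame], fps: float) -> List[List[float]]:
    """Per-frame, per-joint jerk magnitude from a third-order finite difference."""
    dt = 1.0 / fps
    result = []
    for t in range(len(motion) - 3):
        row = []
        for j in range(len(motion[t])):
            squared = 0.0
            for c in range(3):
                # Third forward difference: x[t+3] - 3x[t+2] + 3x[t+1] - x[t]
                d3 = (motion[t + 3][j][c] - 3 * motion[t + 2][j][c]
                      + 3 * motion[t + 1][j][c] - motion[t][j][c])
                squared += (d3 / dt ** 3) ** 2
            row.append(squared ** 0.5)
        result.append(row)
    return result


def peak_jerk(jerk: Sequence[Sequence[float]]) -> float:
    """Assumed PJ: largest jerk magnitude over all joints and frames."""
    return max(max(row) for row in jerk)


def area_under_jerk(jerk: Sequence[Sequence[float]], baseline: float = 0.0) -> float:
    """Assumed AUJ: summed absolute deviation of per-frame max jerk from a baseline."""
    return sum(abs(max(row) - baseline) for row in jerk)


# Toy transition: a single joint moving along x as t**3 at 1 frame per second.
toy_motion = [[[float(t ** 3), 0.0, 0.0]] for t in range(6)]
toy_jerk = jerk_magnitudes(toy_motion, fps=1.0)
print(peak_jerk(toy_jerk), area_under_jerk(toy_jerk))
```

Under these assumptions, a low PJ indicates that no single frame of the transition contains an abrupt jolt, while a low AUJ indicates that the transition as a whole stays close to the baseline jerk level. In practice the baseline would be derived from reference motion data rather than fixed at zero.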

Seamless Human Motion Composition with Blended Positional Encodings

23 Feb 2024 | German Barquero, Sergio Escalera, Cristina Palmero