Seamless Human Motion Composition with Blended Positional Encodings

23 Feb 2024 | German Barquero, Sergio Escalera, Cristina Palmero
FlowMDM is a diffusion-based model that generates seamless, continuous sequences of human motion from textual descriptions. To produce long, realistic sequences, it introduces Blended Positional Encodings (BPE), which combine absolute and relative positional encodings. BPE exploits the iterative nature of diffusion models: global motion coherence is recovered at the absolute stage, and smooth transitions between actions are built at the relative stage.
FlowMDM also introduces Pose-Centric Cross-Attention (PCCAT), an attention technique in which each pose is denoised based on its own condition and its neighboring poses, which makes the model robust to varying text descriptions at inference time. The model achieves state-of-the-art results on the Babel and HumanML3D datasets in terms of accuracy, realism, and smoothness. To address the limitations of existing Human Motion Composition (HMC) metrics, the authors propose two new metrics, Peak Jerk (PJ) and Area Under the Jerk (AUJ), which detect abrupt transitions. Its main strength is long, continuous motion with seamless transitions, where it outperforms other methods in smoothness and realism.
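The summary does not give formulas for PJ and AUJ, so the following Python sketch is only one plausible reading. Jerk is the third time derivative of joint position, estimated here with finite differences. PJ is taken as the maximum jerk magnitude over all frames and joints, and AUJ as the accumulated absolute deviation of the per-frame maximum jerk from a reference level. The function names, the `reference` parameter, and the input layout (a list of frames, each a list of (x, y, z) joint positions) are assumptions for illustration, not the authors' implementation.

```python
import math


def jerk_magnitudes(positions, fps):
    """Per-frame, per-joint jerk magnitude from a third-order finite difference.

    positions: list of frames; each frame is a list of (x, y, z) joint positions.
    Returns a list with len(positions) - 3 rows, one value per joint.
    """
    if len(positions) < 4:
        raise ValueError("need at least 4 frames to estimate jerk")
    dt = 1.0 / fps
    n_joints = len(positions[0])
    rows = []
    for t in range(len(positions) - 3):
        row = []
        for j in range(n_joints):
            comps = []
            for c in range(3):
                p0, p1, p2, p3 = (positions[t + k][j][c] for k in range(4))
                # Third forward difference approximates d^3 p / dt^3.
                comps.append((p3 - 3 * p2 + 3 * p1 - p0) / dt ** 3)
            row.append(math.sqrt(sum(v * v for v in comps)))
        rows.append(row)
    return rows


def peak_jerk(jerk):
    """Assumed PJ: largest jerk magnitude over all frames and joints."""
    return max(max(row) for row in jerk)


def area_under_jerk(jerk, reference, fps):
    """Assumed AUJ: summed |per-frame max jerk - reference| times the frame period.

    `reference` is a hypothetical baseline jerk level (e.g. a dataset average).
    """
    dt = 1.0 / fps
    return sum(abs(max(row) - reference) for row in jerk) * dt


if __name__ == "__main__":
    # One joint moving at constant velocity, with and without a sudden jump.
    smooth = [[(float(t), 0.0, 0.0)] for t in range(8)]
    jumpy = [[(float(x), 0.0, 0.0)] for x in (0, 1, 2, 3, 10, 11, 12, 13)]
    for name, motion in (("smooth", smooth), ("jumpy", jumpy)):
        j = jerk_magnitudes(motion, fps=1)
        print(name, peak_jerk(j), area_under_jerk(j, reference=0.0, fps=1))
```

Under these assumptions, a motion with an abrupt jump at a transition yields a clearly higher peak and area than a constant-velocity motion, which is the kind of discontinuity the two metrics are meant to expose.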