Flexible Motion In-betweening with Diffusion Models

July 27-August 1, 2024 | Setareh Cohan, Guy Tevet, Daniele Reda, Xue Bin Peng, Michiel van de Panne
This paper introduces Conditional Motion Diffusion In-betweening (CondMDI), a diffusion-based method for flexible motion in-betweening that generates diverse, coherent human motions from sparse keyframes and text prompts. Unlike previous approaches, CondMDI is a single unified model that handles both spatial and text-based conditioning, enabling flexible keyframe placement and partial keyframe constraints. The model is trained on randomly sampled keyframes with a masked conditional diffusion objective, allowing it to generate high-quality motion sequences that adhere to the specified constraints.

The method is evaluated on the text-annotated HumanML3D dataset, demonstrating its versatility and effectiveness for keyframe in-betweening. The paper also examines alternative design choices for inference-time keyframing, namely imputation and reconstruction guidance. In these comparisons, CondMDI outperforms the alternatives in motion quality and diversity while maintaining fast inference, and it applies to a range of motion generation settings: sparse and dense keyframes, partial keyframes, and text conditioning. The paper closes by discussing limitations and future work, including improving the keyframe selection algorithm and addressing issues with partial keyframe conditioning.
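The masked-conditioning idea is straightforward to sketch. Below is a minimal, hypothetical PyTorch illustration of the training scheme described above: keyframes are sampled at random for each sequence, and the denoiser receives the noisy motion together with the clean keyframe values and a binary observation mask. All names and the `q_sample`/`model` interfaces are assumptions for illustration, not the authors' code.

```python
import torch

def sample_keyframe_mask(batch, frames, feat_dim, device):
    """Randomly choose keyframes to condition on; an illustrative stand-in
    for the paper's random keyframe sampling during training."""
    mask = torch.zeros(batch, frames, feat_dim, device=device)
    for b in range(batch):
        n_keys = int(torch.randint(1, max(2, frames // 4), (1,)))
        key_idx = torch.randperm(frames)[:n_keys]
        mask[b, key_idx] = 1.0  # zero out feature subsets here for partial keyframes
    return mask

def masked_diffusion_loss(model, q_sample, x0, t, text_emb):
    """One masked conditional diffusion training step (hypothetical API:
    `model` and `q_sample` are assumed helpers). The network sees the noisy
    motion, the clean keyframe values, and the mask, and is trained to
    reconstruct the full clean sequence x0."""
    mask = sample_keyframe_mask(*x0.shape, x0.device)
    x_t = q_sample(x0, t)                      # forward-noised motion at step t
    net_in = torch.cat([x_t, mask * x0, mask], dim=-1)
    x0_pred = model(net_in, t, text_emb)       # predicts the clean motion
    return ((x0_pred - x0) ** 2).mean()
```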
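For contrast, the imputation baseline that the paper compares against can be sketched as an inference-time loop that repeatedly overwrites observed entries with suitably noised keyframe values; again, the `diffusion.p_step` and `diffusion.q_sample` helpers are hypothetical, and this is a sketch of the general technique rather than the paper's exact procedure.

```python
@torch.no_grad()
def sample_with_imputation(model, diffusion, keyframes, mask, text_emb):
    """Inference-time imputation (sketch): after each reverse diffusion step,
    replace the known keyframe entries with ground-truth values noised to the
    current level, so the sample stays consistent with the constraints."""
    x = torch.randn_like(keyframes)
    for t in reversed(range(diffusion.num_steps)):
        t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
        x = diffusion.p_step(model, x, t_batch, text_emb)   # one reverse step
        if t > 0:
            # Impute: overwrite observed entries with keyframes noised to t-1.
            known = diffusion.q_sample(keyframes, t_batch - 1)
            x = mask * known + (1.0 - mask) * x
    return x
```

Because such imputation only enforces the constraints step by step at inference time, the paper reports that the trained conditional model yields better motion quality and coherence than these inference-time schemes.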