15 Apr 2024 | Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu, Alexander Winkler, Petr Kadlecak, Siyu Tang, Federica Bogo
RoHM (Robust Human Motion Reconstruction via Diffusion) is a novel approach for robust 3D human motion reconstruction from monocular RGB(-D) videos, even in the presence of noise and occlusions. Unlike previous methods that either directly regress motion or combine data-driven priors with optimization, RoHM leverages the iterative denoising nature of diffusion models. The method decomposes the problem into two sub-tasks: global trajectory reconstruction and local motion prediction, each learned by a separate diffusion model. To capture the correlations between global and local motion, a flexible conditioning module, TrajControl, is introduced, which fine-tunes the global trajectory model using denoised local motion. This module, combined with an iterative inference scheme, improves the quality of both global and local motion.
Extensive experiments on three datasets show that RoHM outperforms state-of-the-art methods in terms of both qualitative and quantitative metrics, while being significantly faster at test time.
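
The alternation between the two sub-tasks can be illustrated with a toy sketch. This is not the RoHM implementation: the two "models" below are hypothetical stand-ins (a simple moving average instead of learned diffusion models), the conditioning input is accepted but ignored, and the names `reconstruct`, `toy_traj_model`, `toy_pose_model`, and `num_rounds` are assumptions made for illustration. The sketch only shows the control flow in which the global trajectory is refined given the current local motion, and the local motion is then refined given the updated trajectory.

```python
from typing import Callable, List, Tuple

Signal = List[float]
# A denoiser takes (noisy signal, conditioning signal) and returns a cleaner signal.
Denoiser = Callable[[Signal, Signal], Signal]


def moving_average(signal: Signal, window: int = 3) -> Signal:
    """Toy stand-in for a learned denoiser: centered moving average."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def toy_traj_model(noisy_traj: Signal, local_cond: Signal) -> Signal:
    # Hypothetical global-trajectory model. A real model would use
    # local_cond as conditioning; this placeholder ignores it.
    return moving_average(noisy_traj)


def toy_pose_model(noisy_local: Signal, traj_cond: Signal) -> Signal:
    # Hypothetical local-motion model, with the same caveat as above.
    return moving_average(noisy_local)


def reconstruct(
    noisy_traj: Signal,
    noisy_local: Signal,
    traj_model: Denoiser,
    pose_model: Denoiser,
    num_rounds: int = 2,
) -> Tuple[Signal, Signal]:
    """Alternate between the two sub-tasks for a fixed number of rounds."""
    traj, local = list(noisy_traj), list(noisy_local)
    for _ in range(num_rounds):
        traj = traj_model(traj, local)    # refine global trajectory
        local = pose_model(local, traj)   # refine local motion
    return traj, local


def total_variation(signal: Signal) -> float:
    """Sum of absolute frame-to-frame differences (a crude jitter measure)."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


if __name__ == "__main__":
    jittery = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
    traj, local = reconstruct(jittery, jittery, toy_traj_model, toy_pose_model)
    print(total_variation(jittery), total_variation(traj))
```

Passing the models in as callables keeps the loop independent of how each sub-task is solved, so the placeholders could be swapped for real denoisers without changing `reconstruct`.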