TC4D: Trajectory-Conditioned Text-to-4D Generation


11 Apr 2024 | Sherwin Bahmani*, Xian Liu*, Yifan Wang*, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi*, and David B. Lindell*
TC4D is a trajectory-conditioned text-to-4D generation method that decomposes motion into global and local components. Global motion is represented as a rigid transformation along a trajectory parameterized by a spline, while local deformations are learned with supervision from a text-to-video model via trajectory-aware video score distillation sampling (VSDS), using an annealing procedure to improve temporal consistency. This decoupling addresses a key limitation of existing techniques, which struggle to synthesize extensive motion, and enables animating scenes along arbitrary trajectories, compositional scene generation, and significant improvements in realism and the amount of motion.

The approach is evaluated qualitatively and through a user study, demonstrating state-of-the-art results for text-to-4D generation. Compared against existing approaches, TC4D shows statistically significant improvements in motion quality, structure quality, and overall preference, producing more realistic and coherent motion. Extensive experiments and ablation studies confirm the effectiveness of trajectory conditioning for generating high-quality 4D scenes.