Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

2024-07-16 | Yanqin Jiang, Chaohui Yu, Chenjie Cao, Fan Wang, Weiming Hu, Jin Gao
Animate3D is a framework for animating any static 3D model with multi-view video diffusion. It addresses two main challenges in 4D generation: the lack of a foundation model that provides spatiotemporally consistent priors, and the inability of prior approaches to animate existing 3D assets by using their multi-view renderings as conditions.

At the core of the framework is a multi-view video diffusion model (MV-VDM), trained on a large-scale multi-view video dataset (MV-Video), which supplies spatiotemporal consistency for 4D generation. MV-VDM incorporates a spatiotemporal attention module that integrates 3D and video diffusion models to strengthen spatial and temporal consistency, and it is conditioned on multi-view renderings of the static 3D model to preserve its identity. Building on MV-VDM, Animate3D animates a 3D object through an effective two-stage pipeline: motions are first reconstructed directly from the generated multi-view videos, and then refined, together with appearance, by a 4D Score Distillation Sampling (4D-SDS) objective that distills the multi-view video diffusion priors.

Qualitative and quantitative experiments show that Animate3D significantly outperforms previous approaches, demonstrating its effectiveness in generating high-quality, spatiotemporally consistent 4D content. The dataset, code, and models will be open-sourced.
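For context, 4D-SDS builds on standard Score Distillation Sampling (Poole et al., DreamFusion). A minimal sketch of the SDS gradient is given below; treating MV-VDM as the frozen diffusion prior and a stack of multi-view video frames rendered from the animated model as the distilled variable is an assumption about how Animate3D specializes it, not the paper's exact formulation.

\nabla_{\theta}\mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_{\phi}(x_t;\, c,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]

Here \theta parameterizes the animated (4D) representation, x = g(\theta) is a rendering of it (in the assumed 4D-SDS setting, multi-view video frames), x_t is the rendering noised to diffusion timestep t with noise \epsilon, \hat{\epsilon}_{\phi} is the noise predicted by the frozen prior (here MV-VDM) under condition c (here the static model's multi-view renderings), and w(t) is a timestep-dependent weight.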