16 Jul 2024 | Yanqin Jiang, Chaohui Yu, Chenjie Cao, Fan Wang, Weiming Hu, Jin Gao
Animate3D is a novel framework for animating any static 3D model using multi-view video diffusion. The core idea is twofold: 1) a multi-view video diffusion model (MV-VDM) conditioned on multi-view renderings of the static 3D object and trained on a large-scale multi-view video dataset (MV-Video); 2) a framework that combines reconstruction with 4D Score Distillation Sampling (4D-SDS) to leverage the multi-view video diffusion priors for animating 3D objects. MV-VDM enhances spatial and temporal consistency by integrating 3D and video diffusion models, and preserves the identity of the static 3D model through its multi-view renderings. Animate3D's two-stage pipeline first reconstructs motion from the generated multi-view videos, then applies 4D-SDS to refine both appearance and motion. Extensive experiments demonstrate that Animate3D significantly outperforms previous approaches in spatiotemporal consistency, motion smoothness, dynamic degree, and aesthetic quality. The data, code, and models will be openly released.
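For context, Score Distillation Sampling (SDS) optimizes scene parameters by backpropagating the denoising residual of a frozen diffusion model through rendered images. The abstract does not spell out the exact 4D-SDS objective, so the sketch below shows the standard SDS gradient and, as an assumption only, how it would extend to multi-view video renderings scored by MV-VDM (the view/frame indexing and the conditioning on static renderings c are hypothetical here):

```latex
% Standard SDS gradient: x = g(\theta) is a rendering of scene parameters \theta,
% \epsilon_\phi is the frozen diffusion denoiser, w(t) a noise-level weighting.
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
    \bigl(\epsilon_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
    \frac{\partial x}{\partial \theta} \right]

% Assumed 4D-SDS form: the same gradient, where x^{1:V,1:F} stacks F frames
% rendered from V views of the dynamic (4D) representation, and the denoiser
% is MV-VDM conditioned on multi-view renderings c of the static object.
\nabla_\theta \mathcal{L}_{\mathrm{4D\text{-}SDS}}
  \approx \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
    \bigl(\epsilon_{\mathrm{MV\text{-}VDM}}\bigl(x_t^{1:V,1:F};\, c,\, y,\, t\bigr) - \epsilon\bigr)\,
    \frac{\partial x^{1:V,1:F}}{\partial \theta} \right]
```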