IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

13 Feb 2024 | Luke Melas-Kyriazi, Iro Laina, Christian Rupprecht, Natalia Neverova, Andrea Vedaldi, Oran Gafni, Filippos Kokkinos
IM-3D is a text- and image-to-3D generation method that improves multi-view generation by using a video diffusion model instead of an image-based one. It leverages a video generator, Emu Video, to produce multiple consistent views of an object, which are then used to reconstruct a 3D model with Gaussian splatting. This design avoids Score Distillation Sampling (SDS) and large reconstruction networks, yielding a more efficient and higher-quality 3D generation pipeline: the number of evaluations of the 2D generator is reduced by up to 10-100 times, leading to faster and more robust 3D reconstruction. An iterative refinement loop then alternates between reconstruction and multi-view generation to further enhance the quality of the resulting assets. Compared with other state-of-the-art approaches, IM-3D shows superior faithfulness to the text and visual prompts as well as higher quality and efficiency. Its key contributions are the use of video diffusion for multi-view generation, the application of Gaussian splatting for 3D reconstruction, and the iterative refinement process that ties the two together, producing high-quality 3D assets without large reconstruction networks or SDS losses.
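To make the described pipeline concrete, below is a minimal sketch, assuming PyTorch, of the alternation the summary outlines: sample a ring of views with a video diffusion model, fit Gaussian splats to them, re-render, partially re-noise the renders, and denoise again. The functions denoise_views, fit_gaussians, and render_views, along with all shapes and constants, are hypothetical stand-ins for illustration only, not the authors' code or any real library's API.

```python
# Minimal sketch of an IM-3D-style iterative multiview diffusion +
# reconstruction loop. All model components are stand-in stubs.

import torch

NUM_VIEWS, H, W = 16, 64, 64   # a ring of 16 views (illustrative sizes)
NUM_ITERS = 3                  # a few outer refinement iterations
NOISE_LEVEL = 0.6              # partial re-noising strength per iteration


def denoise_views(noisy_views: torch.Tensor, prompt: str) -> torch.Tensor:
    """Stand-in for a multiview/video diffusion model (e.g. a fine-tuned
    Emu Video) that denoises a stack of views conditioned on the prompt.
    This stub simply returns its input."""
    return noisy_views


def fit_gaussians(views: torch.Tensor, steps: int = 100) -> torch.Tensor:
    """Stand-in for Gaussian-splatting reconstruction: optimize a set of
    3D Gaussians so their renders match the given views. Returns dummy
    per-Gaussian parameters (position, scale, rotation, color, opacity)."""
    return torch.zeros(1024, 14, requires_grad=True)


def render_views(gaussian_params: torch.Tensor, num_views: int) -> torch.Tensor:
    """Stand-in for a differentiable renderer that re-renders the fitted
    Gaussians from the same camera ring."""
    return torch.rand(num_views, 3, H, W)


def im3d_style_loop(prompt: str) -> torch.Tensor:
    # 1) Sample an initial set of views from pure noise with the video model.
    views = denoise_views(torch.randn(NUM_VIEWS, 3, H, W), prompt)

    for _ in range(NUM_ITERS):
        # 2) Reconstruct 3D Gaussians that explain the current views.
        gaussians = fit_gaussians(views)

        # 3) Re-render the reconstruction; the renders are multiview-consistent
        #    by construction but may lose fine detail.
        renders = render_views(gaussians, NUM_VIEWS)

        # 4) Partially re-noise the renders and denoise them again with the
        #    video model to restore detail while staying 3D-consistent
        #    (an SDEdit-style refinement; no SDS loss is involved).
        noisy = (1 - NOISE_LEVEL) * renders + NOISE_LEVEL * torch.randn_like(renders)
        views = denoise_views(noisy, prompt)

    # The final reconstruction from the refined views is the output 3D asset.
    return fit_gaussians(views)


if __name__ == "__main__":
    asset = im3d_style_loop("a ceramic mug shaped like an owl")
    print(asset.shape)  # dummy Gaussian parameters: torch.Size([1024, 14])
```

The point of the loop is that the expensive 2D generator is only evaluated a handful of times per iteration on full view stacks, rather than thousands of times as in SDS-based optimization, which is where the claimed 10-100x reduction in generator evaluations comes from.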