22 Feb 2024 | XIN-YANG ZHENG, HAO PAN, YU-XIAO GUO, XIN TONG, YANG LIU
MVD² is an efficient method for 3D reconstruction from multiview diffusion (MVD) images. It addresses the sparsity and cross-view inconsistency of MVD images, which hinder traditional 3D reconstruction methods. MVD² aggregates image features into a 3D feature volume by projection and convolution, then decodes the volumetric features into a 3D mesh. It is trained on 3D shape collections together with MVD images generated from rendered views of those shapes. A view-dependent training scheme aligns the inferred shape with the ground-truth geometry at the prompt view while preserving local structural similarity at the other views.

MVD² improves the 3D generation quality of MVD pipelines and is fast and robust across MVD methods, decoding a 3D mesh from multiview images in under one second. Trained with Zero-123++ as the MVD model and the Objaverse-LVIS 3D dataset, it generates 3D models from multiview images produced by different MVD methods, using both synthetic and real images as prompts. The method is lightweight, efficient, and generalizes across MVD models: evaluated on unseen multiview images from Zero-123++ and other MVD methods, it shows significant improvements in quality and efficiency. The reconstructed shapes exhibit high-quality geometry and texture mapping, and the method tolerates minor inconsistencies in the generated images and handles diverse view configurations. Its main limitation is in reconstructing thin geometric structures, due to GPU memory constraints. Overall, MVD² is a promising approach for high-quality, efficient 3D reconstruction from MVD images.
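The projection-based aggregation step can be illustrated with a minimal sketch: back-project each view's feature map into a shared voxel grid along the camera rays and average the contributions. The function name, nearest-neighbor sampling, and simple averaging here are illustrative assumptions for a single aggregation step, not MVD²'s actual implementation (which also applies 3D convolutions to the volume).

```python
import numpy as np

def aggregate_features(feat_maps, proj_mats, grid_res=8, extent=1.0):
    """Back-project per-view image features into a 3D feature volume
    by averaging (hypothetical helper, simplified from the paper's
    projection-and-convolution aggregation).

    feat_maps: (V, H, W, C) per-view feature maps
    proj_mats: (V, 3, 4) camera projection matrices mapping
               homogeneous world points to homogeneous pixel coords
    """
    V, H, W, C = feat_maps.shape
    # voxel centers of a grid_res^3 cube spanning [-extent, extent]^3
    lin = np.linspace(-extent, extent, grid_res)
    xs, ys, zs = np.meshgrid(lin, lin, lin, indexing="ij")
    pts = np.stack([xs, ys, zs, np.ones_like(xs)], axis=-1).reshape(-1, 4)

    vol = np.zeros((grid_res ** 3, C))
    cnt = np.zeros((grid_res ** 3, 1))
    for v in range(V):
        uvw = pts @ proj_mats[v].T              # (N, 3) homogeneous pixels
        z = uvw[:, 2]
        u = np.round(uvw[:, 0] / np.maximum(z, 1e-6)).astype(int)
        w_ = np.round(uvw[:, 1] / np.maximum(z, 1e-6)).astype(int)
        # keep only points projecting in front of the camera and on-image
        inside = (z > 1e-6) & (u >= 0) & (u < W) & (w_ >= 0) & (w_ < H)
        vol[inside] += feat_maps[v, w_[inside], u[inside]]  # nearest sample
        cnt[inside] += 1
    vol = vol / np.maximum(cnt, 1)              # average over observing views
    return vol.reshape(grid_res, grid_res, grid_res, C)
```

A 3D convolutional network would then refine this volume before the mesh decoder; averaging is the simplest fusion rule that is symmetric in the (possibly inconsistent) input views.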
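The view-dependent training scheme described above can be sketched as a two-term objective: a strict pixel-wise loss at the prompt view, where the generated image is trusted, and a patch-level "local structure" loss at the other views, where small cross-view inconsistencies should be forgiven. The function name, patch-mean comparison, and weighting are illustrative assumptions, not the paper's actual loss.

```python
import numpy as np

def view_dependent_loss(pred, gt, prompt_idx, patch=4, w_other=0.1):
    """Hypothetical sketch of a view-dependent objective: exact
    agreement at the prompt view, only local structural similarity
    (patch-mean agreement) at the remaining, possibly inconsistent views.

    pred, gt: (V, H, W) rendered vs. target images per view
    """
    V, H, W = pred.shape
    # strict pixel-wise loss at the prompt view
    prompt_loss = np.mean((pred[prompt_idx] - gt[prompt_idx]) ** 2)

    # patch-mean loss at the other views: comparing patch averages
    # tolerates small per-pixel misalignments between views
    other_loss = 0.0
    others = [v for v in range(V) if v != prompt_idx]
    for v in others:
        for y in range(0, H, patch):
            for x in range(0, W, patch):
                p = pred[v, y:y + patch, x:x + patch].mean()
                g = gt[v, y:y + patch, x:x + patch].mean()
                other_loss += (p - g) ** 2
    n_patches = max(len(others), 1) * (H // patch) * (W // patch)
    return prompt_loss + w_other * other_loss / n_patches
```

The design choice mirrors the abstract's asymmetry: the prompt view anchors the absolute geometry, while the weaker patch loss at other views keeps training stable despite the inconsistencies inherent in diffusion-generated multiview images.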