16 May 2024 | Ruiqi Gao, Aleksander Holyński, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T. Barron, Ben Poole
CAT3D is a novel method for creating 3D scenes from any number of input images, including real or generated images. It leverages a multi-view diffusion model to generate highly consistent novel views of a 3D scene, which are then used in a robust 3D reconstruction pipeline to produce high-quality 3D representations. This approach reduces the need for extensive multi-view captures, making 3D content creation more accessible and efficient. CAT3D can generate entire 3D scenes in as little as one minute and outperforms existing methods for single-image and few-view 3D scene creation. The system is evaluated on various datasets, demonstrating superior performance in few-view reconstruction and single-image-to-3D tasks. Key contributions include the use of multi-view diffusion models and a modified NeRF training procedure to improve robustness to inconsistent input views.
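
To make the two-stage structure of the pipeline concrete, here is a minimal sketch of the flow described above: a multi-view diffusion model samples novel views at target camera poses, and a robust NeRF-style reconstruction is then fit to the observed plus generated views. All names below (sample_camera_trajectory, multi_view_diffusion, robust_nerf_reconstruction) are hypothetical placeholders, not the authors' actual API; the stubs only mirror the high-level data flow.

```python
# Hypothetical sketch of a CAT3D-style two-stage pipeline.
# Function names and shapes are illustrative assumptions, not the paper's code.

import numpy as np


def sample_camera_trajectory(num_views: int) -> np.ndarray:
    """Placeholder: target camera poses (4x4 matrices), e.g. along an orbit."""
    return np.stack([np.eye(4) for _ in range(num_views)])


def multi_view_diffusion(observed_images: np.ndarray,
                         observed_poses: np.ndarray,
                         target_poses: np.ndarray) -> np.ndarray:
    """Placeholder for the multi-view diffusion model: conditioned on one or
    more observed views and their poses, jointly sample images at the target
    poses. Returns random images of the right shape to keep the sketch runnable."""
    num_targets = target_poses.shape[0]
    h, w, c = observed_images.shape[1:]
    return np.random.rand(num_targets, h, w, c)


def robust_nerf_reconstruction(images: np.ndarray, poses: np.ndarray) -> dict:
    """Placeholder for the robust NeRF training step that tolerates residual
    inconsistencies among generated views (e.g., via per-view loss weighting)."""
    return {"num_training_views": len(images), "poses": poses}


# Stage 1: generate many consistent novel views from a sparse set of inputs.
observed_images = np.random.rand(3, 64, 64, 3)    # e.g., 3 input photos
observed_poses = sample_camera_trajectory(3)      # their (known) camera poses
target_poses = sample_camera_trajectory(80)       # dense novel-view cameras
generated_views = multi_view_diffusion(observed_images, observed_poses, target_poses)

# Stage 2: fit a 3D representation to the observed + generated views.
all_images = np.concatenate([observed_images, generated_views], axis=0)
all_poses = np.concatenate([observed_poses, target_poses], axis=0)
scene = robust_nerf_reconstruction(all_images, all_poses)
print(f"Reconstructed scene from {scene['num_training_views']} views")
```

Decoupling generation from reconstruction in this way is what lets the same recipe cover single-image, few-view, and text-to-image-to-3D settings: only the conditioning set in stage 1 changes, while stage 2 stays the same.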