23 Mar 2020 | Michael Niemeyer, Lars Mescheder, Michael Oechsle, Andreas Geiger
This paper introduces Differentiable Volumetric Rendering (DVR), a method for learning implicit 3D shape and texture representations from 2D images without 3D supervision. The key insight is that the gradient of the predicted depth can be derived analytically using implicit differentiation, which makes it possible to learn implicit representations directly from RGB images. The method supports both single-view and multi-view 3D reconstruction and produces watertight meshes. DVR builds on implicit neural representations, which do not require discretization and have a constant memory footprint. Its differentiable renderer computes gradients of the loss function with respect to the network parameters, enabling efficient learning.
Experiments on several datasets show that the approach can rival methods trained with full 3D supervision. The method is implemented using reverse-mode automatic differentiation. The results show that DVR recovers accurate 3D shape and texture from 2D images, outperforming existing methods in some cases. The paper also discusses related work and provides implementation details for the proposed method.
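The implicit-differentiation step described in this summary can be illustrated with a small sketch. Suppose a ray r(d) = r0 + d·w hits the surface at depth d̂, so that the surface point p̂ = r0 + d̂·w lies on the level set f_θ(p̂) = τ. Differentiating both sides of that equation with respect to θ gives ∂d̂/∂θ = −(∇_p f_θ(p̂) · w)⁻¹ ∂f_θ(p̂)/∂θ, so the depth gradient needs only quantities evaluated at the surface point. The code below is a toy setup and not the authors' implementation: it assumes the network is replaced by a sphere-shaped occupancy field with a single parameter (`radius`), it uses plain bisection to locate the surface, and all function names are hypothetical.

```python
import math


def occupancy(p, radius):
    """Toy occupancy field: above 0.5 inside a sphere of the given radius, below 0.5 outside."""
    norm = math.sqrt(sum(c * c for c in p))
    return 1.0 / (1.0 + math.exp(-(radius - norm)))


def point_on_ray(r0, w, d):
    """Return the point r0 + d * w."""
    return [o + d * k for o, k in zip(r0, w)]


def find_surface_depth(r0, w, radius, tau=0.5, d_max=10.0, n_steps=64, n_bisect=60):
    """March along the ray to the first free-to-occupied crossing, then refine by bisection."""
    step = d_max / n_steps
    lo = 0.0
    for i in range(1, n_steps + 1):
        hi = i * step
        if occupancy(point_on_ray(r0, w, hi), radius) >= tau:
            break
        lo = hi
    else:
        return None  # the ray never enters the occupied region
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if occupancy(point_on_ray(r0, w, mid), radius) >= tau:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def depth_gradient(r0, w, radius, d_hat):
    """Analytic d(depth)/d(radius) from implicit differentiation at the surface point."""
    p = point_on_ray(r0, w, d_hat)
    norm = math.sqrt(sum(c * c for c in p))
    f = occupancy(p, radius)
    s = f * (1.0 - f)                      # derivative of the sigmoid
    df_dp = [-s * c / norm for c in p]     # gradient of f with respect to the point
    df_dtheta = s                          # derivative of f with respect to the radius
    denom = sum(g * k for g, k in zip(df_dp, w))
    return -df_dtheta / denom


def finite_difference_gradient(r0, w, radius, eps=1e-5):
    """Numerical check: re-run the surface search with a perturbed parameter."""
    d_plus = find_surface_depth(r0, w, radius + eps)
    d_minus = find_surface_depth(r0, w, radius - eps)
    return (d_plus - d_minus) / (2.0 * eps)


if __name__ == "__main__":
    r0, w, radius = [0.3, 0.0, -3.0], [0.0, 0.0, 1.0], 1.0
    d_hat = find_surface_depth(r0, w, radius)
    print("depth:", d_hat)
    print("analytic gradient:", depth_gradient(r0, w, radius, d_hat))
    print("finite-difference gradient:", finite_difference_gradient(r0, w, radius))
```

In this sketch the analytic gradient is computed from a single evaluation at the surface point, without differentiating through the marching or bisection steps; the finite-difference function exists only to check that the two agree.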