28 Jan 2020 | Vincent Sitzmann, Michael Zollhöfer, Gordon Wetzstein
Scene Representation Networks (SRNs) are a novel approach to neural scene representation that explicitly models both 3D geometry and appearance. Unlike existing methods that either require explicit 3D supervision or fail to enforce 3D structure, SRNs represent scenes as continuous, differentiable functions mapping world coordinates to feature representations of local scene properties. This formulation allows SRNs to be trained end-to-end from only 2D images and their camera poses, without access to depth or shape information. SRNs can generate high-quality images at arbitrary resolutions and generalize to unseen camera transformations and intrinsic parameters. The key contributions of SRNs include:
1. **Continuous, 3D-Structure-Aware Representation**: SRNs represent scenes as continuous functions that map world coordinates to feature representations, enforcing 3D structure and allowing generalization of shape and appearance priors across scenes.
2. **End-to-End Training**: SRNs can be trained end-to-end from 2D images with known camera poses, without explicit 3D supervision.
3. **Performance on Multiple Tasks**: SRNs demonstrate superior performance in novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
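The core idea above — a continuous, differentiable function from world coordinates to local feature vectors — can be sketched as a small MLP. This is only an illustrative sketch in numpy: the widths, depth, and initialization below are assumptions for demonstration, not the paper's exact architecture or training setup.

```python
import numpy as np

def init_srn(in_dim=3, hidden=256, feat_dim=256, layers=4, seed=0):
    """Randomly initialize a small MLP phi: R^3 -> R^feat_dim.
    Hidden width, depth, and He-style init are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    dims = [in_dim] + [hidden] * (layers - 1) + [feat_dim]
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def srn_features(params, xyz):
    """Evaluate the scene representation at world coordinates xyz, shape (N, 3)."""
    h = xyz
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU between hidden layers
    return h

# The representation is continuous: it can be queried at any 3D point,
# at arbitrary density, with no fixed voxel grid or mesh.
params = init_srn()
pts = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.001]])
feats = srn_features(params, pts)  # shape (2, 256)
```

Because the map is a composition of differentiable (almost everywhere) layers, gradients from a 2D image loss can flow back through it, which is what makes training from posed images alone possible in the paper's framework.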
SRNs are evaluated on various challenging 3D computer vision problems, showing significant improvements over recent baselines. The paper also discusses future work, including probabilistic frameworks, modeling view- and lighting-dependent effects, and extending SRNs to other image formation models.