Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations

28 Jan 2020 | Vincent Sitzmann, Michael Zollhöfer, Gordon Wetzstein
Scene Representation Networks (SRNs) are a continuous, 3D-structure-aware neural scene representation that encodes both geometry and appearance. An SRN represents a scene as a continuous function that maps world coordinates to a feature representation of local scene properties. By formulating image formation as a differentiable ray-marching algorithm, SRNs can be trained end-to-end from only 2D images and their camera poses, without access to depth or shape supervision. The same formulation allows SRNs to generalize across scenes, learning powerful geometry and appearance priors in the process.

Rendering combines two components: a differentiable ray marcher, in which a learned LSTM predicts the step length along each camera ray, and a pixel generator network that maps the final feature vector of each ray to a color. Because the pixel generator operates on individual pixels without 2D convolutions, SRNs can render at arbitrary resolutions and handle camera transformations and intrinsic parameters unseen during training.

SRNs are evaluated on several 3D computer vision tasks: novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and the unsupervised discovery of a non-rigid face model, outperforming baselines throughout. They reconstruct scene geometry in a fully unsupervised manner and support tasks such as pose extrapolation and latent-space interpolation. However, they may fail when objects lie far from the training distribution or when surfaces are occluded. SRNs have potential applications in robotics and computer graphics and can represent room-scale scenes. Future work includes extending SRNs to probabilistic frameworks, modeling view- and lighting-dependent effects, and integrating them with camera pose estimation algorithms.
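To make the two per-point networks concrete, here is a minimal PyTorch sketch. The class names, layer widths, and depths are illustrative assumptions rather than the paper's exact hyperparameters; what the paper specifies is that the scene representation is an MLP from 3D world coordinates to features, and the pixel generator is a per-pixel MLP from features to RGB.

```python
# Minimal sketch of the two per-point networks, in PyTorch.
# Layer sizes and depths are illustrative assumptions, not the paper's
# exact hyperparameters.
import torch
import torch.nn as nn

class SceneRepresentation(nn.Module):
    """Maps a 3D world coordinate to a feature vector describing
    local scene properties (geometry and appearance)."""
    def __init__(self, feature_dim: int = 256, hidden: int = 256, depth: int = 4):
        super().__init__()
        layers, in_dim = [], 3
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden), nn.ReLU(inplace=True)]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, feature_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz: (..., 3) world coordinates -> (..., feature_dim) features
        return self.net(xyz)

class PixelGenerator(nn.Module):
    """Maps a per-ray feature vector to an RGB color, one pixel at a
    time (no 2D convolutions, so any image resolution works)."""
    def __init__(self, feature_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 3),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features)  # (..., 3) RGB values
```

Because both networks act on individual points and pixels rather than whole images, the same weights can render at any resolution, which is what allows SRNs to handle intrinsics unseen during training.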
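The differentiable ray marcher can be sketched in the same spirit. The step count, LSTM size, and depth initialization below are assumptions for illustration; the idea carried over from the paper is that an LSTM, fed the feature at the current point along each ray, predicts how far to march next, and the whole loop stays differentiable so the model trains end-to-end from 2D images and camera poses.

```python
# Minimal sketch of the learned LSTM ray marcher, assuming a feature
# network like SceneRepresentation from the previous snippet. Step count,
# LSTM size, and initial depth are illustrative assumptions.
import torch
import torch.nn as nn

class RayMarcher(nn.Module):
    """At each step, query the scene representation phi at the current
    point along a ray and let an LSTM predict the next step length; the
    final features are later decoded to color by the pixel generator."""
    def __init__(self, phi: nn.Module, feature_dim: int = 256,
                 hidden: int = 16, num_steps: int = 10):
        super().__init__()
        self.phi = phi
        self.lstm = nn.LSTMCell(feature_dim, hidden)
        self.step_head = nn.Linear(hidden, 1)  # hidden state -> step length
        self.num_steps = num_steps

    def forward(self, origins: torch.Tensor, dirs: torch.Tensor,
                init_depth: float = 0.05):
        # origins, dirs: (N, 3) camera centers and unit ray directions
        n = origins.shape[0]
        depth = origins.new_full((n, 1), init_depth)
        h = origins.new_zeros(n, self.lstm.hidden_size)
        c = torch.zeros_like(h)
        for _ in range(self.num_steps):
            features = self.phi(origins + depth * dirs)  # query the scene
            h, c = self.lstm(features, (h, c))
            depth = depth + self.step_head(h)            # march along the ray
        # Final per-ray features (for the pixel generator) and depth map.
        return self.phi(origins + depth * dirs), depth
```

A rendered pixel is then simply the pixel generator applied to the features this marcher returns for that pixel's ray, and because depth is produced as a byproduct, geometry emerges without any 3D supervision.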