January 2022 | Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
This paper presents a method for synthesizing novel views of complex scenes by optimizing a continuous volumetric scene function using a sparse set of input views. The method represents a scene using a fully connected deep network that takes a single continuous 5D coordinate (spatial location and viewing direction) as input and outputs the volume density and view-dependent emitted radiance at that spatial location. The network is optimized by minimizing the error between observed images and the corresponding views rendered from the representation. The method uses classical volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required is a set of images with known camera poses.
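To make the optimization objective concrete, the following minimal NumPy sketch (illustrative, not the authors' code; the function name and array shapes are assumptions) computes the per-ray photometric error that is minimized during training:

```python
import numpy as np

def photometric_loss(rendered_rgb, observed_rgb):
    """Mean squared error between colors rendered from the scene
    representation and the observed pixel colors for the same rays.

    rendered_rgb, observed_rgb : (num_rays, 3) arrays of RGB values in [0, 1]
    """
    return np.mean((rendered_rgb - observed_rgb) ** 2)
```

Because the rendering procedure is differentiable, the gradient of this loss can be backpropagated through the volume rendering step into the network weights.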
The method represents a static scene as a continuous 5D function that outputs the radiance emitted in each direction at each point in space, along with a density at each point that acts like a differential opacity controlling how much radiance is accumulated by a ray passing through that point. The method optimizes a deep fully connected neural network (a multilayer perceptron, or MLP, with no convolutional layers) to represent this function by regressing from a single 5D coordinate to a volume density and a view-dependent RGB color. To render this neural radiance field (NeRF) from a particular viewpoint, the method marches camera rays through the scene to generate a sampled set of 3D points, feeds those points and their corresponding 2D viewing directions to the network to produce a set of colors and densities, and uses classical volume rendering techniques to accumulate those colors and densities into a 2D image.
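The volume rendering step can be illustrated with a short sketch. The NumPy function below (a minimal illustration under assumed array shapes, not the paper's implementation) composites the colors and densities sampled along a single ray: each sample contributes an opacity alpha_i = 1 - exp(-sigma_i * delta_i), weighted by the transmittance accumulated from the samples in front of it.

```python
import numpy as np

def composite_ray(colors, sigmas, t_vals):
    """Accumulate per-sample colors and densities along one camera ray.

    colors : (N, 3) RGB values predicted at the N sample points
    sigmas : (N,) volume densities predicted at the N sample points
    t_vals : (N,) depths of the sample points along the ray
    Returns the composited RGB color for the ray.
    """
    # Distances between adjacent samples; the last interval is effectively unbounded.
    deltas = np.append(np.diff(t_vals), 1e10)
    # Opacity of each interval: alpha_i = 1 - exp(-sigma_i * delta_i).
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance T_i: fraction of light reaching sample i without being absorbed.
    trans = np.cumprod(np.append(1.0, 1.0 - alphas[:-1]))
    # Per-sample weights; a weighted sum of sample colors gives the pixel color.
    weights = alphas * trans
    return (weights[:, None] * colors).sum(axis=0)
```

Every step in this computation is differentiable, which is what allows the loss on rendered pixels to update the network weights.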
The authors find that a basic implementation of this optimization does not converge to a sufficiently high-resolution representation for complex scenes. To address this issue, the method transforms the input 5D coordinates with a positional encoding that enables the MLP to represent higher-frequency functions. The resulting representation can capture complex real-world geometry and appearance and is well suited to gradient-based optimization using projected images. By storing a scene in the parameters of a neural network, the method also overcomes the prohibitive storage costs of discretized voxel grids when modeling complex scenes at high resolution. The paper demonstrates that the resulting neural radiance field method quantitatively and qualitatively outperforms state-of-the-art view synthesis methods.
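Concretely, the positional encoding maps each input coordinate p to a vector of sines and cosines at exponentially spaced frequencies, gamma(p) = (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^(L-1) pi p), cos(2^(L-1) pi p)); the paper uses L = 10 for the spatial coordinates and L = 4 for the viewing direction. The NumPy sketch below is an illustrative implementation of this mapping (the function name and default argument are assumptions):

```python
import numpy as np

def positional_encoding(x, num_freqs=10):
    """Map coordinates to sin/cos features at exponentially spaced frequencies,
    gamma(p) = (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^(L-1) pi p), cos(2^(L-1) pi p)).

    x : (..., D) array of coordinates, e.g. normalized 3D positions
    Returns an array of shape (..., D * 2 * num_freqs).
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi    # 2^0 * pi, ..., 2^(L-1) * pi
    scaled = x[..., None] * freqs                     # (..., D, L)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)
```

Feeding these higher-frequency features to the MLP, rather than the raw coordinates, is what lets the network recover fine geometric and texture detail.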
The method is compared to other approaches in the literature, including neural 3D shape representations and view synthesis and image-based rendering methods, and it outperforms them both quantitatively and qualitatively. It renders high-resolution, photorealistic novel views of real objects and scenes from arbitrary viewpoints, using only RGB images captured in natural settings, and it requires far less storage than sampled volumetric representations of comparable quality. It handles complex geometry and view-dependent appearance while representing scenes at high resolution and detail.