Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields

Jonathan T. Barron, Ben Mildenhall, Matthew Tancik, Peter Hedman, Ricardo Martin-Brualla, Pratul P. Srinivasan
Mip-NeRF is a multiscale representation for anti-aliasing neural radiance fields (NeRF). NeRF samples a scene with a single ray per pixel, which can lead to excessive blurring or aliasing when training or testing images observe scene content at different resolutions. Mip-NeRF extends NeRF to represent the scene at a continuously-valued scale by efficiently rendering anti-aliased conical frustums instead of rays. This reduces objectionable aliasing artifacts and significantly improves NeRF's ability to represent fine details, while being 7% faster than NeRF and half its size. Compared to NeRF, Mip-NeRF reduces average error rates by 17% on the Blender dataset presented with NeRF and by 60% on a challenging multiscale variant of that dataset, and it matches the accuracy of a brute-force supersampled NeRF on the multiscale dataset while being 22× faster.

Mip-NeRF is inspired by the mipmap approach used in computer graphics to prevent aliasing: a mipmap represents a signal at different discrete scales and selects the appropriate scale for a ray based on the projection of the pixel footprint onto the geometry intersected by that ray. Rather than sampling along rays, Mip-NeRF uses a 3D Gaussian to represent the region of space over which the radiance field should be integrated, and renders a prefiltered pixel by querying the model at intervals along a cone, using Gaussians that approximate the conical frustums corresponding to the pixel.
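To make the cone-casting step concrete, here is a minimal JAX sketch (JAX being the framework Mip-NeRF is built on) that approximates a single conical frustum with a 3D Gaussian. The closed-form moments follow the derivation in the Mip-NeRF paper; the function and argument names are illustrative, not those of the released code.

```python
import jax.numpy as jnp

def conical_frustum_to_gaussian(origin, direction, t0, t1, base_radius):
    """Approximate the conical frustum between t0 and t1 along a cone
    with a 3D Gaussian (mean, covariance).

    origin, direction: (3,) cone apex and (unnormalized) axis.
    t0, t1: scalars bounding the frustum along the axis.
    base_radius: cone radius at t = 1, set by the pixel footprint.
    """
    t_mu = (t0 + t1) / 2.0     # frustum midpoint along the axis
    t_delta = (t1 - t0) / 2.0  # frustum half-width along the axis
    denom = 3.0 * t_mu**2 + t_delta**2
    # Mean and variance along the axis, and variance perpendicular to it.
    mu_t = t_mu + (2.0 * t_mu * t_delta**2) / denom
    var_t = t_delta**2 / 3.0 - (4.0 / 15.0) * (
        t_delta**4 * (12.0 * t_mu**2 - t_delta**2) / denom**2)
    var_r = base_radius**2 * (
        t_mu**2 / 4.0 + (5.0 / 12.0) * t_delta**2
        - (4.0 / 15.0) * t_delta**4 / denom)
    # Lift the 1D moments into world coordinates.
    mean = origin + mu_t * direction
    d_outer = jnp.outer(direction, direction)
    null_outer = jnp.eye(3) - d_outer / jnp.dot(direction, direction)
    cov = var_t * d_outer + var_r * null_outer
    return mean, cov
```

Collapsing each frustum into a single Gaussian is what lets one model query stand in for the many sub-pixel rays a brute-force supersampled NeRF would need.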
To encode a 3D position and its surrounding Gaussian region, Mip-NeRF proposes an integrated positional encoding (IPE), a generalization of NeRF's positional encoding (PE) that allows a region of space to be compactly featurized (a sketch is given at the end of this section).

Mip-NeRF substantially improves upon NeRF's accuracy, especially when scene content is observed at different resolutions. Its scale-aware structure also allows the separate "coarse" and "fine" MLPs used by NeRF to be merged into a single MLP, making it slightly faster and reducing model size by 50%. Mip-NeRF is implemented on top of JaxNeRF, a JAX reimplementation of NeRF that achieves better accuracy and trains faster than the original TensorFlow implementation.
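As promised above, here is a minimal JAX sketch of an integrated positional encoding for a Gaussian with diagonal covariance: each sinusoid of the ordinary PE is replaced by its expectation under the Gaussian, which attenuates high frequencies when the region is wide. The function name and frequency range are illustrative.

```python
import jax.numpy as jnp

def integrated_pos_enc(mean, var_diag, min_deg=0, max_deg=16):
    """IPE: expected positional encoding of x ~ N(mean, diag(var_diag)).

    mean, var_diag: (..., 3) per-axis means and variances of the Gaussian.
    Uses E[sin(x)] = sin(mu) * exp(-var / 2), and likewise for cos.
    """
    scales = 2.0 ** jnp.arange(min_deg, max_deg)                 # (L,)
    scaled_mean = (mean[..., None, :] * scales[:, None]).reshape(
        mean.shape[:-1] + (-1,))                                 # (..., 3L)
    scaled_var = (var_diag[..., None, :] * scales[:, None] ** 2).reshape(
        mean.shape[:-1] + (-1,))                                 # (..., 3L)
    # Wide Gaussians (coarse scale) drive exp(-var/2) toward zero at high
    # frequencies, so the feature smoothly discards fine detail.
    return (jnp.exp(-0.5 * jnp.concatenate([scaled_var, scaled_var], -1))
            * jnp.concatenate([jnp.sin(scaled_mean),
                               jnp.cos(scaled_mean)], -1))
```

Only the diagonal of the frustum Gaussian's covariance is needed here, so the covariance from a frustum-to-Gaussian step like the earlier sketch can be reduced with jnp.diagonal before encoding. This scale-dependent attenuation is what makes a single scale-aware MLP sufficient where NeRF needed separate "coarse" and "fine" networks.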