pixelNeRF: Neural Radiance Fields from One or Few Images

30 May 2021 | Alex Yu, Vickie Ye, Matthew Tancik, Angjoo Kanazawa
pixelNeRF is a learning framework that predicts a Neural Radiance Field (NeRF) representation from one or a few input images. Unlike the original NeRF, which requires many calibrated views and significant per-scene optimization time, pixelNeRF uses a fully convolutional architecture to condition NeRF on image inputs. This allows the network to be trained across multiple scenes to learn a scene prior, enabling novel view synthesis from a sparse set of views. The model is trained directly from images with no explicit 3D supervision and can generate plausible novel views from very few inputs without test-time optimization.

The architecture is fully convolutional, preserving the spatial alignment between the image and the output 3D representation, and it can incorporate a variable number of posed input views at test time. Training uses a dataset of multi-view images without additional supervision such as ground-truth 3D shape or object masks; supervision comes entirely from the volume rendering method and loss described in the paper. pixelNeRF predicts the NeRF representation in the camera coordinate system of the input image rather than a canonical coordinate frame. This is integral not only for generalization to unseen scenes and object categories but also for flexibility, since no clear canonical coordinate system exists for scenes with multiple objects or for real scenes.

pixelNeRF has been evaluated on ShapeNet benchmarks for single-image novel view synthesis with held-out objects as well as entirely unseen categories, and it has been demonstrated on multi-object ShapeNet scenes and real scenes from the DTU dataset. In all cases, pixelNeRF outperforms state-of-the-art baselines for novel view synthesis and single-image 3D reconstruction.
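The image conditioning above works by querying the CNN feature map at the projection of each 3D point. A minimal sketch of that pixel-aligned lookup, using a simple pinhole projection and hand-rolled bilinear interpolation (the feature map, intrinsics, and the NeRF MLP that would consume the feature are all hypothetical placeholders, not the paper's actual implementation):

```python
import numpy as np

def project_to_image(x_cam, focal, center):
    """Project a 3D point in camera coordinates to continuous pixel coordinates
    using a pinhole camera model."""
    u = focal * x_cam[0] / x_cam[2] + center[0]
    v = focal * x_cam[1] / x_cam[2] + center[1]
    return np.array([u, v])

def bilinear_sample(feat, uv):
    """Bilinearly interpolate a feature map of shape (H, W, C) at pixel coords (u, v)."""
    H, W, _ = feat.shape
    u = np.clip(uv[0], 0, W - 1)
    v = np.clip(uv[1], 0, H - 1)
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, W - 1), min(v0 + 1, H - 1)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * feat[v0, u0]
            + du * (1 - dv) * feat[v0, u1]
            + (1 - du) * dv * feat[v1, u0]
            + du * dv * feat[v1, u1])

# A query point along a target ray is expressed in the input view's camera
# frame, projected onto the image plane, and the spatially aligned feature is
# fetched to condition the NeRF MLP (the MLP itself is omitted here).
feat_map = np.random.rand(64, 64, 32)   # stand-in for a CNN feature volume
x = np.array([0.1, -0.2, 2.0])          # query point in camera coordinates
uv = project_to_image(x, focal=60.0, center=(32.0, 32.0))
z = bilinear_sample(feat_map, uv)       # per-point conditioning feature
```

With multiple posed input views, the same lookup runs per view and the resulting features are aggregated (the paper pools intermediate activations) before the final density and color prediction.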
pixelNeRF generates novel views from a single input image in both category-specific and category-agnostic settings, even for unseen object categories. It has also been tested on real car images, and on DTU it produces plausible novel views of a real scene from three posed input views despite being trained on under 100 scenes.
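Since the only supervision is the volume rendering loss, the core training operation is compositing predicted densities and colors along each ray and comparing the result to the ground-truth pixel. A minimal sketch of that standard NeRF-style compositing for a single ray (the training loss is then just the mean squared error between rendered and observed colors):

```python
import numpy as np

def composite_ray(sigmas, colors, deltas):
    """Alpha-composite samples along one ray, NeRF-style.

    sigmas: (N,) volume densities at N samples along the ray
    colors: (N, 3) RGB colors at those samples
    deltas: (N,) distances between consecutive samples
    Returns the expected RGB color of the ray.
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)    # per-segment opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas)))[:-1]
    weights = trans * alphas
    return (weights[:, None] * colors).sum(axis=0)

# One nearly opaque red sample should dominate the rendered color; the loss
# during training would be np.mean((rendered - ground_truth) ** 2).
sigmas = np.array([0.0, 50.0, 0.0])
colors = np.array([[0.0, 0.0, 1.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
deltas = np.full(3, 0.1)
rendered = composite_ray(sigmas, colors, deltas)
```

Because this rendering is differentiable, gradients flow from the pixel loss back through the MLP and into the image encoder, which is what lets the scene prior be learned from multi-view images alone.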