27 Nov 2020 | Albert Pumarola, Enric Corona, Gerard Pons-Moll, Francesc Moreno-Noguer
D-NeRF is a neural radiance field method for dynamic scenes that enables the synthesis of novel views at arbitrary times. It extends NeRF to handle dynamic scenes with non-rigid geometries, allowing reconstruction and rendering of objects under both rigid and non-rigid motion from a single moving camera.

The method introduces time as an additional input and splits the learning process into two stages: one that encodes the scene into a canonical space, and another that maps this canonical representation to the deformed scene at a particular time. Both mappings are learned with fully connected networks. Once trained, D-NeRF can render novel images while controlling both the camera view and the time variable, and thus the object's motion. The method is evaluated on scenes with objects under rigid, articulated, and non-rigid motion, demonstrating its effectiveness at rendering high-quality images while controlling both the camera view and the time component. It also produces complete 3D meshes that capture the time-varying geometry. D-NeRF is trained solely on monocular data, without 3D ground-truth supervision or a multi-view camera setup.
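For concreteness, the two mappings can be pictured as a pair of small MLPs: a deformation network that maps a point and a time to a displacement into the canonical space, and a canonical NeRF that maps the displaced point and a view direction to colour and density. The sketch below is a minimal PyTorch illustration under that reading; the layer widths, the `query_field` helper, and the omission of positional encoding are assumptions for exposition, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DeformationNet(nn.Module):
    """Maps a point x at time t to a displacement into the canonical space.

    Layer sizes are illustrative assumptions, not the authors' exact
    architecture; positional encoding of (x, t) is omitted for brevity.
    """
    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),          # predicted displacement of x
        )

    def forward(self, x, t):
        return self.mlp(torch.cat([x, t], dim=-1))

class CanonicalNeRF(nn.Module):
    """NeRF-style network: canonical point + view direction -> (RGB, density)."""
    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),          # RGB colour + volume density
        )

    def forward(self, x_canonical, view_dir):
        out = self.mlp(torch.cat([x_canonical, view_dir], dim=-1))
        rgb, sigma = torch.sigmoid(out[..., :3]), torch.relu(out[..., 3:])
        return rgb, sigma

def query_field(deform_net, canon_net, x, t, view_dir):
    """Query the dynamic field: deform to canonical space, then evaluate the canonical NeRF."""
    x_canonical = x + deform_net(x, t)     # map (x, t) to its canonical location
    return canon_net(x_canonical, view_dir)
```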
D-NeRF differs from prior work in that it does not require 3D reconstruction, can be learned end-to-end, and needs only a single view per time instance. Another appealing characteristic is that it inherently learns a time-varying 3D volume density and emitted radiance, turning novel view synthesis into a ray-casting process that is robust when rendering images from arbitrary viewpoints.

The method is evaluated on dynamic scenes with different types of deformation, showing its ability to synthesize novel views at arbitrary times. Compared against NeRF and T-NeRF, it captures dynamic scenes more faithfully, retaining fine details of the original images in novel views even when each deformation state has been seen from only a single viewpoint. The model is trained on 400x400 images for 800k iterations with a batch size of 4096 rays, each sampled at 64 points along the ray, on a single Nvidia GTX 1080 for 2 days. The results show that D-NeRF can synthesize high-quality novel views of scenes undergoing different types of deformation, from articulated objects to human bodies performing complex postures.
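To make the ray-casting step concrete, the following sketch volume-renders a batch of rays at a given time by sampling 64 points per ray, warping each sample into the canonical space via `query_field` from the sketch above, and alpha-compositing the predicted colours in the standard NeRF fashion. The `near`/`far` bounds, the uniform depth sampling, and the `render_rays` name are illustrative assumptions rather than the authors' exact implementation.

```python
import torch

def render_rays(deform_net, canon_net, rays_o, rays_d, t,
                near=2.0, far=6.0, n_samples=64):
    """Volume-render a batch of rays at time t with 64 samples per ray.

    rays_o, rays_d: (B, 3) ray origins and directions; t: (B, 1) time values.
    The sampling scheme and near/far bounds are illustrative assumptions.
    """
    # Sample depths uniformly along each ray and lift them to 3D points.
    z = torch.linspace(near, far, n_samples, device=rays_o.device)        # (S,)
    pts = rays_o[:, None, :] + rays_d[:, None, :] * z[None, :, None]      # (B, S, 3)

    # Query the dynamic field at every sample (flatten batch and sample dims).
    B, S, _ = pts.shape
    dirs = rays_d[:, None, :].expand(B, S, 3)
    times = t[:, None, :].expand(B, S, 1)
    rgb, sigma = query_field(deform_net, canon_net,
                             pts.reshape(-1, 3), times.reshape(-1, 1), dirs.reshape(-1, 3))
    rgb, sigma = rgb.reshape(B, S, 3), sigma.reshape(B, S, 1)

    # Standard NeRF-style alpha compositing along the ray.
    deltas = torch.cat([z[1:] - z[:-1], torch.tensor([1e10], device=z.device)])  # (S,)
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * deltas[None, :])                # (B, S)
    trans = torch.cumprod(torch.cat([torch.ones(B, 1, device=alpha.device),
                                     1.0 - alpha + 1e-10], dim=-1), dim=-1)[:, :-1]
    weights = alpha * trans                                                      # (B, S)
    return (weights[..., None] * rgb).sum(dim=1)                                 # (B, 3) pixel colours
```

Under this reading, training would simply compare the composited pixel colours against the captured monocular frames with a photometric loss, using batches of 4096 rays as reported above.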