Nerfies: Deformable Neural Radiance Fields

2021-09-10 | Keunhong Park, Utkarsh Sinha, Jonathan T. Barron, Sofien Bouaziz, Dan B Goldman, Steven M. Seitz, Ricardo Martin-Brualla
This paper presents a method for photorealistically reconstructing deformable scenes from photos and videos captured casually on mobile phones. It extends Neural Radiance Fields (NeRF) with a continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. The deformation field is optimized with a coarse-to-fine approach, and elastic regularization is introduced to improve robustness. The result turns casually captured selfie photos and videos into deformable NeRF models, called "nerfies," that support photorealistic renderings of the subject from arbitrary viewpoints.

The method is evaluated using time-synchronized data from a rig with two mobile phones, yielding training and validation images of the same pose at different viewpoints. It faithfully reconstructs non-rigidly deforming scenes and reproduces unseen views with high fidelity.

The core idea is to decompose a non-rigidly deforming scene into a template volume, represented as a neural radiance field, and a per-observation deformation field. The deformation field is the key extension to NeRF and makes it possible to represent moving subjects. Jointly optimizing a NeRF together with a deformation field is an under-constrained problem; to address this, the method introduces elastic regularization, background regularization, and a continuous coarse-to-fine annealing technique that avoids bad local minima.

The template NeRF is a continuous volumetric representation that maps a 3D position and viewing direction to a color and density. The deformation field is likewise modeled as a continuous function, an MLP conditioned on a per-frame learned latent deformation code.
A per-image learned latent appearance code modulates the color output to handle appearance variation between input frames. Elastic regularization controls the local behavior of the deformation through the Jacobian of the deformation, background regularization prevents the background from moving, and a coarse-to-fine regularization schedule modulates the capacity of the deformation field to model high frequencies during optimization.

Evaluation uses a rig with two synchronized, rigidly attached, calibrated cameras. The method reconstructs high-quality models of human subjects from casually captured selfies and handles dynamic scenes with deliberate motions of a human subject, a dog wagging its tail, and two moving objects. It also handles quasi-static scenes in which subjects attempt to stay as still as possible during capture, as well as scenes with topological changes such as opening and closing of the mouth. Smooth animations can be created by interpolating the deformation latent codes of any input states.
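The elastic regularization mentioned above penalizes deformations through the Jacobian of the warp. One common formulation, sketched here under the assumption that deviation from a locally rigid motion is measured via the log of the Jacobian's singular values (the paper computes the Jacobian analytically and additionally applies a robust loss), is:

```python
import numpy as np

def elastic_penalty(warp, x, eps=1e-4):
    """Penalty on how far the warp's local Jacobian at x is from a rotation,
    measured through the log of its singular values (zero for rigid motion).
    Hypothetical sketch: uses finite differences instead of autodiff."""
    # Finite-difference Jacobian J[i, j] = d warp_i(x) / d x_j.
    J = np.zeros((3, 3))
    for j in range(3):
        dx = np.zeros(3)
        dx[j] = eps
        J[:, j] = (warp(x + dx) - warp(x - dx)) / (2 * eps)
    # The singular values of J are all 1 exactly when J is a rotation.
    singular_values = np.linalg.svd(J, compute_uv=False)
    return float(np.sum(np.log(singular_values) ** 2))

theta = 0.3
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0, 0.0, 1.0]])
point = np.array([0.3, -0.1, 0.2])

# A pure rotation is locally rigid, so its penalty is ~0 ...
rigid_cost = elastic_penalty(lambda p: rotation @ p, point)
# ... while a 2x stretch along one axis is penalized.
stretch_cost = elastic_penalty(lambda p: p * np.array([2.0, 1.0, 1.0]), point)
```

Because singular values are invariant to rotation, this penalty discourages local stretching and shearing without penalizing the rigid motions that dominate casually captured subjects.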
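The coarse-to-fine annealing described above can be realized by smoothly windowing the frequency bands of the positional encoding: early in training only low frequencies are enabled, and a parameter ramps higher bands in as optimization proceeds. A minimal sketch, assuming a cosine-easing window per band (function name is illustrative):

```python
import math

def frequency_window(alpha, num_freqs):
    """Weight w_j for positional-encoding frequency band j.

    alpha ramps from 0 to num_freqs over training; band j is fully off
    while alpha <= j, fully on once alpha >= j + 1, and eased in between.
    """
    weights = []
    for j in range(num_freqs):
        t = min(max(alpha - j, 0.0), 1.0)                    # clamp to [0, 1]
        weights.append((1.0 - math.cos(math.pi * t)) / 2.0)  # cosine easing
    return weights
```

Multiplying each sin/cos band of the encoding by its weight limits how quickly the deformation field can fit high-frequency motion, which is what steers the joint optimization away from bad local minima.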