6 Jan 2021 | Ricardo Martin-Brualla*, Noha Radwan*, Mehdi S. M. Sajjadi*, Jonathan T. Barron, Alexey Dosovitskiy, and Daniel Duckworth
NeRF-W is a neural rendering method that extends Neural Radiance Fields (NeRF) to unstructured internet photo collections. It addresses a key limitation of NeRF, which assumes a static scene captured under fixed conditions and degrades when images vary in lighting or contain moving objects. NeRF-W introduces two improvements: latent appearance modeling and transient object handling. Latent appearance modeling accounts for per-image variations in lighting, exposure, and post-processing by learning a low-dimensional latent appearance embedding for each input image. Transient object handling decomposes the scene into static and transient components, with a per-image uncertainty field that lets the model downweight unreliable pixels such as pedestrians and vehicles, yielding more accurate reconstructions of the persistent geometry.

Applied to real-world photo collections of cultural landmarks, including the Phototourism dataset, NeRF-W produces high-fidelity, temporally consistent renderings that outperform previous state-of-the-art methods in both PSNR and MS-SSIM, and it supports smooth appearance interpolation and temporal consistency even along wide camera trajectories. The results demonstrate that NeRF-W can generate realistic, high-fidelity renderings of complex scenes from unstructured photo collections, with clear qualitative and quantitative gains over prior approaches.
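To make the decomposition described above more concrete, here is a minimal sketch of a NeRF-W-style network, assuming a PyTorch implementation. The layer widths, embedding sizes, and module names are illustrative assumptions, not the paper's exact configuration; it only shows how per-image appearance and transient embeddings condition the static and transient branches.

```python
# Minimal NeRF-W-style network sketch (assumed PyTorch implementation).
# Dimensions and names are illustrative, not the authors' exact setup.
import torch
import torch.nn as nn


class NeRFW(nn.Module):
    def __init__(self, n_images, pos_dim=63, dir_dim=27,
                 appearance_dim=48, transient_dim=16, width=256):
        super().__init__()
        # Per-image latent codes: appearance (lighting/exposure/post-processing)
        # and transient (image-specific occluders).
        self.appearance = nn.Embedding(n_images, appearance_dim)
        self.transient = nn.Embedding(n_images, transient_dim)

        # Shared trunk over the positionally encoded 3D sample location.
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        # Static branch: density depends only on position; color also depends
        # on view direction and the appearance embedding.
        self.static_sigma = nn.Linear(width, 1)
        self.static_rgb = nn.Sequential(
            nn.Linear(width + dir_dim + appearance_dim, width // 2), nn.ReLU(),
            nn.Linear(width // 2, 3), nn.Sigmoid(),
        )
        # Transient branch: its own density, color, and per-sample
        # uncertainty beta, conditioned on the transient embedding.
        self.transient_head = nn.Sequential(
            nn.Linear(width + transient_dim, width // 2), nn.ReLU(),
            nn.Linear(width // 2, 5),  # sigma_t, rgb_t (3 channels), beta
        )

    def forward(self, x_enc, d_enc, image_idx):
        h = self.trunk(x_enc)
        sigma_s = torch.relu(self.static_sigma(h))

        a = self.appearance(image_idx)
        rgb_s = self.static_rgb(torch.cat([h, d_enc, a], dim=-1))

        t = self.transient(image_idx)
        out_t = self.transient_head(torch.cat([h, t], dim=-1))
        sigma_t = torch.relu(out_t[..., :1])
        rgb_t = torch.sigmoid(out_t[..., 1:4])
        beta = nn.functional.softplus(out_t[..., 4:5])
        return sigma_s, rgb_s, sigma_t, rgb_t, beta
```

In this sketch, the static and transient outputs would be composited along each ray as in standard NeRF volume rendering, and the predicted uncertainty beta would downweight the reconstruction loss on pixels the transient branch deems unreliable; the paper pairs this with an uncertainty-weighted loss, which the snippet does not reproduce.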