Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines

July 2019 | BEN MILDENHALL, PRATUL P. SRINIVASAN, RODRIGO ORTIZ-CAYON, NIMA KHADEMI KALANTARI, RAVI RAMAMOORTHI, REN NG, ABHISHEK KAR
This paper presents a practical and robust method for view synthesis from a set of input images captured with a handheld camera in an irregular grid pattern. Building on a plenoptic sampling framework, the method uses a deep learning pipeline to promote each sampled view to a layered representation of the local scene that can render a limited range of nearby views, and then synthesizes novel views by blending renderings from adjacent layered representations.

The paper extends traditional plenoptic sampling theory to derive a bound that specifies precisely how densely users should sample views of a given scene when using the algorithm. Theoretical analysis shows that the number of input views required decreases quadratically with the number of depth planes predicted for each layered scene representation, up to limits set by the camera field of view. Empirical validation on a synthetic test set and on real-world scenes shows that the method can match the perceptual quality of Nyquist-rate view sampling while using up to 4000 times fewer views, and that it outperforms traditional light field reconstruction methods and state-of-the-art view interpolation algorithms across a range of sub-Nyquist view sampling rates.

Compared with state-of-the-art view synthesis techniques and with view-dependent texture mapping over a global mesh proxy, the method produces superior renderings without the artifacts seen in competing approaches, particularly for non-Lambertian effects, because the layered representations can model the apparent depth of specularities, which varies with the observation viewpoint. The layered representations are estimated by a 3D convolutional neural network (CNN), trained on renderings of natural scenes, that dynamically adjusts the number of depth planes to the input view sampling rate rather than using a 2D CNN with a fixed number of output planes, and that is trained to produce locally consistent light fields.

For practicality, the method is demonstrated with an augmented reality smartphone app that guides users through capturing the input images and enables real-time virtual exploration on desktop and mobile platforms. The paper concludes that carefully crafted deep learning pipelines built on local layered scene representations provide a practical, robust, and state-of-the-art solution for capturing and rendering complex real-world scenes for virtual exploration.
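To make the sampling guideline concrete, the sketch below applies the rule of thumb that the disparity of scene content between adjacent captured views should stay within the number of predicted depth planes (one pixel under classic Nyquist-rate plenoptic sampling). This is a paraphrase for illustration: the function name, symbols, and numeric example are ours, not the paper's exact bound or notation.

```python
def max_camera_spacing(focal_px, z_min, z_max, num_planes):
    """Illustrative sketch of an LLFF-style sampling guideline.

    Assumes the rule of thumb that the disparity between adjacent
    input views (for content between depths z_min and z_max) should
    not exceed the number of depth planes predicted per layered
    representation. Not the paper's exact notation or bound.
    """
    # Pixels of disparity per unit of camera baseline, for the
    # nearest vs. farthest scene content.
    disparity_per_unit_baseline = focal_px * (1.0 / z_min - 1.0 / z_max)
    # Nyquist-style plenoptic sampling allows ~1 pixel of disparity;
    # predicting `num_planes` layers relaxes this to `num_planes` pixels.
    return num_planes / disparity_per_unit_baseline


# Hypothetical example: ~1500 px focal length, scene content between
# 1 m and 10 m, 32 predicted planes vs. a single-plane (Nyquist) baseline.
spacing_nyquist = max_camera_spacing(1500.0, 1.0, 10.0, num_planes=1)
spacing_llff = max_camera_spacing(1500.0, 1.0, 10.0, num_planes=32)
print(spacing_nyquist, spacing_llff)  # spacing grows 32x per capture axis
```

Allowing D pixels of disparity relaxes the spacing by roughly a factor of D along each capture axis, which is where the quadratic (D squared) reduction in the number of required views comes from.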
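The rendering stage can also be sketched compactly. Each layered representation is a stack of fronto-parallel RGBA planes; a novel view is formed by compositing the (reprojected) planes back to front and blending the results from the nearest layered representations. The NumPy snippet below is a simplified stand-in, not the authors' renderer: it omits the per-plane warping into the novel view and uses a generic blending weight in place of the paper's blending scheme.

```python
import numpy as np

def composite_planes(rgba_planes):
    """Back-to-front "over" compositing of a stack of RGBA planes.

    rgba_planes: array of shape (D, H, W, 4), ordered back to front,
    with RGB and alpha in [0, 1]. A generic MPI-style compositor; the
    paper's renderer also reprojects each plane into the novel view.
    """
    out = np.zeros(rgba_planes.shape[1:3] + (3,))
    for plane in rgba_planes:  # back to front
        rgb, alpha = plane[..., :3], plane[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)
    return out


def blend_adjacent_renderings(render_a, render_b, weight_a):
    """Blend renderings from two neighboring layered representations.

    weight_a: scalar or per-pixel weight in [0, 1], e.g. based on how
    close the novel view is to each representation's reference view.
    A simplified stand-in for the paper's blending weights.
    """
    return weight_a * render_a + (1.0 - weight_a) * render_b
```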
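Finally, the choice of a 3D CNN matters because 3D convolutions slide along the depth axis of the input volume as well as the image axes, so the same trained weights can be applied regardless of how many depth planes are used at test time. The PyTorch sketch below is purely illustrative of that property; the class name, channel counts, and layer layout are assumptions, not the paper's architecture.

```python
import torch.nn as nn

class PlaneSweepEncoder(nn.Module):
    """Illustrative (not the authors') 3D CNN over a plane-sweep volume.

    Because 3D convolutions are translation-invariant along depth, the
    same weights apply to volumes with different numbers of depth planes,
    letting the plane count track the input view sampling rate.
    """
    def __init__(self, in_channels, hidden=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(hidden, 4, kernel_size=3, padding=1),  # RGBA per plane
            nn.Sigmoid(),
        )

    def forward(self, volume):
        # volume: (batch, channels, D, H, W); D may vary between captures.
        return self.net(volume)
```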