Dynamic 3D Gaussian Fields for Urban Areas

5 Jun 2024 | Tobias Fischer, Jonas Kulhanek, Marc Pollefeys, Samuel Rota Bulò, Peter Kontschieder, Lorenzo Porzi
This paper introduces 4DGF, a neural scene representation for dynamic urban areas that combines 3D Gaussians as an efficient geometry scaffold with compact, flexible neural fields for appearance modeling. The method addresses the challenge of synthesizing novel views from heterogeneous input sequences that capture dynamic urban environments under varying weather, lighting, and seasonal conditions.

4DGF integrates scene dynamics via a scene graph at the global scale and models articulated motion locally through deformations; this decomposition enables flexible scene composition suitable for real-world applications. Because the hybrid representation avoids storing appearance as a per-primitive attribute, memory usage drops by over 80%: neural fields model appearance and geometry variations, enabling efficient rendering of large-scale dynamic urban areas, while non-rigid object motion is handled by predicting per-Gaussian deformation terms.

Evaluated on four dynamic outdoor benchmarks, the method surpasses the state of the art by over 3 dB in PSNR while rendering more than 200× faster, demonstrating superior performance and scalability. The paper also discusses limitations, including the need to model physical image-formation phenomena for accurate reconstructions, and broader impacts on real-world applications such as robotic simulation and mixed reality.
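To make the hybrid design concrete, the sketch below shows one way such a representation could be organized: geometry-only 3D Gaussian primitives, a small neural field that predicts per-Gaussian color from position and a per-sequence condition code (so appearance is not stored on the primitives), and a deformation MLP that outputs position offsets for non-rigid motion. This is a minimal illustrative sketch, not the authors' implementation; all module names, sizes, and the conditioning scheme are assumptions.

```python
# Hypothetical sketch of a Gaussian scaffold + neural appearance/deformation fields.
# Module names, dimensions, and conditioning are illustrative assumptions.
import torch
import torch.nn as nn


class GaussianScaffold(nn.Module):
    """Geometry-only 3D Gaussian primitives (no per-primitive appearance stored)."""

    def __init__(self, num_gaussians: int):
        super().__init__()
        self.means = nn.Parameter(torch.randn(num_gaussians, 3))       # centers
        self.log_scales = nn.Parameter(torch.zeros(num_gaussians, 3))  # anisotropic scales
        self.rotations = nn.Parameter(torch.randn(num_gaussians, 4))   # quaternions
        self.opacity_logits = nn.Parameter(torch.zeros(num_gaussians, 1))


class AppearanceField(nn.Module):
    """Neural field mapping (position, condition code) -> RGB, replacing per-Gaussian color."""

    def __init__(self, cond_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, positions: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        cond = cond.expand(positions.shape[0], -1)
        return self.mlp(torch.cat([positions, cond], dim=-1))


class DeformationField(nn.Module):
    """Predicts per-Gaussian position offsets for articulated / non-rigid motion."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),  # position + time
            nn.Linear(hidden, 3),
        )

    def forward(self, positions: torch.Tensor, t: float) -> torch.Tensor:
        time = torch.full((positions.shape[0], 1), t, device=positions.device)
        return self.mlp(torch.cat([positions, time], dim=-1))


# Usage: compose geometry, condition-dependent appearance, and deformation,
# then pass the result to a differentiable rasterizer (omitted here).
scaffold = GaussianScaffold(num_gaussians=10_000)
appearance = AppearanceField()
deform = DeformationField()

cond_code = torch.zeros(1, 32)                  # e.g. per-sequence weather/lighting code
colors = appearance(scaffold.means, cond_code)  # [N, 3] RGB queried from the neural field
deformed_means = scaffold.means + deform(scaffold.means, t=0.5)
```

Because colors are queried from the field rather than stored per primitive, memory scales with the field's parameters instead of the number of Gaussians, which is the motivation behind the reported memory reduction.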