May 30, 2024 | Nan Huang, Xiaobao Wei, Wenzhao Zheng, Pengju An, Ming Lu, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Shanghang Zhang
S³Gaussian is a self-supervised method for 4D reconstruction of dynamic street scenes in autonomous driving. It represents a scene with 3D Gaussians and a spatial-temporal field network that models 4D dynamics, decomposing static and dynamic elements without any 3D annotations. A multi-resolution Hexplane structure encoder captures spatial-temporal information, and a multi-head Gaussian decoder deforms the Gaussian points over time. Evaluated on the Waymo Open Dataset, the method achieves state-of-the-art scene reconstruction and novel view synthesis without explicit 3D supervision, enabling high-fidelity, real-time neural rendering of dynamic urban street scenes, which is crucial for autonomous driving simulation.
The key contributions are: (1) the first self-supervised method for decomposing dynamic and static 3D Gaussians in street scenes; (2) an efficient spatial-temporal decomposition network that models the complex deformations inherent in driving scenes; and (3) comprehensive experiments on challenging datasets demonstrating state-of-the-art rendering quality.
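The Hexplane idea behind the spatial-temporal encoder can be illustrated with a toy sketch: a 4D point (x, y, z, t) is projected onto six 2D feature planes (xy, xz, yz, xt, yt, zt), the bilinearly sampled features are fused, and a small head decodes a deformation for the Gaussian's position. Everything here (the class and function names, the elementwise-product fusion, the linear deformation head, the random initialization) is an illustrative assumption, not the paper's actual architecture.

```python
import numpy as np

def bilerp(plane, u, v):
    """Bilinearly sample a (H, W, C) feature plane at normalized coords (u, v) in [0, 1]."""
    H, W, _ = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[y0, x0] + wx * (1 - wy) * plane[y0, x1]
            + (1 - wx) * wy * plane[y1, x0] + wx * wy * plane[y1, x1])

class HexplaneField:
    """Toy spatial-temporal field: six learnable feature planes over coordinate
    pairs of (x, y, z, t), fused by elementwise product and decoded to a
    per-Gaussian position offset (a stand-in for the multi-head decoder)."""
    PAIRS = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]  # xy, xz, yz, xt, yt, zt

    def __init__(self, res=8, feat=4, seed=0):
        rng = np.random.default_rng(seed)
        # Random stand-ins for learned parameters.
        self.planes = [rng.standard_normal((res, res, feat)) for _ in self.PAIRS]
        self.W = rng.standard_normal((feat, 3)) * 0.1  # linear "deformation head"

    def deform(self, xyzt):
        """xyzt: (4,) normalized coords; returns a (3,) offset for a Gaussian mean."""
        feats = np.ones(self.planes[0].shape[-1])
        for plane, (a, b) in zip(self.planes, self.PAIRS):
            feats = feats * bilerp(plane, xyzt[a], xyzt[b])
        return feats @ self.W

field = HexplaneField()
# The same spatial point queried at two times yields different offsets,
# which is what lets the field separate static geometry from dynamic motion.
off_t0 = field.deform(np.array([0.5, 0.5, 0.5, 0.0]))
off_t1 = field.deform(np.array([0.5, 0.5, 0.5, 0.5]))
```

Because the time axis enters only through the xt, yt, and zt planes, a truly static region can be modeled by making those planes locally constant in t, while dynamic objects concentrate time variation there; this is the intuition behind the static/dynamic decomposition, not a faithful reproduction of it.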