18 Aug 2024 | Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, Sida Peng, Li Auto
This paper introduces Street Gaussians, an explicit scene representation for modeling dynamic urban street scenes. The method decomposes a scene into a static background and moving foreground vehicles, each represented as a neural point cloud whose points carry semantic logits and 3D Gaussians. Each point stores learnable 3D Gaussian parameters, including position, opacity, and covariance, to represent geometry. The appearance of the foreground vehicles is modeled with a 4D spherical harmonics model, which predicts spherical harmonics coefficients at any time step. Because the representation is explicit, vehicles and background can be composed easily, and the model renders in real time at 135 FPS, with training completing within half an hour. The method is evaluated on the Waymo Open and KITTI datasets, where it outperforms state-of-the-art methods in both rendering quality and speed. The paper also includes ablation studies and applications in scene editing, object decomposition, and semantic segmentation.
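
The composition and time-dependent appearance described above can be illustrated with a minimal sketch. It is not the authors' implementation. It makes two assumptions that the summary does not spell out: each vehicle has a per-frame rigid pose (a rotation and a translation), and each time-varying spherical harmonics coefficient is stored as a short cosine series over time. All function names are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose3(a):
    """Transpose a 3x3 matrix given as nested lists."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def object_to_world(mean, cov, rotation, translation):
    """Move one object-frame Gaussian into the world frame.

    Assumed rigid pose: mean' = R @ mean + t, cov' = R @ cov @ R^T.
    """
    world_mean = [
        sum(rotation[i][k] * mean[k] for k in range(3)) + translation[i]
        for i in range(3)
    ]
    world_cov = matmul3(matmul3(rotation, cov), transpose3(rotation))
    return world_mean, world_cov


def sh_coefficient_at(series_coeffs, t, num_frames):
    """Evaluate one time-dependent SH coefficient at frame t.

    Assumption: the coefficient is a cosine series in time. The summary
    only states that coefficients can be predicted at any time step.
    """
    return sum(f * math.cos(i * math.pi * t / num_frames)
               for i, f in enumerate(series_coeffs))


def compose_scene(background, vehicles, poses):
    """Merge background Gaussians with posed vehicle Gaussians.

    background: list of (mean, cov) pairs already in the world frame
    vehicles:   dict of vehicle id -> list of (mean, cov) in object frame
    poses:      dict of vehicle id -> (rotation, translation) for one frame
    """
    scene = list(background)
    for vehicle_id, gaussians in vehicles.items():
        rotation, translation = poses[vehicle_id]
        scene.extend(object_to_world(m, c, rotation, translation)
                     for m, c in gaussians)
    return scene


# One background Gaussian and one vehicle Gaussian, with the vehicle
# rotated 90 degrees about the z axis and shifted 10 units along x.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
yaw_90 = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
background = [([0, 0, 0], identity)]
vehicles = {"car_0": [([1, 0, 0], [[4, 0, 0], [0, 1, 0], [0, 0, 1]])]}
poses = {"car_0": (yaw_90, [10, 0, 0])}

scene = compose_scene(background, vehicles, poses)
coefficient = sh_coefficient_at([0.5, 0.2], t=10, num_frames=10)
```

Because every Gaussian ends up in one flat list in a shared world frame, editing the scene in this sketch amounts to changing an entry in `poses` or removing a key from `vehicles` before composing, which is the kind of manipulation an explicit representation makes straightforward.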