HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting

19 Mar 2024 | Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, Yiyi Liao
**Abstract:** This paper introduces HUGS, a novel pipeline for holistic urban scene understanding using 3D Gaussian Splatting. The method jointly optimizes geometry, appearance, semantics, and motion using static and dynamic 3D Gaussians, with moving object poses regularized by physical constraints. HUGS enables real-time rendering of novel viewpoints, high-accuracy 2D and 3D semantic information, and dynamic scene reconstruction even with noisy 3D bounding box predictions. Experimental results on KITTI, KITTI-360, and Virtual KITTI demonstrate the effectiveness of HUGS.

**Introduction:** Reconstructing urban scenes is crucial for autonomous driving applications. Existing methods often focus on specific aspects of the task and require additional inputs such as LiDAR scans or manually annotated 3D bounding boxes. HUGS addresses dynamic 3D urban scene understanding by extending Gaussian Splatting to model additional modalities, including semantics, optical flow, and camera exposure, without relying on ground-truth 3D bounding boxes.

**Method:** HUGS decomposes the scene into static regions and rigidly moving dynamic objects, both represented using 3D Gaussians. The motion of dynamic objects is modeled with a unicycle model, which integrates physical constraints to reduce noise during tracking. This decomposition allows rendering RGB images, semantic maps, and optical flow through volume rendering.

**Experiments:** HUGS is evaluated on multiple tasks, including novel view synthesis, semantic synthesis, and 3D semantic reconstruction. Results show that HUGS outperforms state-of-the-art methods on the KITTI, Virtual KITTI, and KITTI-360 datasets, achieving high-quality novel view synthesis and accurate 3D semantic reconstruction.
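The unicycle model mentioned above constrains a vehicle to move along its current heading, which is what lets HUGS regularize noisy per-frame object poses. The paper's exact formulation is not reproduced in this summary; the following is a minimal discrete-time sketch of a generic unicycle motion model, with the function name and step scheme chosen for illustration only.

```python
import math

def unicycle_step(state, v, omega, dt):
    """Advance a unicycle state (x, y, heading) by one time step.

    The model only allows translation along the current heading, so it
    acts as a physical constraint that filters out implausible jumps in
    tracked vehicle poses (the regularization idea used in HUGS).
    """
    x, y, theta = state
    x += v * math.cos(theta) * dt   # move along the heading direction
    y += v * math.sin(theta) * dt
    theta += omega * dt             # turn at the given angular rate
    return (x, y, theta)

# Example: a car driving straight along +x at 10 m/s for 1 s (10 steps).
state = (0.0, 0.0, 0.0)
for _ in range(10):
    state = unicycle_step(state, v=10.0, omega=0.0, dt=0.1)
print(state)
```

With zero angular velocity the heading stays fixed and the car advances roughly 10 m along x; a nonzero `omega` would trace a circular arc, which is the kind of smooth trajectory the physical constraint favors over raw, noisy detections.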
**Conclusion:** HUGS provides a holistic approach to urban scene understanding, enabling real-time rendering and accurate 3D reconstruction. Future work may explore category-level priors and more degrees of freedom for enhanced scene editing capabilities.