10 Jun 2024 | Haozhe Xie, Zhaoxi Chen, Fangzhou Hong, Ziwei Liu
GaussianCity is a generative Gaussian splatting framework for unbounded 3D city generation. It addresses the computational inefficiency of NeRF-based methods by introducing a compact scene representation called BEV-Point, which allows large-scale 3D cities to be synthesized in a single feed-forward pass. The two key innovations are the BEV-Point representation itself, which keeps VRAM usage constant regardless of scene size, and a spatial-aware Gaussian attribute decoder that uses a Point Serializer to capture the structural and contextual characteristics of BEV points. The BEV-Point decoder generates Gaussian attributes from BEV-Point features and consists of five modules: positional encoder, point serializer, point transformer, modulated MLP, and Gaussian rasterizer.

Experiments on the GoogleEarth and KITTI-360 datasets show state-of-the-art results in both drone-view and street-view 3D city generation, with a 60-fold speedup over CityDreamer. Generation quality is measured with FID and KID, on which GaussianCity scores lower (better) than the compared methods; it also achieves lower depth error and camera error. Ablation studies indicate that both the BEV-Point representation and the decoder contribute significantly to performance.

The method is limited by its assumptions about building structures and by the expressive capacity of 3D-GS.
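
The summary names a point serializer but does not say how it orders the BEV points. One common way to turn an unordered 3D point set into a sequence that keeps spatial neighbors close together is to sort the points along a Z-order (Morton) curve. The sketch below illustrates that idea only. It assumes non-negative integer grid coordinates, and the names `morton_code` and `serialize_points` are hypothetical and are not taken from the GaussianCity code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int, int]


def _spread_bits(v: int, bits: int) -> int:
    """Move bit i of v to bit 3*i, leaving two zero bits between neighbors."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton_code(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the low `bits` bits of x, y, z into one Z-order key.

    Assumption: coordinates are already quantized to integers in [0, 2**bits).
    """
    for c in (x, y, z):
        if c < 0 or c >= (1 << bits):
            raise ValueError("coordinate out of range for the chosen bit width")
    return (
        _spread_bits(x, bits)
        | (_spread_bits(y, bits) << 1)
        | (_spread_bits(z, bits) << 2)
    )


def serialize_points(points: Sequence[Point], bits: int = 10) -> List[int]:
    """Return the indices that order `points` along the Z-order curve."""
    return sorted(
        range(len(points)),
        key=lambda i: morton_code(*points[i], bits=bits),
    )


# Usage: reorder a small, hypothetical set of grid points.
points = [(1, 1, 1), (0, 0, 0), (2, 0, 0), (0, 1, 0)]
order = serialize_points(points)
ordered_points = [points[i] for i in order]
```

Once serialized, the points form a 1D sequence that a sequence model such as a point transformer can process in local windows. Whether GaussianCity uses Z-order, a Hilbert curve, or another ordering is not stated in this summary.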