On Scaling Up 3D Gaussian Splatting Training

26 Jun 2024 | Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie
Grendel is a distributed training system for 3D Gaussian Splatting (3DGS) that scales training to large, high-resolution scenes. Conventional 3DGS training is bounded by the memory of a single GPU; Grendel instead partitions the 3DGS parameters and parallelizes computation across multiple GPUs, using sparse all-to-all communication to transfer Gaussians to the pixel partitions that need them and dynamic load balancing to keep GPUs evenly utilized. Unlike existing systems that train with one camera view at a time, Grendel supports batched training with multiple views, paired with a simple sqrt(batch_size) learning-rate scaling rule that remains effective at large batch sizes. On the Rubble dataset, Grendel reaches a PSNR of 27.28 by distributing 40.4 million Gaussians across 16 GPUs, compared with 26.28 using 11.2 million Gaussians on a single GPU. Grendel is an open-source project available at https://github.com/nyu-systems/Grendel-GS.

Grendel exploits the mixed parallelism and spatial locality of 3DGS to handle its dynamic, unbalanced workload. Gaussians are distributed across GPUs for Gaussian-wise parallelism, and pixels are distributed across GPUs for pixel-wise parallelism; sparse all-to-all communication moves Gaussians between these two partitionings, and iterative workload rebalancing minimizes imbalance across GPUs. A sketch of this exchange is given below.
Grendel also scales training by batching multiple images per iteration, which requires adjusting the optimization hyperparameters. Based on an independent gradients hypothesis, which treats the per-image gradients in a batch as approximately independent, it derives an automatic hyperparameter scaling rule that enables efficient, tuning-free training at large batch sizes: the learning rate is scaled by the square root of the batch size, and the momentum terms are scaled exponentially (a worked sketch appears at the end of this article). Empirically, updates produced under these rules maintain high cosine similarity to batch-size-1 updates, and their norms are roughly invariant to the batch size.

The evaluation demonstrates Grendel's scalability: it renders high-resolution images from large scenes, and its performance scales with additional hardware resources. Using more Gaussians improves reconstruction quality, which makes multi-GPU training systems like Grendel necessary for high-quality 3DGS training.
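To make the hyperparameter scaling rules concrete, here is a minimal sketch of adapting an Adam optimizer to a larger batch size. The sqrt(batch_size) learning-rate rule follows the description above; the specific exponential form used for the momentum coefficients (raising each beta to the batch size, so one batched step decays the running averages as much as batch_size single-view steps would) is an assumption for illustration, as are the base hyperparameter values.

```python
# Minimal sketch of the batch-size scaling rules described above, applied to a
# standard Adam optimizer. The exact exponential form for the momentum terms
# (beta ** batch_size) is an assumption; consult the paper for the
# authoritative formulation.
import math
import torch


def scale_hyperparams(base_lr: float, base_betas, batch_size: int):
    # Learning rate: sqrt(batch_size) scaling.
    lr = base_lr * math.sqrt(batch_size)
    # Momentum coefficients: one batched step should decay the running
    # averages roughly as much as `batch_size` consecutive single-view steps.
    betas = tuple(beta ** batch_size for beta in base_betas)
    return lr, betas


# Example: adapt illustrative batch-size-1 settings to batch size 16.
params = [torch.nn.Parameter(torch.zeros(10, 3))]  # stand-in for Gaussian params
lr, betas = scale_hyperparams(base_lr=1.6e-4, base_betas=(0.9, 0.999), batch_size=16)
optimizer = torch.optim.Adam(params, lr=lr, betas=betas, eps=1e-15)
```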