9 Jun 2024 | Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee
FreeSplat is a novel framework designed for generalizable 3D Gaussian Splatting, enabling free-view synthesis of indoor scenes. Unlike existing methods that are limited to narrow-range interpolation between stereo images, FreeSplat can accurately localize 3D Gaussians from long sequence inputs and support free-view synthesis across wide view ranges. The key contributions of FreeSplat include:
1. **Low-cost Cross-View Aggregation**: This method efficiently extracts features from nearby views using adaptive cost volumes and a multi-scale structure, broadening the receptive field for depth estimation and Gaussian triplet prediction.
2. **Pixel-wise Triplet Fusion**: This module reduces redundancy in overlapping view regions by fusing local Gaussian triplets from multiple views, ensuring robust feature aggregation.
3. **Free-View Training (FVT)**: This strategy disentangles the performance of generalizable 3DGS from the specific number of input views, allowing training on long sequences and ensuring accurate depth estimation from novel views.
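To make the pixel-wise fusion idea above concrete, here is a minimal NumPy sketch of one plausible matching-and-merge scheme: Gaussians from a new view are merged into existing ones when their centers lie within a distance threshold, otherwise appended. The function name, the Euclidean-distance criterion, and the simple averaging are illustrative assumptions for this summary, not FreeSplat's actual formulation (which operates on Gaussian triplets with learned feature aggregation).

```python
import numpy as np

def fuse_gaussian_triplets(centers_a, feats_a, centers_b, feats_b, tau=0.05):
    """Hypothetical sketch of pixel-wise fusion across two views.

    Gaussians from view B whose centers fall within distance `tau` of an
    existing Gaussian from view A are merged into it (center and feature
    averaged); all others are kept as new Gaussians. This reduces redundant
    Gaussians in overlapping view regions.
    """
    fused_centers = centers_a.copy()
    fused_feats = feats_a.copy()
    new_centers, new_feats = [], []
    for c, f in zip(centers_b, feats_b):
        # distance from this candidate Gaussian to every fused Gaussian
        d = np.linalg.norm(fused_centers - c, axis=1)
        j = int(np.argmin(d))
        if d[j] < tau:
            # overlap detected: merge by simple averaging (assumed scheme)
            fused_centers[j] = 0.5 * (fused_centers[j] + c)
            fused_feats[j] = 0.5 * (fused_feats[j] + f)
        else:
            # no nearby Gaussian: keep as a new one
            new_centers.append(c)
            new_feats.append(f)
    if new_centers:
        fused_centers = np.vstack([fused_centers, np.array(new_centers)])
        fused_feats = np.vstack([fused_feats, np.array(new_feats)])
    return fused_centers, fused_feats
```

With two views of two Gaussians each, where one pair overlaps, this yields three fused Gaussians instead of four, illustrating how redundancy shrinks as overlapping views are fused.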
Empirical results on the ScanNet and Replica datasets demonstrate that FreeSplat outperforms existing methods in both novel view rendering quality and depth map accuracy, even with different numbers of input views. FreeSplat also shows superior efficiency in inference and reduces redundant Gaussians, making it suitable for feed-forward large scene reconstruction without depth priors. The code for FreeSplat will be made open-source upon paper acceptance.