24, July 27-August 1, 2024, Denver, CO, USA | Kaiwen Jiang, Yang Fu, Mukund Varma T, Yash Belhe, Xiaolong Wang, Hao Su, Ravi Ramamoorthi
This paper presents an approach to sparse view synthesis that does not rely on estimated camera poses. The method, called construct-and-optimize, uses 3D Gaussian splatting: it progressively constructs a solution by using monocular depth to back-project pixels into the 3D world. During construction, the solution is optimized by detecting 2D correspondences between the training views and the corresponding rendered images. A unified differentiable pipeline handles camera registration and the adjustment of both camera poses and depths, followed by back-projection. The paper also introduces a notion of an expected surface in Gaussian splatting, which is crucial for effective optimization.
The method is evaluated on the Tanks & Temples and Static Hikes datasets, where it shows significant improvements over existing pose-free and pose-required methods, both qualitatively and quantitatively. It achieves better quality with fewer views and outperforms previous algorithms even when using half of the dataset. The paper also analyzes the impact of the number of training views and includes an ablation study that validates the key components of the method.
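
To make the back-projection step concrete, the following is a minimal sketch of lifting pixels with known depth into world space under a standard pinhole camera model. It is not the paper's implementation: the function names, the z-depth convention, integer pixel coordinates without a half-pixel offset, and the camera-to-world pose given as a rotation `R` and translation `t` are all assumptions made for illustration, and the paper's pipeline additionally adjusts poses and depths before this step.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float,
                 R: Sequence[Sequence[float]], t: Sequence[float]) -> Vec3:
    """Lift pixel (u, v) with z-depth `depth` to a world-space point.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), and a camera-to-world pose (R, t). Hypothetical helper.
    """
    # Point in camera space: scale the normalized ray by the depth.
    p_cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Rigid transform into world space: p_world = R @ p_cam + t.
    return tuple(
        sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)
    )


def back_project_depth_map(depth_map: Sequence[Sequence[float]],
                           fx: float, fy: float, cx: float, cy: float,
                           R: Sequence[Sequence[float]],
                           t: Sequence[float]) -> List[Vec3]:
    """Back-project every pixel with positive depth; rows index v, columns u."""
    points: List[Vec3] = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            if d > 0:  # skip invalid or missing depth values
                points.append(back_project(u, v, d, fx, fy, cx, cy, R, t))
    return points


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(back_project(2, 3, 2.0, 1.0, 1.0, 0.0, 0.0, identity, (0.0, 0.0, 0.0)))
```

In a construct-and-optimize setting, each resulting point would plausibly seed a 3D Gaussian, with the depth coming from a monocular depth estimator rather than from ground truth.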