The paper introduces *EndoSparse*, a framework for real-time sparse-view synthesis of endoscopic scenes using 3D Gaussian Splatting (3D-GS). The goal is to reconstruct 3D surgical scenes from sparse observations, which are common in clinical settings because of equipment instability and variable imaging conditions. *EndoSparse* leverages prior knowledge from multiple foundation models to improve geometric and appearance quality under challenging sparse-view conditions. Key contributions include:
1. **Deformable Endoscopic Reconstruction with 3D-GS**: Represents the deformable surgical scene with a set of 3D Gaussians parameterized by position, covariance, opacity, and spherical-harmonic coefficients (a minimal parameterization sketch follows this list).
2. **Instilling Diffusion Prior for Plausible Appearance**: Perturbs rendered images with random noise and uses a pretrained diffusion model to predict the original image; the resulting guidance gradients steer the renderings toward plausible appearance (sketched after this list).
3. **Distilling Geometric Prior for Accurate Geometry**: Uses a monocular depth-estimation foundation model to produce depth maps and aligns them with the depth maps rendered from the 3D Gaussians, enforcing geometric coherence (see the depth-alignment sketch below).
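
To make contribution 1 concrete, the sketch below shows how the per-Gaussian attributes listed above are typically held as learnable parameters in a 3D-GS pipeline. The class name, tensor shapes, and the covariance construction from quaternion and scale are illustrative assumptions rather than the authors' exact implementation, and the time-dependent deformation component is omitted.

```python
import torch

class GaussianCloud(torch.nn.Module):
    """Minimal 3D-GS-style scene parameterization (names/shapes are assumptions)."""

    def __init__(self, num_gaussians: int, sh_degree: int = 3):
        super().__init__()
        n, k = num_gaussians, (sh_degree + 1) ** 2
        self.xyz      = torch.nn.Parameter(torch.randn(n, 3))     # 3D positions
        self.rotation = torch.nn.Parameter(torch.rand(n, 4))      # quaternions
        self.scale    = torch.nn.Parameter(torch.rand(n, 3))      # log per-axis scales
        self.opacity  = torch.nn.Parameter(torch.zeros(n, 1))     # pre-sigmoid opacity
        self.sh       = torch.nn.Parameter(torch.zeros(n, k, 3))  # SH color coefficients

    def covariance(self) -> torch.Tensor:
        """Build each 3x3 covariance as Sigma = R S S^T R^T from quaternion + scale."""
        q = torch.nn.functional.normalize(self.rotation, dim=-1)
        w, x, y, z = q.unbind(-1)
        R = torch.stack([
            1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y),
            2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
            2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y),
        ], dim=-1).reshape(-1, 3, 3)
        S = torch.diag_embed(torch.exp(self.scale))                # positive scales
        M = R @ S
        return M @ M.transpose(-1, -2)
```

Storing rotation and scale instead of a raw covariance matrix keeps the covariance positive semi-definite by construction, which is the standard choice in 3D-GS implementations.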
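The diffusion-based appearance prior (contribution 2) can be read as a score-distillation-style objective: noise a rendered view, let a pretrained denoiser predict that noise, and use the prediction error as a gradient on the rendering. The sketch below is a minimal, hedged version of that idea; `denoiser` and `alphas_cumprod` stand in for a generic epsilon-prediction diffusion model and its noise schedule, not the paper's specific model.

```python
import torch

def diffusion_appearance_loss(rendered, denoiser, alphas_cumprod):
    """SDS-style appearance guidance (sketch, assumptions noted in the lead-in)."""
    T = alphas_cumprod.shape[0]
    t = torch.randint(0, T, (1,), device=rendered.device)        # random timestep
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(rendered)
    noisy = a_t.sqrt() * rendered + (1 - a_t).sqrt() * noise     # forward diffusion
    with torch.no_grad():
        eps_pred = denoiser(noisy, t)                            # hypothetical eps-prediction net
    grad = eps_pred - noise                                      # guidance direction
    # Detached target makes d(loss)/d(rendered) equal to `grad` under autograd.
    target = (rendered - grad).detach()
    return 0.5 * torch.nn.functional.mse_loss(rendered, target, reduction="sum")
```

In practice such a term would be weighted and combined with the usual photometric losses on the available sparse views.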
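For the geometric prior (contribution 3), monocular depth estimates are defined only up to an unknown scale (and often shift), so they must be aligned to the rendered depth before serving as supervision. One common recipe, assumed here rather than taken verbatim from the paper, is a closed-form per-image scale-and-shift fit followed by an L1 penalty:

```python
import torch

def depth_prior_loss(rendered_depth, mono_depth, mask=None):
    """Align the monocular depth prior to the rendered depth, then penalize the gap."""
    d = mono_depth.flatten()       # frozen foundation-model prediction
    r = rendered_depth.flatten()   # depth rendered from the 3D Gaussians
    if mask is not None:
        m = mask.flatten().bool()
        d, r = d[m], r[m]
    # Solve min_{s,b} || s*d + b - r ||^2 in closed form; keep the fit fixed.
    with torch.no_grad():
        A = torch.stack([d, torch.ones_like(d)], dim=-1)         # [N, 2]
        sol = torch.linalg.lstsq(A, r.unsqueeze(-1)).solution    # [2, 1] -> (s, b)
        target = (A @ sol).squeeze(-1)                           # aligned prior depth
    return torch.nn.functional.l1_loss(r, target)
```

Solving the fit under `torch.no_grad()` keeps the aligned prior as a fixed target, so gradients only push the rendered depth toward it.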
The method is evaluated on two datasets, EndoNeRF-D and SCARED, and compared against state-of-the-art methods. *EndoSparse* achieves superior rendering efficiency, geometric precision, and visual quality, even with only a few input views. Ablation studies confirm the effectiveness of each key component and quantify the impact of the number of training views. Overall, *EndoSparse* shows promise for practical deployment in clinical scenarios.