NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

2 Apr 2025 | Meng You1†, Zhiyu Zhu1†*, Hui Liu2 & Junhui Hou1
The paper "NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer" introduces a novel view synthesis (NVS) paradigm that leverages pre-trained large video diffusion models without additional training. The method adaptively modulates the diffusion sampling process using given views to generate visually pleasing results from single or multiple views of static scenes or monocular videos of dynamic scenes. The authors theoretically formulate the NVS-oriented reverse diffusion sampling by modulating the score function with the given scene priors, which are represented by warped input views. They also theoretically explore the boundary of estimation errors to achieve adaptive modulation in the reverse diffusion process, reducing potential errors from view pose and diffusion steps. Extensive evaluations on static and dynamic scenes demonstrate the method's superior performance over state-of-the-art methods in both quantitative and qualitative metrics. The source code is available on GitHub.The paper "NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer" introduces a novel view synthesis (NVS) paradigm that leverages pre-trained large video diffusion models without additional training. The method adaptively modulates the diffusion sampling process using given views to generate visually pleasing results from single or multiple views of static scenes or monocular videos of dynamic scenes. The authors theoretically formulate the NVS-oriented reverse diffusion sampling by modulating the score function with the given scene priors, which are represented by warped input views. They also theoretically explore the boundary of estimation errors to achieve adaptive modulation in the reverse diffusion process, reducing potential errors from view pose and diffusion steps. Extensive evaluations on static and dynamic scenes demonstrate the method's superior performance over state-of-the-art methods in both quantitative and qualitative metrics. The source code is available on GitHub.