NVS-SOLVER: VIDEO DIFFUSION MODEL AS ZERO-SHOT NOVEL VIEW SYNTHESIZER
This paper proposes a novel view synthesis method that leverages pre-trained large video diffusion models without requiring additional training. The method adaptively modulates the diffusion sampling process using given views to generate visually pleasing results from single or multiple views of static scenes or monocular videos of dynamic scenes. The approach is based on theoretical modeling and iteratively modulates the score function with scene priors represented by warped input views to control the video diffusion process. The method also theoretically explores the boundary of estimation error to achieve adaptive modulation based on view pose and number of diffusion steps. Extensive evaluations on both static and dynamic scenes demonstrate the method's significant superiority over state-of-the-art methods in both quantitative and qualitative terms. The source code is available at https://github.com/ZHU-Zhiyu/NVS_Solver.
The method is designed to synthesize novel views from single or multiple views of static scenes or monocular videos of dynamic scenes. It leverages pre-trained video diffusion models and uses the given views to guide the reverse sampling process. The method theoretically formulates the diffusion process as guided sampling, modulating intermediate results with scene information from the given views. It also investigates the potential distribution of the error map to achieve adaptive modulation in the reverse diffusion process, reducing the estimation error boundary.
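The guided-sampling idea can be sketched as a per-step blend between the model's clean-frame estimate and the warped-view scene prior, with a guidance weight that decays as the reverse process approaches the final step. The function name, the linear weight schedule, and the `lam` parameter below are illustrative assumptions, not the paper's actual notation or schedule:

```python
import numpy as np

def modulated_step(x0_pred, warped_prior, mask, t, num_steps, lam=0.8):
    """Blend the diffusion model's clean-frame estimate with the warped-view
    scene prior. `mask` marks pixels covered by the warp (1 = observed);
    unobserved pixels are left to the model. The guidance weight is largest
    early in the reverse process (large t) and fades toward zero, so late
    steps are dominated by the generative prior. Illustrative sketch only."""
    w = lam * (t / num_steps)                       # decaying guidance weight
    return x0_pred + w * mask * (warped_prior - x0_pred)
```

With `mask` set to zero everywhere, the step reduces to the unmodified diffusion estimate, which is the desired fallback for regions the given views never observe.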
The method is evaluated on both static and dynamic scenes, showing significant improvements over existing methods. It is also effective for novel view synthesis (NVS) from multiple views and monocular videos. The method uses a pre-trained image-to-video diffusion model, SVD, and leverages latent representations from sparse or single-view inputs to overcome the limitations of existing methods and produce high-quality novel views with improved realism.
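The scene priors mentioned above come from warping the given views into the target camera poses. A generic depth-based forward warp under a pinhole model can be sketched as follows; this is a standard reprojection-and-splat routine, not the paper's exact pipeline, and the function name and interface are assumptions:

```python
import numpy as np

def warp_view(src, depth, K, R, tvec):
    """Forward-warp a grayscale source view into a novel camera pose using
    per-pixel depth (pinhole model). Returns the warped image and a validity
    mask marking which target pixels received a source sample. Generic
    reprojection sketch, not the paper's implementation."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    pts = np.linalg.inv(K) @ pix * depth.reshape(-1)   # back-project to 3D
    pts = R @ pts + tvec[:, None]                      # move to target frame
    proj = K @ pts                                     # project to target image
    uv = (proj[:2] / proj[2]).round().astype(int).T
    valid = ((uv[:, 0] >= 0) & (uv[:, 0] < w) &
             (uv[:, 1] >= 0) & (uv[:, 1] < h) & (proj[2] > 0))
    out = np.zeros_like(src)
    mask = np.zeros((h, w), dtype=bool)
    out[uv[valid, 1], uv[valid, 0]] = src.reshape(-1)[valid]
    mask[uv[valid, 1], uv[valid, 0]] = True
    return out, mask
```

The returned mask is exactly the kind of occupancy information a guided sampler needs: it distinguishes pixels constrained by the given view from disocclusions the diffusion model must hallucinate.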
The method builds on diffusion models, deep generative models that progressively add noise to data in a forward process and learn the reverse process that removes the noise, mapping samples back to the clean data distribution. Because a pre-trained video diffusion model is used as-is, novel views are synthesized without any additional training.
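The forward and reverse processes can be written in closed form. A minimal sketch, using the standard DDPM/DDIM parameterization (not anything specific to NVS-Solver): the forward process jumps directly from clean data to any noise level, and the reverse sampler inverts that equation to obtain the clean-data estimate it denoises toward.

```python
import numpy as np

def forward_noise(x0, eps, alpha_bar):
    """Closed-form forward process: corrupt clean data x0 with Gaussian
    noise eps to cumulative signal level alpha_bar in one step."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

def ddim_x0_estimate(xt, eps_pred, alpha_bar):
    """Invert the forward equation to recover the clean-data estimate.
    With a perfect noise prediction this returns x0 exactly; a trained
    model only approximates eps, so the estimate is refined step by step."""
    return (xt - np.sqrt(1.0 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)
```

It is this per-step clean-data estimate that the method modulates with the warped-view scene priors before continuing the reverse process.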