27 May 2024 | Junyoung Seo, Kazumi Fukuda, Takashi Shibuya, Takuya Narihira, Naoki Murata, Shoukang Hu, Chieh-Hsin Lai, Seungryong Kim, Yuki Mitsufuji
GenWarp is a method for generating novel views from a single image while preserving semantic details. It combines semantic-preserving generative warping with cross-view attention, enabling a text-to-image (T2I) generative model to learn where to warp and where to generate. Unlike existing pipelines that inpaint unreliable warped images, GenWarp integrates view warping and occlusion inpainting into a single generative process: estimated depth maps provide a geometric warping signal, and the model learns to warp and generate in a way that preserves the semantic details of the input view.

The method is built on a fine-tuned Stable Diffusion model and uses cross-view attention to inject semantic features from the source view, removing the dependency on unreliable warped images. Evaluated on RealEstate10K, ScanNet, and in-the-wild images, GenWarp outperforms existing methods in generation quality and in robustness to variations in scene type and camera viewpoint, in both in-domain and out-of-domain settings, producing novel views that remain consistent with the input even under challenging viewpoint changes.
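To make the two ingredients above concrete, here is a minimal PyTorch sketch, not the authors' code, of (1) warping source-view pixel coordinates into the target view using an estimated depth map and a relative camera pose, which yields the kind of geometric warping signal the summary describes, and (2) a cross-view attention layer in which target-view features attend to source-view semantic features. All tensor shapes, module names, and the intrinsics `K` are illustrative assumptions.

```python
import torch
from torch import nn


def warp_coordinates(depth, K, rel_pose):
    """Project source pixels into the target view via estimated depth.

    depth:    (B, 1, H, W) monocular depth estimate for the source image.
    K:        (B, 3, 3) camera intrinsics.
    rel_pose: (B, 4, 4) source-to-target extrinsics [R | t].
    Returns:  (B, H, W, 2) target-view pixel coordinates for each source pixel.
    """
    B, _, H, W = depth.shape
    # Pixel grid in homogeneous coordinates, flattened row-major to (3, H*W).
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype),
        torch.arange(W, dtype=depth.dtype),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    pix = pix.unsqueeze(0).expand(B, -1, -1).to(depth.device)

    # Back-project to 3D camera space: X = depth * K^{-1} @ pix.
    cam = torch.linalg.inv(K) @ pix                       # (B, 3, H*W)
    cam = cam * depth.reshape(B, 1, -1)                   # scale rays by depth

    # Transform into the target camera frame and re-project with K.
    cam_h = torch.cat([cam, torch.ones_like(cam[:, :1])], dim=1)  # (B, 4, H*W)
    tgt = (rel_pose @ cam_h)[:, :3]                       # (B, 3, H*W)
    proj = K @ tgt
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)        # perspective divide
    return uv.permute(0, 2, 1).reshape(B, H, W, 2)


class CrossViewAttention(nn.Module):
    """Target-view features (queries) attend to source-view features (keys/values)."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tgt_feats, src_feats):
        # tgt_feats, src_feats: (B, N, dim) flattened spatial feature maps.
        out, _ = self.attn(tgt_feats, src_feats, src_feats)
        return tgt_feats + out  # residual connection
```

In a classical warp-then-inpaint pipeline, the returned coordinates would be normalized to [-1, 1] and passed to `torch.nn.functional.grid_sample` to produce a warped image whose holes are then inpainted; GenWarp instead, per the summary, feeds the geometric signal and the source-view semantic features into a single fine-tuned diffusion model, so warping and occlusion handling happen in one unified generative step.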