**GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping** introduces an approach for generating high-quality novel views from a single input image while preserving semantic details. The method addresses the limitations of existing techniques, which often struggle with noisy depth maps and loss of semantic information during geometric warping. GenWarp integrates view warping and occlusion inpainting into a unified process, using a two-stream architecture that pairs a semantic preserver network with a diffusion model. By augmenting self-attention with cross-view attention, the model learns where to warp and where to generate, handling both in-domain and out-of-domain images. Extensive experiments on datasets such as RealEstate10K and ScanNet, as well as in-the-wild images, show that GenWarp outperforms existing methods both qualitatively and quantitatively. The approach leverages large-scale text-to-image (T2I) models and monocular depth estimation (MDE) to achieve high-quality novel view synthesis, making it a promising solution for applications that require flexible camera viewpoint changes in generated images.
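
As a rough illustration of the cross-view-augmented attention described above, the sketch below shows one way keys and values from a source-view (semantic preserver) feature stream could be concatenated with the target view's own keys and values inside a diffusion U-Net attention block, so a single softmax decides per token whether to rely on the input view ("warp") or on the view being generated ("generate"). The module name, projection layout, and dimensions are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class CrossViewAugmentedAttention(nn.Module):
    """Self-attention over target-view tokens, augmented with keys/values
    from a second (source-view) feature stream.

    Hypothetical sketch: layer names and shapes are assumptions, not the
    authors' released code.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_kv_self = nn.Linear(dim, dim * 2, bias=False)
        self.to_kv_cross = nn.Linear(dim, dim * 2, bias=False)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x_target: torch.Tensor, x_source: torch.Tensor) -> torch.Tensor:
        # x_target: (B, N, C) tokens of the novel view inside the diffusion U-Net
        # x_source: (B, M, C) tokens from the semantic preserver stream (input view)
        B, N, C = x_target.shape
        H = self.num_heads

        q = self.to_q(x_target)
        k_self, v_self = self.to_kv_self(x_target).chunk(2, dim=-1)
        k_cross, v_cross = self.to_kv_cross(x_source).chunk(2, dim=-1)

        # Concatenate self and cross-view keys/values: one attention map then
        # softly allocates each query between "warp" (source) and "generate" (target).
        k = torch.cat([k_self, k_cross], dim=1)
        v = torch.cat([v_self, v_cross], dim=1)

        def split_heads(t: torch.Tensor) -> torch.Tensor:
            return t.view(B, -1, H, C // H).transpose(1, 2)  # (B, H, L, C/H)

        q, k, v = map(split_heads, (q, k, v))
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = attn.softmax(dim=-1) @ v
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.to_out(out)


# Usage example with dummy feature maps flattened to token sequences.
if __name__ == "__main__":
    block = CrossViewAugmentedAttention(dim=320, num_heads=8)
    target_tokens = torch.randn(2, 1024, 320)  # e.g. 32x32 latent grid
    source_tokens = torch.randn(2, 1024, 320)
    out = block(target_tokens, source_tokens)
    print(out.shape)  # torch.Size([2, 1024, 320])
```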