InseRF: Text-Driven Generative Object Insertion in Neural 3D Scenes

2024-01-10 | Mohamad Shahbazi, Liesbeth Claessens, Michael Niemeyer, Edo Collins, Alessio Tonioni, Luc Van Gool, Federico Tombari
InseRF is a novel method for generative object insertion in neural 3D scenes. It takes as input a NeRF reconstruction of a 3D scene, a textual description of the target object, and a 2D bounding box in a reference view, and it generates a multiview-consistent 3D object at the user-specified location. Notably, it achieves controllable, 3D-consistent insertion without requiring explicit 3D information as input.

The method combines diffusion models with single-view object reconstruction in a three-stage pipeline. First, a 2D view of the target object is generated in the reference view, conditioned on the text prompt and the 2D bounding box. Second, this 2D view is lifted to 3D with a single-view object reconstruction method. Third, the reconstructed object is placed in the scene, with its 3D location and scale guided by the priors of monocular depth estimation. An optional refinement step further improves the insertion. A sketch of this pipeline follows below.
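The following Python sketch illustrates the flow of this pipeline under stated assumptions; it is not the authors' implementation. The generative components (rendering, text-guided inpainting, single-view reconstruction, depth estimation, scene fusion, refinement) are passed in as hypothetical callables, since the summary does not name concrete models; the depth-guided placement math is made concrete with NumPy, and the camera representation (a dict with `K` and `c2w`) is likewise an assumption.

```python
import numpy as np

def insert_object(scene_nerf, camera, prompt, bbox_2d,
                  render_view, inpaint, reconstruct_3d,
                  estimate_depth, place_in_scene, refine):
    """Sketch of the InseRF pipeline (hypothetical interfaces).

    The model components are injected as callables:
      render_view(scene, camera)           -> HxWx3 image of the scene
      inpaint(image, bbox, prompt)         -> image with object generated
      reconstruct_3d(object_crop)          -> 3D object from a single view
      estimate_depth(image)                -> HxW monocular depth map
      place_in_scene(scene, obj, t, scale) -> scene with object fused in
      refine(scene)                        -> optional refinement pass
    camera is assumed to be {"K": 3x3 intrinsics, "c2w": 4x4 cam-to-world}.
    """
    # 1) Generate a 2D view of the target object in the reference view,
    #    conditioned on the text prompt and the 2D bounding box.
    reference = render_view(scene_nerf, camera)
    edited = inpaint(reference, bbox_2d, prompt)

    # 2) Lift the generated 2D object view to 3D with single-view
    #    object reconstruction (bbox given as integer pixel coords).
    x, y, w, h = bbox_2d
    object_3d = reconstruct_3d(edited[y:y + h, x:x + w])

    # 3) Place the object: unproject the box center at its estimated
    #    depth to get a 3D anchor, and derive a scale from the box size.
    depth = estimate_depth(edited)
    cx, cy = x + w / 2.0, y + h / 2.0
    z = float(depth[int(cy), int(cx)])
    K = camera["K"]
    anchor_cam = z * (np.linalg.inv(K) @ np.array([cx, cy, 1.0]))
    anchor_world = (camera["c2w"] @ np.append(anchor_cam, 1.0))[:3]
    scale = z * max(w, h) / K[0, 0]   # rough metric extent of the box
    fused = place_in_scene(scene_nerf, object_3d, anchor_world, scale)

    # 4) Optional refinement of the fused scene.
    return refine(fused)
```

The key design point this sketch highlights is that the only user input tying the object to 3D space is the 2D box: monocular depth priors turn its center and size into a 3D position and scale.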
InseRF is evaluated on various 3D scenes against existing baselines, including Instruct-NeRF2NeRF and a multi-view inpainting approach. Visual comparisons show that, unlike the baselines, InseRF inserts new 3D-consistent objects at the desired locations. Quantitatively, the method is assessed with three metrics: CLIP Text-Image Similarity, Directional Text-Image Similarity, and Temporal Direction Consistency, and it outperforms the baselines on all three. A sketch of the CLIP-based metrics follows.
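The paper's exact metric implementations are not reproduced here. As a minimal sketch, CLIP-based text-image and directional similarities are typically computed as cosine similarities in CLIP embedding space. The snippet below uses the open_clip library; the backbone choice (ViT-B-32) and the cosine formulation are assumptions, not the authors' confirmed setup.

```python
import torch
import open_clip
from PIL import Image

# Load a CLIP model via open_clip; the exact backbone used in the
# paper is an assumption here.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def _embed_image(path: str) -> torch.Tensor:
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = model.encode_image(image)
    return feat / feat.norm(dim=-1, keepdim=True)

def _embed_text(text: str) -> torch.Tensor:
    tokens = tokenizer([text])
    with torch.no_grad():
        feat = model.encode_text(tokens)
    return feat / feat.norm(dim=-1, keepdim=True)

def clip_text_image_similarity(image_path: str, prompt: str) -> float:
    # Cosine similarity between a rendered view and the edit prompt.
    return (_embed_image(image_path) @ _embed_text(prompt).T).item()

def clip_directional_similarity(before_img: str, after_img: str,
                                before_txt: str, after_txt: str) -> float:
    # Cosine similarity between the image-space edit direction and the
    # text-space edit direction, as commonly used for edit evaluation.
    d_img = _embed_image(after_img) - _embed_image(before_img)
    d_txt = _embed_text(after_txt) - _embed_text(before_txt)
    return torch.nn.functional.cosine_similarity(d_img, d_txt).item()
```

In this formulation, the text-image score rewards renders that match the object description, while the directional score rewards edits whose change in image space aligns with the requested change in text space.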