**Abstract:**
Image Sculpting is a novel framework that integrates 3D geometry and graphics to edit 2D images. Unlike existing methods, which operate purely in 2D and rely on textual instructions, Image Sculpting converts 2D objects into 3D models, enabling precise and quantifiable edits such as pose changes, rotation, translation, 3D composition, carving, and serial addition. A coarse-to-fine enhancement process then restores high-fidelity detail and merges the edited 3D objects back into their original 2D contexts.
**Introduction:**
Recent advancements in image generative modeling have opened new avenues for creative content creation, but the integration of these models into real-world workflows remains challenging. Image Sculpting addresses this by leveraging 3D geometry and graphics to achieve precise control over object manipulation. The framework consists of three key components: single-view 3D reconstruction, 3D object manipulation, and a coarse-to-fine generative enhancement process.
**3D Shape Deformation:**
The framework employs various 3D deformation techniques, including space deformations, shape-aware deformations, and linear blend skinning, to achieve precise control over object shapes and orientations.
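As an illustration of the last technique named above, here is a minimal sketch of linear blend skinning in plain Python. It is not the framework's implementation: the helper names (`rotation_z`, `apply_affine`, `linear_blend_skinning`), the 3x4 affine matrix layout, and the two-bone setup are assumptions chosen for the example. Each vertex is moved by every bone transform, and the results are averaged using per-vertex weights that sum to 1.

```python
import math

# Identity bone transform as a 3x4 affine matrix [R | t] (layout is an assumption).
IDENTITY = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]


def rotation_z(angle_rad, pivot=(0.0, 0.0, 0.0)):
    """Return a 3x4 affine matrix that rotates about the z-axis around `pivot`."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    px, py, _ = pivot
    # R (v - p) + p  ==  R v + (p - R p); the z component of the offset is 0.
    tx = px - (c * px - s * py)
    ty = py - (s * px + c * py)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, 0.0],
    ]


def apply_affine(m, v):
    """Apply a 3x4 affine matrix to a 3D point."""
    x, y, z = v
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in m)


def linear_blend_skinning(vertices, weights, bone_transforms):
    """Blend bone-transformed positions: v' = sum_j w_j * (M_j v).

    `weights[i][j]` is the influence of bone j on vertex i; each row should sum to 1.
    """
    skinned = []
    for v, w in zip(vertices, weights):
        out = [0.0, 0.0, 0.0]
        for w_j, m in zip(w, bone_transforms):
            p = apply_affine(m, v)
            for k in range(3):
                out[k] += w_j * p[k]
        skinned.append(tuple(out))
    return skinned


# Usage: a three-vertex "arm" along x; the second bone bends 90 degrees at the elbow (1, 0, 0).
arm = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
arm_weights = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
bones = [IDENTITY, rotation_z(math.pi / 2, pivot=(1.0, 0.0, 0.0))]
bent_arm = linear_blend_skinning(arm, arm_weights, bones)
```

Because the blend is linear in the transformed positions, the edit stays quantifiable: the bend angle and the weights fully determine where each vertex ends up.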
**Methods:**
1. **De-Rendering and Deformation:** The input 2D image is converted into a 3D model using a zero-shot single-image reconstruction model. The 3D model is then deformed in 3D space.
2. **Coarse-to-Fine Generative Enhancement:** A pre-trained text-to-image diffusion model is fine-tuned to preserve key details from the input while maintaining the edited geometry. Depth control and feature injection techniques are used to enhance texture quality and geometric consistency.
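The hand-off between the two steps can be sketched as follows, under a loud assumption: this summary does not say how the depth signal is produced, so the example assumes it is rendered from the deformed geometry. The functions `rotate_y` and `splat_depth` are hypothetical helpers, and the point-splat z-buffer stands in for a real triangle rasterizer. The sketch applies a quantified rotation to a set of vertices and then builds a coarse orthographic depth map of the kind a depth-conditioned diffusion model could consume.

```python
import math


def rotate_y(vertices, degrees):
    """Rotate (x, y, z) points about the y-axis by an explicit angle in degrees."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in vertices]


def splat_depth(vertices, size=8, extent=1.0):
    """Build a coarse orthographic depth map by splatting points into a grid.

    Assumes the camera looks along +z, so a smaller z is closer to the camera.
    Points with x, y inside [-extent, extent) fall into a size x size grid;
    each cell keeps the nearest depth, and empty cells stay None.
    """
    depth = [[None] * size for _ in range(size)]
    for x, y, z in vertices:
        col = math.floor((x + extent) / (2 * extent) * size)
        row = math.floor((extent - y) / (2 * extent) * size)  # image rows grow downward
        if not (0 <= col < size and 0 <= row < size):
            continue  # point lies outside the view
        if depth[row][col] is None or z < depth[row][col]:
            depth[row][col] = z
    return depth


# Usage: rotate a small point set by 30 degrees, push it in front of the camera, render depth.
points = [(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.2)]
edited = [(x, y, z + 2.0) for x, y, z in rotate_y(points, 30.0)]
depth_map = splat_depth(edited, size=8, extent=1.0)
```

A production pipeline would rasterize the full textured mesh with a renderer; the point is only that the edited geometry, not a text prompt, determines the conditioning signal.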
**Experiments:**
The framework is evaluated on a new dataset, SculptingBench, which includes 28 images covering six categories of editing tasks. Quantitative and qualitative evaluations show that Image Sculpting outperforms existing methods in terms of texture quality and geometric fidelity.
**Limitations:**
The framework relies on the quality of single-view 3D reconstruction, which can be improved over time. Future work could explore data-driven techniques to automate mesh deformation and incorporate super-resolution methods to enhance output resolution.