Image Sculpting is a novel framework that enables precise editing of 2D images by integrating tools from 3D geometry and graphics. Unlike traditional 2D generative methods, which operate purely in 2D and rely on textual instructions, Image Sculpting lifts 2D objects into 3D, allowing direct interaction with their geometry. After editing, the objects are re-rendered into 2D and merged back into the original image, with a coarse-to-fine enhancement process producing high-fidelity results. The framework supports precise, quantifiable, and physically plausible edits such as pose editing, rotation, translation, 3D composition, carving, and serial addition. It marks an initial step toward combining the creative freedom of generative models with the precision of graphics pipelines.
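At a high level, the data flow is: lift the 2D object into 3D, edit it directly in 3D, re-render it into the scene, and refine the composite. The sketch below is a minimal, hypothetical skeleton of that flow; the function names (`reconstruct_3d`, `deform_mesh`, `render_coarse`, `enhance`) are illustrative placeholders, not the paper's actual API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Mesh:
    """Minimal triangle-mesh container (vertices and faces)."""
    vertices: np.ndarray  # (N, 3) float
    faces: np.ndarray     # (M, 3) int


# Hypothetical stage stubs; the real system uses Zero-1-to-3 for
# reconstruction, ARAP/skinning tools for deformation, and a
# diffusion-based coarse-to-fine enhancer.
def reconstruct_3d(image: np.ndarray) -> Mesh:
    """Stage 1: single-view 3D reconstruction of the selected object."""
    raise NotImplementedError


def deform_mesh(mesh: Mesh, user_edit) -> Mesh:
    """Stage 2: direct 3D manipulation (pose, rotation, carving, ...)."""
    raise NotImplementedError


def render_coarse(mesh: Mesh, background: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Re-render the edited object into the scene; also return its depth map."""
    raise NotImplementedError


def enhance(coarse: np.ndarray, depth: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Stage 3: coarse-to-fine generative enhancement guided by depth and
    the original object's appearance."""
    raise NotImplementedError


def image_sculpting(image: np.ndarray, user_edit) -> np.ndarray:
    """End-to-end flow: 2D -> 3D -> edit -> re-render -> enhance."""
    mesh = reconstruct_3d(image)
    edited = deform_mesh(mesh, user_edit)
    coarse, depth = render_coarse(edited, background=image)
    return enhance(coarse, depth, reference=image)
```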
The framework involves three key phases: (1) single-view 3D reconstruction, (2) manipulation of objects in 3D, and (3) a coarse-to-fine generative enhancement process. For 3D reconstruction, a zero-shot single-image reconstruction model (Zero-1-to-3), trained on large-scale 3D object datasets, is used. The deformation process relies on established geometry-processing tools such as As-Rigid-As-Possible (ARAP) deformation and linear blend skinning, enabling interactive and precise manipulation of the 3D models. For the generative enhancement process, an improved feature injection technique balances the original texture of the object against the modified geometry.
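As a concrete illustration of the deformation stage, the following self-contained sketch implements plain linear blend skinning in NumPy: each vertex is moved by a weighted blend of per-bone rigid transforms. It is a toy example, not the paper's implementation; an ARAP deformation would additionally solve a local-rigidity optimization over the mesh.

```python
import numpy as np


def linear_blend_skinning(vertices: np.ndarray,
                          weights: np.ndarray,
                          bone_transforms: np.ndarray) -> np.ndarray:
    """Deform vertices with linear blend skinning.

    vertices:        (N, 3) rest-pose vertex positions
    weights:         (N, B) skinning weights, each row sums to 1
    bone_transforms: (B, 4, 4) homogeneous transform per bone
    returns:         (N, 3) deformed vertex positions
    """
    n = vertices.shape[0]
    # Homogeneous coordinates: (N, 4)
    v_h = np.concatenate([vertices, np.ones((n, 1))], axis=1)
    # Positions after each bone's transform: (B, N, 4)
    per_bone = np.einsum('bij,nj->bni', bone_transforms, v_h)
    # Blend the per-bone results with the skinning weights: (N, 4)
    blended = np.einsum('nb,bni->ni', weights, per_bone)
    return blended[:, :3]


if __name__ == "__main__":
    # Two vertices, two bones: bone 0 is the identity, bone 1 translates by +1 in x.
    verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    w = np.array([[1.0, 0.0], [0.5, 0.5]])
    t0 = np.eye(4)
    t1 = np.eye(4)
    t1[0, 3] = 1.0
    # The first vertex stays put; the second moves halfway toward the translated bone.
    print(linear_blend_skinning(verts, w, np.stack([t0, t1])))
```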
The framework demonstrates precise and quantifiable image editing capabilities, including pose editing, rotation, translation, multi-object 3D composition, carving, and serial addition. Through direct 3D geometry control it introduces editing features not present in existing methods, and it addresses the challenge of preserving both texture and geometry during enhancement by combining feature injection with depth control (an illustrative sketch appears at the end of this section). The framework is evaluated on a new benchmark, SculptingBench, containing 28 images covering six categories; both qualitative and quantitative evaluations show that it outperforms various baselines in image quality, and comparisons with other state-of-the-art techniques demonstrate its superiority in precision and control. The results show that the method significantly enhances texture quality while maintaining geometric consistency.

The framework has limitations, including its dependence on the quality of single-view 3D reconstruction and the need for manual model rigging; future research could explore data-driven techniques to automate this process. The output resolution of the pipeline also falls short of industrial rendering systems, which incorporating super-resolution methods could address in future work. Finally, the lack of background lighting adjustment undermines the realism of the scene; future work could benefit from integrating dynamic (re-)lighting techniques.
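As noted above, the enhancement stage combines feature injection with depth control. The feature-injection mechanism is specific to the paper's pipeline, but the depth-conditioning half can be approximated with off-the-shelf tooling. The sketch below uses a depth ControlNet with a diffusers img2img pipeline as a stand-in; the model identifiers, file names, prompt, and parameters are illustrative assumptions, not the paper's setup.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

# Depth-conditioned ControlNet; checkpoint ids are common public ones, chosen for illustration.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

coarse = load_image("coarse_render.png")  # edited object re-rendered and composited into the scene
depth = load_image("depth_map.png")       # depth rendered from the edited 3D mesh

refined = pipe(
    prompt="a high-quality photo of the edited object",
    image=coarse,            # starting point: the coarse composite
    control_image=depth,     # depth map pins the result to the edited geometry
    strength=0.6,            # how much of the coarse render may be repainted
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
refined.save("refined.png")
```

The depth map rendered from the edited mesh constrains the diffusion model to respect the new geometry, while `strength` trades off how aggressively the coarse render is repainted against how much of its original appearance is kept.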