21 May 2024 | Francesco Palandra* and Andrea Sanchietti*, Daniele Baieri, Emanuele Rodolà (Sapienza University of Rome, Italy)
**GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting**
**Authors:** Francesco Palandra, Andrea Sanchietti, Daniele Baieri, Emanuele Rodolà
**Institution:** Sapienza University of Rome, Italy
**Abstract:**
GSEdit is a pipeline for text-guided 3D object editing based on Gaussian Splatting (GS) models. It enables the editing of 3D object style and appearance without altering their main details, all within a few minutes on consumer hardware. The method leverages GS to represent 3D scenes and optimizes the model by progressively varying image supervision using a pretrained image-based diffusion model. The input can be a 3D triangular mesh or Gaussians from a generative model like DreamGaussian. GSEdit ensures consistency across different viewpoints and maintains the integrity of the original object's information. Compared to methods relying on NeRF-like MLP models, GSEdit stands out for its efficiency, making 3D editing tasks much faster. The editing process is refined via the application of the SDS loss, ensuring precise and accurate edits. Comprehensive evaluation demonstrates that GSEdit effectively alters object shape and appearance following given textual instructions while preserving coherence and detail.
**Keywords:**
Gaussian splatting, Radiance fields, Inverse rendering, 3D Editing
**Contributions:**
- Adaptation of the SDS loss for GS editing to derive analytical gradients.
- Introduction of a pipeline for 3D object editing that can perform significant modifications of any input shape in a few minutes on consumer-grade hardware.
**Background:**
- **Gaussian Splatting:** Represents 3D scenes as a finite set of Gaussian functions, enabling faster training and rendering times while maintaining reconstruction quality.
- **Instruct-Pix2Pix:** A diffusion model for image editing based on text conditioning, operating on the latent space to achieve efficient and high-quality edits.
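The core of GS rendering is front-to-back alpha compositing of depth-sorted Gaussians at each pixel. A minimal per-pixel sketch (toy values and shapes, not the paper's CUDA rasterizer):

```python
import numpy as np

def composite_gaussians(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted splats at one pixel.

    colors: (N, 3) RGB contribution of each Gaussian, sorted near-to-far.
    alphas: (N,)   opacity of each Gaussian evaluated at this pixel.
    """
    pixel = np.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed by nearer splats
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * c
        transmittance *= (1.0 - a)
    return pixel

# a half-opaque red splat in front of a half-opaque green one:
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
alphas = np.array([0.5, 0.5])
px = composite_gaussians(colors, alphas)
# red contributes 0.5; green is attenuated to 0.5 * 0.5 = 0.25
```

Because every step is a differentiable product and sum, gradients flow from the rendered pixels back to the Gaussians' parameters, which is what the editing loop below exploits.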
**Method:**
- **Input Representation:** Supports any Gaussian splatting scene as input, including GS reconstructions from multi-view renders of a 3D mesh or GS scenes output by generative models like DreamGaussian.
- **3D Object Editing:** Uses the same cameras placed for the generation of the GS to render the scene and apply edits using the IP2P model. The process involves capturing, editing, and updating the scene iteratively.
- **Gradient Backpropagation:** Adapts the SDS loss for GS editing, using the noise residual to backpropagate the error through the rasterizer.
- **Mesh Extraction and Texture Refinement:** Extracts the mesh geometry using Marching Cubes and refines the texture quality using a multi-step denoising process and a pixel-wise MSE loss.
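The SDS update can be summarized as follows: render a view, inject diffusion noise, ask the conditioned denoiser to predict that noise, and use the residual as the gradient pushed through the rasterizer. A toy numpy sketch with a stand-in denoiser (hypothetical shapes; the real pipeline uses IP2P and autodiff through the GS rasterizer):

```python
import numpy as np

rng = np.random.default_rng(0)

def sds_gradient(rendered, noise_pred_fn, t, w=1.0):
    """Toy SDS gradient for one rendered view.

    grad = w(t) * (eps_hat - eps): the residual between the diffusion
    model's noise prediction and the noise actually injected. In GSEdit
    this residual is backpropagated through the rasterizer to the
    Gaussian parameters; here we only form the residual itself.
    """
    eps = rng.standard_normal(rendered.shape)             # injected noise
    noisy = np.sqrt(1 - t) * rendered + np.sqrt(t) * eps  # toy forward process
    eps_hat = noise_pred_fn(noisy, t)                     # IP2P would go here
    return w * (eps_hat - eps)

rendered = rng.standard_normal((4, 4, 3))
# stand-in "oracle" denoiser that recovers the injected noise exactly,
# so the residual vanishes and no further edit is applied to the scene
oracle = lambda noisy, t: (noisy - np.sqrt(1 - t) * rendered) / np.sqrt(t)
grad = sds_gradient(rendered, oracle, t=0.5)
```

When the denoiser's prediction matches the injected noise (as with the oracle above), the gradient is zero and the scene is left unchanged; a text-conditioned denoiser instead biases the residual toward the edit described by the prompt.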
**Results:**
- Qualitative results show the ability to modify object shape, color, and style while preserving original features.
- Quantitative results using CLIP-based metrics demonstrate superior performance compared to NeRF-based editing baselines, at a fraction of their runtime.