GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing


14 Jul 2024 | Jing Wu*1, Jia-Wang Bian*2, Xinghui Li1, Guangrun Wang1, Ian Reid2, Philip Torr1, Victor Adrian Prisacariu1
**Institutions:** University of Oxford, Mohamed bin Zayed University of Artificial Intelligence

**Abstract:** GaussCtrl is a text-driven method for editing 3D scenes reconstructed with 3D Gaussian Splatting (3DGS). It first renders images from the 3DGS model and edits them with a pre-trained 2D diffusion model (ControlNet) according to the input prompt. Its key contribution is a depth-conditioned, multi-view consistent editing framework: all images are edited together and the 3D model is then re-trained on them, which improves both the quality and the consistency of the 3D result. Geometric consistency across views is enforced through depth-map conditioning, and the appearance of the edited images is unified through attention-based latent code alignment. Experiments show that GaussCtrl edits faster and achieves better visual quality than previous methods.

**Keywords:** 3D Editing, Diffusion Models, Gaussian Splatting, Neural Radiance Fields

**Introduction:** Neural representations such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) produce high-quality novel-view renderings, but editing them remains challenging. Instruct-NeRF2NeRF (IN2N) was the first to edit NeRF scenes from text instructions, yet its per-image edits lack multi-view consistency, leading to visual artifacts. GaussCtrl addresses this with depth-conditioned editing and attention-based latent code alignment, which keep the edits consistent across all images.

**Method:** GaussCtrl uses ControlNet for depth-conditioned editing, leveraging the depth maps rendered from the 3DGS model to enforce geometric consistency across views. An attention-based latent code alignment module then unifies the appearance of the edited images across views. Finally, the original 3D model is optimized on the edited images to produce the edited 3D scene.

**Experiments:** GaussCtrl is evaluated on a range of scenes and text prompts, covering both 360-degree and forward-facing captures. Ablation studies and qualitative comparisons against state-of-the-art methods demonstrate the effectiveness of the proposed components, and the method outperforms competitors in visual quality and consistency.

**Conclusion:** GaussCtrl is an efficient, 3D-aware editing method that markedly reduces artifacts and improves the quality of 3D editing, especially in 360-degree scenes. It achieves this by enforcing multi-view consistency at every stage of editing, from depth-conditioned image editing to attention-based latent code alignment.
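The overall render-edit-retrain loop described above can be sketched as follows. This is a toy illustration, not the authors' code: every function here (`render_rgb`, `edit_with_depth_controlnet`, `retrain_3dgs`, etc.) is a placeholder standing in for the real splatting renderer, the depth-conditioned diffusion model, and the 3DGS optimizer.

```python
import numpy as np

def render_rgb(scene, cam):
    # Stand-in for splatting the 3D Gaussians into an RGB image.
    return np.zeros((4, 4, 3))

def render_depth(scene, cam):
    # Stand-in for rendering a per-view depth map from the same Gaussians.
    return np.ones((4, 4))

def edit_with_depth_controlnet(images, depths, prompt):
    # Stand-in for editing ALL views together with a depth-conditioned
    # ControlNet, using latent alignment for cross-view consistency.
    return [img + 0.5 for img in images]

def retrain_3dgs(scene, cams, edited_images):
    # Stand-in for re-optimizing the Gaussians on the edited image set.
    return {"scene": scene, "targets": edited_images}

def gaussctrl_edit(scene, cams, prompt):
    # 1. Render RGB images and depth maps from the pretrained 3DGS model.
    images = [render_rgb(scene, c) for c in cams]
    depths = [render_depth(scene, c) for c in cams]
    # 2. Edit every view jointly, conditioned on the consistent depths.
    edited = edit_with_depth_controlnet(images, depths, prompt)
    # 3. Re-optimize the 3D model on the edited views.
    return retrain_3dgs(scene, cams, edited)

result = gaussctrl_edit(scene=None, cams=range(3), prompt="a bronze statue")
```

The point of the sketch is the ordering: depths come from the same 3D model as the images, so every view is conditioned on mutually consistent geometry before the model is retrained.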
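The attention-based latent code alignment can be illustrated with a minimal cross-view attention step: while denoising one view's latent, the keys and values are taken from a reference view, so all views attend to shared appearance features. This is a hedged sketch, not the authors' implementation; in a real diffusion U-Net the q/k/v projections would be the pretrained self-attention weights, whereas here identity projections are used for brevity.

```python
import torch
import torch.nn.functional as F

def cross_view_attention(q_latent, ref_latent, num_heads=8):
    """Attend query-view latent tokens to a reference view's tokens so the
    edited appearance stays consistent across views (illustrative only)."""
    B, N, C = q_latent.shape              # (batch, tokens, channels)
    d = C // num_heads
    # Split channels into heads; identity projections stand in for the
    # pretrained q/k/v linear layers of the diffusion U-Net.
    q = q_latent.reshape(B, N, num_heads, d).transpose(1, 2)
    k = ref_latent.reshape(B, N, num_heads, d).transpose(1, 2)
    v = ref_latent.reshape(B, N, num_heads, d).transpose(1, 2)
    out = F.scaled_dot_product_attention(q, k, v)   # standard attention
    return out.transpose(1, 2).reshape(B, N, C)

# Toy usage: two "views", each with 16 latent tokens of 64 channels.
view = torch.randn(1, 16, 64)
ref = torch.randn(1, 16, 64)
aligned = cross_view_attention(view, ref)
```

Because the keys and values come from the reference view rather than the view being edited, the output tokens are mixtures of the reference view's features, which is what pulls the appearance of different views toward a common edit.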