View-Consistent 3D Editing with Gaussian Splatting


17 Feb 2025 | Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, and Hanwang Zhang
This paper introduces VcEDIT, a novel framework for view-consistent 3D Gaussian Splatting (3DGS) editing. VcEDIT addresses the critical issue of multi-view inconsistency in image-guided 3DGS editing, where guidance images from different views often exhibit significant discrepancies, leading to mode collapse and visual artifacts.

To resolve this, VcEDIT integrates two consistency modules: the Cross-attention Consistency Module (CCM), which harmonizes cross-attention maps across views, and the Editing Consistency Module (ECM), which calibrates the multi-view editing outputs themselves. These modules are embedded in an iterative pattern that refines the 3DGS model over multiple editing rounds: a pre-trained 2D diffusion model generates edited images, which then guide updates to the 3DGS model. Each iteration renders images from the current model, edits them, and updates the 3DGS model from the edited results, so the model is consistently guided across all views, reducing mode collapse and improving editing quality.

Evaluated on a variety of real-world scenes, VcEDIT handles diverse editing tasks, including face, object, and large-scale scene editing, and significantly outperforms existing state-of-the-art methods in both editing quality and view consistency. The full pipeline comprises encoding, editing, and decoding steps that together ensure consistent, high-fidelity results.
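The iterative render-edit-update loop described above can be illustrated with a minimal toy sketch. Everything here is a hypothetical stand-in, not the paper's implementation: a plain vector plays the role of the 3DGS model, `render` adds view-dependent noise in place of real rasterization, `edit_2d` blends toward a target in place of the 2D diffusion edit, and `calibrate` averages per-view edits toward a consensus as a loose analogue of the ECM's multi-view calibration.

```python
import numpy as np

def render(model, n_views, rng):
    # Stand-in renderer: each "view" is the model state plus view-dependent noise,
    # mimicking the per-view discrepancies the paper sets out to suppress.
    return [model + rng.normal(0.0, 0.05, model.shape) for _ in range(n_views)]

def edit_2d(image, target):
    # Stand-in for the pre-trained 2D diffusion edit: pull the view toward the edit target.
    return 0.5 * image + 0.5 * target

def calibrate(edited_views):
    # ECM-like step (toy analogue): blend each per-view edit toward the cross-view
    # consensus so inconsistent guidance cannot dominate the 3D update.
    consensus = np.mean(edited_views, axis=0)
    return [0.5 * v + 0.5 * consensus for v in edited_views]

def iterative_edit(model, target, n_views=8, n_iters=10, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        views = render(model, n_views, rng)           # 1. render current "3DGS" state
        edited = [edit_2d(v, target) for v in views]  # 2. edit each view independently
        edited = calibrate(edited)                    # 3. enforce cross-view consistency
        model = np.mean(edited, axis=0)               # 4. update the model from the edits
    return model

model = np.zeros(4)       # initial "scene"
target = np.ones(4)       # desired edit
result = iterative_edit(model, target)
```

With the consistency step in place, the per-view noise is averaged out and the toy model converges toward the edit target over the iterations, which is the intuition behind alternating consistent 2D guidance with 3D updates.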