DiffUHaul: A Training-Free Method for Object Dragging in Images

3 Jun 2024 | Omri Avrahami (NVIDIA, The Hebrew University of Jerusalem); Rinon Gal (NVIDIA, Tel Aviv University); Gal Chechik (NVIDIA); Ohad Fried (Reichman University); Dani Lischinski (The Hebrew University of Jerusalem); Arash Vahdat* (NVIDIA); Weili Nie* (NVIDIA)

This paper addresses object dragging in images: relocating an object within a scene while preserving both its appearance and the surrounding environment. The proposed method, DiffUHaul, leverages the spatial understanding of a localized text-to-image model, BlobGEN, and requires no additional training. The key contributions are:

1. **Gated Self-Attention Entanglement**: The paper identifies an entanglement problem in the gated self-attention layers, which can leak information between different objects. An inference-time masking technique disentangles the representations of the individual objects.
2. **Soft Attention Anchoring**: A soft anchoring mechanism improves the consistency of the dragged object. It blends the attention features of the source and target images during the denoising process, so that the object keeps its appearance while it moves to the new location.
3. **DDPM Self-Attention Bucketing**: To adapt the method to real-image editing, the paper introduces DDPM self-attention bucketing. The technique adds noise to the reference image independently at each diffusion step, which preserves fine-grained details in the final image.
4. **Automatic Evaluation and User Study**: The method is evaluated on a specialized dataset with automatic metrics covering foreground similarity, object traces, and realism. A user study further finds that it is preferred over the baselines for object dragging, trace removal, and overall edit quality.

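
The blending described under Soft Attention Anchoring can be illustrated with a minimal sketch. It assumes that "blending" means a per-step linear interpolation between two feature arrays of the same shape; the function name `blend_attention`, the `weight` parameter, and the toy shapes are hypothetical and are not taken from the paper, and how the weight is chosen at each denoising step is not specified in this summary.

```python
import numpy as np

def blend_attention(source_feats, target_feats, weight):
    """Linearly interpolate between source and target attention features.

    weight = 1.0 returns the source features unchanged (full anchoring),
    weight = 0.0 returns the target features unchanged (no anchoring).
    This is an illustrative assumption, not the paper's exact rule.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * source_feats + (1.0 - weight) * target_feats

# Toy example: 4 tokens with 8 feature channels each.
source = np.ones((4, 8))
target = np.zeros((4, 8))
blended = blend_attention(source, target, weight=0.25)
```

A weight near 1 keeps the result close to the source image's features, while a weight near 0 lets the target generation dominate.
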
The paper demonstrates that DiffUHaul significantly outperforms existing methods in object dragging tasks, providing a robust and training-free solution for image editing.
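
The idea of adding noise to the reference image independently at each diffusion step can be sketched with the standard DDPM forward process, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where a fresh ε is drawn for every step. The sketch below is a rough illustration only: the linear beta schedule and its values are common DDPM defaults assumed here, the function names are hypothetical, and the way DiffUHaul then uses the self-attention features of these noisy images is not shown.

```python
import numpy as np

def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_independently(x0, alpha_bars, timesteps, rng):
    """Noise the clean reference x0 to each timestep with its own noise draw."""
    noisy = {}
    for t in timesteps:
        eps = rng.standard_normal(x0.shape)  # fresh noise, not shared across steps
        noisy[t] = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return noisy

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))  # stand-in for a reference image latent
alpha_bars = make_alpha_bars()
noisy = noise_independently(x0, alpha_bars, [999, 500, 0], rng)
```

Because every `noisy[t]` is computed directly from the clean `x0`, errors do not accumulate from one step to the next, which is one plausible reading of why this kind of noising helps retain fine detail.
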