27 Mar 2024 | Daniel Winter¹², Matan Cohen¹, Shlomi Fruchter¹, Yael Pritch¹, Alex Rav-Acha¹, Yedid Hoshen¹²
ObjectDrop is a method for photorealistic object removal and insertion that trains a diffusion model on a "counterfactual" dataset: each scene is photographed before and after physically removing an object, while minimizing all other changes. Fine-tuning a diffusion model on these pairs teaches it to remove not only the object itself but also its effects on the scene, such as shadows and reflections.

For object insertion, capturing a comparably large counterfactual dataset is impractical. The method therefore proposes bootstrap supervision: the trained object removal model is applied to a large collection of unlabeled images to synthetically expand the insertion training set.

This approach significantly outperforms prior methods, including Emu Edit, AnyDoor, and Paint-by-Example, on photorealistic object removal and insertion, particularly in modeling an object's effects on the scene. It handles both adding and removing these effects, and it generalizes well to out-of-distribution scenarios.

The contributions are: (1) an analysis of the limitations of self-supervised training for editing the effects of objects on scenes; (2) an effective counterfactual supervised training method for photorealistic object removal; and (3) a bootstrap supervision approach that mitigates the labeling burden for object insertion. Evaluated on a range of benchmarks, the method shows significant improvements on both object removal and insertion tasks.
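To make the counterfactual training recipe concrete, here is a minimal PyTorch-style sketch of one fine-tuning step. It assumes a generic latent-diffusion setup: a denoising network `unet`, a frozen encoder `encode`, a diffusers-style noise `scheduler`, and batches of (factual image, object mask, counterfactual image) triples. All names and the channel-concatenation conditioning are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of counterfactual fine-tuning for object removal.
# Assumes: `unet` is a denoising network, `encode` a frozen VAE encoder,
# and each batch holds (image_with_object, object_mask, image_without_object).

import torch
import torch.nn.functional as F

def removal_loss(unet, encode, scheduler, batch):
    """One denoising-loss step: condition on the masked 'factual' image
    and predict noise for the counterfactual (object removed) target."""
    factual, mask, counterfactual = batch          # B x C x H x W tensors
    target_latents = encode(counterfactual)        # latents of the clean scene
    cond_latents = encode(factual * (1 - mask))    # object pixels masked out

    noise = torch.randn_like(target_latents)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (target_latents.shape[0],), device=noise.device)
    noisy = scheduler.add_noise(target_latents, noise, t)

    # Channel-concatenate noisy target, conditioning latents, and mask,
    # as is common in inpainting-style diffusion fine-tuning.
    mask_lat = F.interpolate(mask, size=noisy.shape[-2:])
    model_in = torch.cat([noisy, cond_latents, mask_lat], dim=1)
    pred = unet(model_in, t)                       # generic callable placeholder
    return F.mse_loss(pred, noise)
```

Because the target is the real photograph of the scene without the object, the loss supervises shadow and reflection removal directly, which self-supervised inpainting objectives cannot guarantee.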
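The bootstrap-supervision idea can likewise be sketched as a simple data-generation loop: the removal model turns ordinary images into pseudo-counterfactual pairs, so each original image becomes insertion ground truth. `remove_object` and the triple format below are hypothetical placeholders under those assumptions.

```python
# Hypothetical sketch of bootstrap supervision for object insertion.
# Assumes a trained removal model `remove_object(image, mask)` from the
# counterfactual stage and unlabeled images with detected object masks.

def build_insertion_dataset(images_with_masks, remove_object):
    """Synthesize (background, object, composite) training triples:
    the removal model supplies the 'object absent' counterfactual,
    so the original image serves as the insertion ground truth."""
    dataset = []
    for image, mask in images_with_masks:
        background = remove_object(image, mask)   # object and its effects removed
        obj_crop = image * mask                   # pasted-object conditioning input
        # Insertion training: given (background, obj_crop), predict `image`.
        dataset.append((background, obj_crop, image))
    return dataset
```

This inverts the hard-to-collect direction: only removal requires physically captured pairs, while insertion supervision is manufactured at scale from unlabeled photos.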