11 Jun 2024 | Xi Chen, Yutong Feng, Mengting Chen, Yiyang Wang, Shilong Zhang, Yu Liu, Yujun Shen
This paper introduces a novel form of image editing called *imitative editing*, which allows users to edit specific regions in an image by providing a reference image and a source image with a masked region. The key challenge is to automatically capture the semantic correspondence between the reference and source images without explicit user instructions. To achieve this, the authors propose a generative training framework named MimicBrush, which uses a dual diffusion U-Net architecture to learn from video clips. The model is trained to recover masked regions in the source image using information from the reference image, effectively capturing the semantic correspondence between the two images. The effectiveness of MimicBrush is demonstrated through various experiments, including part composition and texture transfer tasks, showing superior performance compared to existing methods. The paper also constructs a benchmark to facilitate further research in this area.
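To make the self-supervised training idea more concrete, below is a minimal sketch, not the authors' implementation: two frames are drawn from a video clip, part of one frame (the source) is masked, and a dual-U-Net model is asked to reconstruct the masked region using the other frame as the reference. All module and function names here (`DualUNetSketch`, `TinyUNetStub`, `random_box_mask`) are hypothetical, the networks are tiny stand-ins for real diffusion U-Nets, and the plain reconstruction loss replaces the actual diffusion (noise-prediction) objective.

```python
# Hypothetical sketch of the masked-reconstruction training signal described above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyUNetStub(nn.Module):
    """Stand-in for a diffusion U-Net; a real model would be far larger."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(64, out_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)


class DualUNetSketch(nn.Module):
    """One U-Net encodes the reference frame; the other inpaints the masked source."""

    def __init__(self):
        super().__init__()
        self.reference_unet = TinyUNetStub(in_ch=3, out_ch=64)            # reference features
        self.imitative_unet = TinyUNetStub(in_ch=3 + 1 + 64, out_ch=3)    # source + mask + features

    def forward(self, masked_source, mask, reference):
        ref_feat = self.reference_unet(reference)
        # The paper injects reference features through attention inside the U-Net;
        # channel concatenation here is only a simplification for this sketch.
        x = torch.cat([masked_source, mask, ref_feat], dim=1)
        return self.imitative_unet(x)


def random_box_mask(b, h, w, device):
    """Mask a random rectangular region per sample (1 = masked)."""
    mask = torch.zeros(b, 1, h, w, device=device)
    for i in range(b):
        y0 = torch.randint(0, h // 2, (1,)).item()
        x0 = torch.randint(0, w // 2, (1,)).item()
        mask[i, :, y0:y0 + h // 2, x0:x0 + w // 2] = 1.0
    return mask


def training_step(model, frame_a, frame_b, optimizer):
    """frame_a is the source frame to reconstruct; frame_b serves as the reference."""
    b, _, h, w = frame_a.shape
    mask = random_box_mask(b, h, w, frame_a.device)
    masked_source = frame_a * (1.0 - mask)
    pred = model(masked_source, mask, frame_b)
    # Reconstruction loss on the masked region only; a real diffusion model
    # would instead predict noise at a sampled timestep.
    loss = F.mse_loss(pred * mask, frame_a * mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = DualUNetSketch()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    # Two batches of random "video frames" stand in for a real clip.
    frame_a, frame_b = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
    print("loss:", training_step(model, frame_a, frame_b, opt))
```

The point of the dual-branch setup, as the summary suggests, is that because video frames show the same content under different poses and appearances, the only way for the model to fill the masked source region is to locate the corresponding content in the reference frame, so the semantic correspondence is learned without any explicit annotation.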