28 Apr 2024 | Navve Wasserman, Noam Rotstein, Roy Ganz, and Ron Kimmel
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
This paper introduces Paint by Inpaint, an image-editing framework built on the observation that removing objects from images (inpainting) is simpler than adding them (painting). The key idea is to exploit this asymmetry to build a large-scale dataset of image pairs, where one image contains an object and the other does not, and to use it to train a diffusion model that adds objects to images. The dataset, named PIPE (Paint by Inpaint Editing), is constructed by first removing objects from images with a high-performance inpainting model and then generating natural-language instructions for adding those objects back: a Vision-Language Model (VLM) produces detailed descriptions of each removed object, and a Large Language Model (LLM) converts those descriptions into concise editing instructions. The resulting dataset contains over 1 million image pairs spanning more than 1,400 object classes with thousands of unique attributes.
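To make the pipeline concrete, here is a minimal sketch of how one training triplet might be constructed. It uses the Hugging Face diffusers inpainting pipeline as a stand-in for the paper's inpainting model, and a simple class-name template in place of the VLM+LLM instruction generation; the checkpoint name, image sizes, and the `make_pair` helper are illustrative assumptions, not the authors' code.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Stand-in for the paper's high-performance inpainting model
# (assumption: the authors' exact model and settings may differ).
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

def make_pair(image: Image.Image, mask: Image.Image, class_name: str):
    """Build one (source, target, instruction) training triplet.

    `image` contains the object (this becomes the edit *target*);
    `mask` marks the object's pixels; inpainting the masked region
    yields the object-free *source* image.
    """
    image = image.convert("RGB").resize((512, 512))
    mask = mask.convert("L").resize((512, 512))
    # Fill the masked region with plausible background, removing the object.
    source = inpaint(prompt="background", image=image, mask_image=mask).images[0]
    # Template instruction; the paper instead has a VLM describe the object
    # and an LLM rephrase that description as a natural-language instruction.
    instruction = f"add a {class_name}"
    return source, image, instruction
```

Training then runs in the object-addition direction: the model sees the inpainted source image plus the instruction and learns to reproduce the original image containing the object.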
The paper then presents a diffusion model trained on PIPE that adds objects to images from free-form text instructions. Evaluated across multiple benchmarks and metrics, the model outperforms existing editing methods, adding objects naturally and coherently while remaining consistent with the original image. When combined with other editing datasets, it also proves effective on general editing tasks.
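As a usage sketch, an instruction-following editing model of this kind can be driven through the diffusers InstructPix2Pix pipeline. The checkpoint below is the public InstructPix2Pix release, used here only as a placeholder since the released PIPE-trained weights may be packaged differently; the image path and instruction are illustrative.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# Public InstructPix2Pix weights as a placeholder; substitute the
# PIPE-trained checkpoint where available in this format.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("living_room.jpg").convert("RGB")  # illustrative path
edited = pipe(
    "add a sleeping cat on the sofa",  # the object-addition instruction
    image=image,
    num_inference_steps=50,
    image_guidance_scale=1.5,  # higher = stay closer to the input image
    guidance_scale=7.5,        # higher = follow the instruction more strongly
).images[0]
edited.save("living_room_with_cat.jpg")
```

The two guidance scales trade off fidelity to the source image against adherence to the instruction, which is the consistency/editability balance the paper's evaluation measures.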
Finally, the paper discusses the limitations of the approach, including possible errors in the object-removal phase and the limits of the instruction-generation methods. Despite these limitations, the results show that Paint by Inpaint is a promising direction for image editing, with the potential to significantly improve instruction-based editing models. The dataset and model are made available for further research and development.