18 Jan 2023 | Tim Brooks*, Aleksander Holynski*, Alexei A. Efros
The paper "InstructPix2Pix: Learning to Follow Image Editing Instructions" by Tim Brooks, Aleksander Holynski, Alexei A. Efros, and others from the University of California, Berkeley, introduces a method for editing images based on human instructions. The authors combine a large language model (GPT-3) and a text-to-image model (Stable Diffusion) to generate a dataset of image editing examples. This dataset is used to train a conditional diffusion model, InstructPix2Pix, which can edit images in the forward pass without requiring per-example fine-tuning or inversion. The model performs a variety of edits, such as replacing objects, changing styles, and modifying settings, and generalizes well to real images and user-written instructions. The paper also discusses the limitations of the method, including the visual quality of the generated dataset and the biases inherent in the models used.The paper "InstructPix2Pix: Learning to Follow Image Editing Instructions" by Tim Brooks, Aleksander Holynski, Alexei A. Efros, and others from the University of California, Berkeley, introduces a method for editing images based on human instructions. The authors combine a large language model (GPT-3) and a text-to-image model (Stable Diffusion) to generate a dataset of image editing examples. This dataset is used to train a conditional diffusion model, InstructPix2Pix, which can edit images in the forward pass without requiring per-example fine-tuning or inversion. The model performs a variety of edits, such as replacing objects, changing styles, and modifying settings, and generalizes well to real images and user-written instructions. The paper also discusses the limitations of the method, including the visual quality of the generated dataset and the biases inherent in the models used.