UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

19 Dec 2024 | Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang
This paper introduces ULTRAEDIT, a large-scale dataset for instruction-based image editing comprising approximately 4 million editing samples. The key contributions of ULTRAEDIT include:

1. **Diverse Editing Instructions**: ULTRAEDIT leverages both human raters and large language models (LLMs) to generate a broad range of editing instructions, addressing the limitations of existing datasets such as InstructPix2Pix and MagicBrush.
2. **Real Image Anchors**: The dataset anchors edits to real images from diverse sources, including photographs and artworks, reducing bias and yielding more balanced and varied editing examples.
3. **Region-Based Editing**: ULTRAEDIT supports region-based editing, pairing samples with editing-region annotations so that models can learn more fine-grained, localized modifications.

The dataset is constructed through a systematic pipeline that combines LLM creativity with human-written instructions, real image anchors, and automatic region generation. Experiments on the MagicBrush and EmuEdit benchmarks demonstrate that models trained on ULTRAEDIT achieve state-of-the-art performance, particularly on complex and fine-grained editing tasks. The paper also includes qualitative evaluations and ablation studies that validate the dataset's design choices. Overall, ULTRAEDIT represents a significant advancement in instruction-based image editing, offering a rich and diverse resource for researchers and practitioners.
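To make the structure of such editing samples concrete, here is a minimal sketch of iterating over the dataset with the Hugging Face `datasets` library. Note that the dataset ID and field names (`edit_prompt`, `source_image`, `edited_image`) are assumptions for illustration only; consult the official ULTRAEDIT release for the actual repository name and schema.

```python
# Minimal sketch of streaming ULTRAEDIT-style editing samples.
# Dataset ID and field names below are hypothetical placeholders.
from datasets import load_dataset

ds = load_dataset("BleachNick/UltraEdit", split="train", streaming=True)  # assumed ID

for sample in ds.take(3):
    # Each editing sample is expected to pair a source image, a free-form
    # editing instruction, and the edited target image; region-based samples
    # would additionally carry an editing-region mask.
    print(sample["edit_prompt"])         # assumed field: instruction text
    source = sample["source_image"]      # assumed field: PIL.Image
    edited = sample["edited_image"]      # assumed field: PIL.Image
```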
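For context on how a dataset like this is consumed downstream, the following is a minimal inference sketch for an instruction-following editing model in the InstructPix2Pix style, using the `diffusers` library. The checkpoint shown is the public `timbrooks/instruct-pix2pix` model, used here as a stand-in; a model fine-tuned on ULTRAEDIT would be loaded the same way.

```python
# Minimal instruction-based image editing sketch with diffusers.
# The checkpoint is a public stand-in, not the paper's own model.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = load_image("input.jpg")  # source image to be edited
edited = pipe(
    "make the sky look like a sunset",  # free-form editing instruction
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # faithfulness to the source image
).images[0]
edited.save("edited.jpg")
```

The `image_guidance_scale` parameter trades off faithfulness to the source image against adherence to the instruction, which is the same tension that fine-grained, region-aware training data is meant to help resolve.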