This paper introduces ULTRAEDIT, a large-scale dataset for instruction-based image editing, comprising approximately 4 million editing samples. The key contributions of ULTRAEDIT include:
1. **Diverse Editing Instructions**: ULTRAEDIT leverages both human raters and large language models (LLMs) to generate a broad range of editing instructions, addressing the limitations of existing datasets like InstructPix2Pix and MagicBrush.
2. **Real Image Anchors**: The dataset uses real images from diverse sources, including photographs and artworks, to reduce biases and provide more balanced and diverse editing examples.
3. **Region-Based Editing**: ULTRAEDIT supports region-based editing, enhancing the quality and effectiveness of editing models by allowing more fine-grained modifications.
The dataset is constructed through a systematic pipeline that combines LLM creativity with human-written instructions, real image anchors, and automatic region generation. Experiments on MagicBrush and EmuEdit benchmarks demonstrate that models trained on ULTRAEDIT achieve state-of-the-art performance, particularly in handling complex and fine-grained editing tasks. The paper also includes qualitative evaluations and ablation studies to validate the effectiveness of the dataset's design choices. Overall, ULTRAEDIT represents a significant advancement in the field of image editing, offering a rich and diverse resource for researchers and practitioners.