SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing

7 May 2024 | Yuying Ge1*, Sijie Zhao1*, Chen Li2*, Yixiao Ge1,2†, Ying Shan1,2
SEED-Data-Edit is a hybrid dataset designed for instruction-guided image editing, aiming to enhance the controllability and flexibility of image manipulation using natural language instructions. The dataset comprises three main components:

1. **Automated Pipeline-Generated Data**: This part uses two automated pipelines to produce a substantial volume of diverse image editing pairs. Pipeline (a) removes objects from images, while Pipeline (b) generates changes in style, object, color, material, or expression.
2. **Real-World Scenario Data**: This part collects image editing pairs from websites where amateur photographers post their images along with editing requests. These requests are addressed by Photoshop experts, providing edited images that reflect real-world user intentions.
3. **Multi-Turn Editing Data**: This part involves multiple rounds of edits performed by Photoshop experts on real images, simulating iterative editing processes. Each editing round includes various modifications such as replacing, adding, or removing objects, changing actions, altering text, and modifying object counts.

The dataset is comprehensive and versatile, making it suitable for training language-guided image editing models. A pre-trained Multimodal Large Language Model (MLLM) named SEED-X is fine-tuned with SEED-Data-Edit to create the instruction-tuned model SEED-X-Edit. This model demonstrates promising results in adhering to editing instructions, showcasing the effectiveness of SEED-Data-Edit in advancing the field of instructional image editing. All data and the fine-tuned model are released for public use.
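To make the multi-turn structure concrete, below is a minimal Python sketch of how such instruction/image-pair records might be consumed for training. The file name `multi_turn.jsonl` and the field names `turns`, `source_image`, `target_image`, and `instruction` are hypothetical placeholders for illustration, not the published schema of the release.

```python
import json
from pathlib import Path

# Hypothetical file layout and field names -- the actual SEED-Data-Edit
# release may use different keys; adjust to the published schema.
DATA_FILE = Path("seed_data_edit/multi_turn.jsonl")

def load_editing_sessions(path):
    """Yield one editing session per JSON line.

    Assumed record shape (illustrative only):
    {
      "turns": [
        {"source_image": "imgs/0001_t0.jpg",
         "target_image": "imgs/0001_t1.jpg",
         "instruction": "remove the lamp post on the left"},
        ...
      ]
    }
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

if __name__ == "__main__":
    for session in load_editing_sessions(DATA_FILE):
        # Each turn pairs a natural-language instruction with
        # before/after images, so a model can be trained on single
        # turns or on whole iterative editing sessions.
        for turn in session["turns"]:
            print(turn["instruction"], "->", turn["target_image"])
```

Keeping each round's source and target images inside one session record preserves the iterative editing history, which single-turn pair formats would lose.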