[slides] HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

HQ-Edit is a high-quality dataset for instruction-based image editing, containing around 200,000 edits. Unlike previous methods that relied on attribute guidance or human feedback, HQ-Edit uses the foundation models GPT-4V and DALL-E 3 to build a scalable data collection pipeline. The dataset consists of high-resolution images paired with detailed text prompts, which keeps instructions and images precisely aligned. Two evaluation metrics, Alignment and Coherence, are introduced to assess the quality of image edit pairs. HQ-Edit significantly outperforms existing datasets on both metrics, demonstrating its superior data quality.

The dataset is generated in three phases: Expansion, Generation, and Post-processing. During Expansion, seed triplets are expanded into 100,000 instances. In Generation, DALL-E 3 creates diptychs from the prompts. Post-processing refines the images and instructions to ensure alignment and quality. The dataset covers diverse editing operations, from global changes to local modifications, and provides comprehensive instructions for image editing. Fine-tuning models such as InstructPix2Pix on HQ-Edit achieves state-of-the-art performance. The project page is available at https://thefllood.github.io/HQEdit_web.
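The summary does not say how a generated diptych becomes an input/output image pair. As a minimal sketch, assuming the two panels sit side by side with equal width, one possible step splits the diptych down the middle and attaches the edit instruction. The names `Image`, `EditSample`, `split_diptych`, and `make_sample` are hypothetical, and images are modeled as plain lists of pixel rows so the example stays dependency-free; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical stand-in for an image: a list of rows of pixel values.
Image = List[List[int]]


@dataclass
class EditSample:
    """One edit: input image, edit instruction, output image."""
    input_image: Image
    instruction: str
    output_image: Image


def split_diptych(diptych: Image) -> Tuple[Image, Image]:
    """Split a side-by-side diptych into left and right panels.

    Assumption: both panels have equal width. If the total width is odd,
    the middle column is dropped so the halves stay the same size.
    """
    if not diptych or len(diptych[0]) < 2:
        raise ValueError("diptych needs at least one row and two columns")
    width = len(diptych[0])
    if any(len(row) != width for row in diptych):
        raise ValueError("all rows must have the same width")
    half = width // 2
    left = [row[:half] for row in diptych]
    right = [row[width - half:] for row in diptych]
    return left, right


def make_sample(diptych: Image, instruction: str) -> EditSample:
    """Pair the left panel (input) and right panel (output) with an instruction."""
    left, right = split_diptych(diptych)
    return EditSample(input_image=left, instruction=instruction, output_image=right)
```

A real pipeline would operate on image files and would still need the alignment and quality refinement that Post-processing describes; the sketch only covers the splitting idea.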

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

15 Apr 2024 | Mude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie