DREAMBENCH++: A Human-Aligned Benchmark for Personalized Image Generation

24 Jun 2024 | Yuang Peng, Yuxin Cui, Haomiao Tang, Zekun Qi, Runpei Dong, Jing Bai, Chunrui Han, Zheng Ge, Xiangyu Zhang, Shu-Tao Xia
DREAMBENCH++ is a human-aligned benchmark designed for evaluating personalized image generation models. It addresses the limitations of existing evaluation methods, which either misalign with human preferences or are time-consuming and expensive. DREAMBENCH++ leverages advanced multimodal GPT models to automate the evaluation process while ensuring alignment with human preferences. The benchmark includes a comprehensive dataset of diverse images and prompts, covering various levels of difficulty and categories. By evaluating seven modern generative models, DREAMBENCH++ demonstrates significantly better alignment with human preferences than traditional metrics like DINO and CLIP.

The evaluation results show that DREAMBENCH++ achieves 79.64% and 93.18% agreement with human ratings in concept preservation and prompt following, respectively. The paper also discusses the design of prompts for GPT models to enhance their alignment with human preferences and the importance of diverse and rich evaluation data. The authors provide insights into the strengths and limitations of different evaluation methods and highlight the potential of DREAMBENCH++ in advancing the field of personalized image generation.
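The agreement figures above compare automated scores against human ratings. As a minimal illustration only (the paper's exact agreement metric and score scale are not reproduced here, so the function name, scale, and sample scores below are hypothetical), a simple per-sample agreement rate between an automated rater and a human rater assigning discrete scores could be computed as:

```python
def agreement_rate(auto_scores, human_scores):
    """Fraction of samples where the automated rater and the human
    rater assign the same discrete score (hypothetical 0-4 scale)."""
    if len(auto_scores) != len(human_scores):
        raise ValueError("score lists must be the same length")
    matches = sum(a == h for a, h in zip(auto_scores, human_scores))
    return matches / len(auto_scores)

# Hypothetical example: the raters agree on 4 of 5 samples.
auto = [4, 3, 4, 2, 4]
human = [4, 3, 3, 2, 4]
print(agreement_rate(auto, human))  # → 0.8
```

This sketch only measures exact-match agreement; a real evaluation protocol would likely also account for chance agreement or near-miss scores.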