DREAMBENCH++ is a human-aligned benchmark for evaluating personalized image generation models. It addresses the limitations of existing evaluation methods, which are either misaligned with human preferences or time-consuming and expensive. DREAMBENCH++ uses advanced multimodal GPT models to automate evaluation while staying aligned with human preferences. The benchmark includes a comprehensive dataset of diverse images and prompts that spans multiple categories and levels of difficulty. In an evaluation of seven modern generative models, DREAMBENCH++ aligns significantly better with human preferences than traditional metrics such as DINO and CLIP.
DREAMBENCH++ reaches 79.64% agreement with human ratings on concept preservation and 93.18% on prompt following. The paper also discusses how to design prompts for GPT models so that their judgments align more closely with human preferences, and why diverse, rich evaluation data matters. The authors compare the strengths and limitations of different evaluation methods and highlight the potential of DREAMBENCH++ to advance personalized image generation.