GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

This paper introduces GPT-4V as a human-aligned evaluator for text-to-3D generation. The authors propose a versatile evaluation metric that compares 3D shapes according to user-defined criteria. GPT-4V is first used to generate the input prompts, and is then used to compare pairs of 3D assets according to those criteria. The comparison results are used to assign Elo ratings to text-to-3D models, giving a scalable and holistic evaluation. The metric is tested on five criteria: text-asset alignment, 3D plausibility, texture details, geometry details, and texture-geometry coherency. The results show that it aligns strongly with human preferences across these criteria. The code is available at https://github.com/3DTopia/GPTEval3D.

The method is evaluated on 13 generative models: ten optimization-based methods and three feed-forward methods. The results show that the proposed metric outperforms existing metrics on most criteria. The method is also extended to evaluate the diversity of 3D outputs. The authors conclude that it provides a scalable, human-aligned evaluation for text-to-3D generation.
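To make the step from pairwise comparisons to Elo ratings concrete, here is a minimal sketch of the standard sequential Elo update. It is an illustration under stated assumptions, not the authors' implementation: the function names, the starting rating of 1000, and the K-factor of 32 are all hypothetical choices, and the paper may fit ratings with a different procedure.

```python
from typing import Dict, Iterable, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(
    ratings: Dict[str, float],
    comparisons: Iterable[Tuple[str, str, float]],
    k: float = 32.0,        # assumed K-factor, not taken from the paper
    initial: float = 1000.0,  # assumed starting rating, not taken from the paper
) -> Dict[str, float]:
    """Apply sequential Elo updates for (model_a, model_b, score_a) triples.

    score_a is 1.0 if the evaluator preferred A, 0.0 if it preferred B,
    and 0.5 for a tie. The input dict is not modified.
    """
    ratings = dict(ratings)
    for a, b, score_a in comparisons:
        r_a = ratings.get(a, initial)
        r_b = ratings.get(b, initial)
        e_a = expected_score(r_a, r_b)
        # Both updates use the pre-comparison ratings, so total rating is conserved.
        ratings[a] = r_a + k * (score_a - e_a)
        ratings[b] = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return ratings


if __name__ == "__main__":
    # Hypothetical evaluator verdicts for one criterion.
    verdicts = [
        ("model_a", "model_b", 1.0),
        ("model_b", "model_c", 0.5),
        ("model_a", "model_c", 1.0),
    ]
    for name, rating in sorted(update_elo({}, verdicts).items()):
        print(f"{name}: {rating:.1f}")
```

In this sketch each criterion would get its own list of verdicts and therefore its own set of ratings. Sequential updates depend on the order of comparisons, so a batch fit over all comparisons is a reasonable alternative when order should not matter.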


9 Jan 2024 | Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, Gordon Wetzstein