GenAI Arena: An Open Evaluation Platform for Generative Models


6 Aug 2024 | Dongfu Jiang, Max Ku, Tianle Li, Yuansheng Ni, Shizhuo Sun, Rongqi Fan, Wenhu Chen
This paper introduces GenAI-Arena, an open platform for evaluating generative models across text-to-image generation, image editing, and text-to-video generation based on user preferences. Unlike other evaluation platforms, GenAI-Arena is driven by community voting, which keeps its operation transparent and sustainable. Users generate outputs, compare them side by side, and vote for their preferred model; these votes feed a ranking that reflects human preferences and offers a more holistic view of model capabilities than automatic metrics alone.

The platform covers three arenas (text-to-image, text-to-video, and image editing), currently hosts 27 open-source generative models, and has collected over 6,000 community votes since February 11, 2024. Votes are aggregated into an Elo rating system to rank the models. The authors also release GenAI-Bench, a public benchmark for judging how well multimodal large language models (MLLMs) can evaluate generative tasks; even the best MLLM, GPT-4o, reaches only 49.19% average accuracy across the three tasks. In addition, a pre-computed data pool, GenAI-Museum, streamlines user interaction and the evaluation workflow.

The findings highlight the importance of training data and the limitations of current automatic metrics for assessing generated visual content.
The paper also discusses the challenges of evaluating generative models and the potential of human preferences to improve the accuracy of model evaluations. The results show that user votes provide high-quality evaluations, even for advanced models, while the Elo rating system can be biased by an imbalance between "easy games" and "hard games." The authors conclude that GenAI-Arena gives the research community a valuable resource for tracking progress in generative modeling and for improving how model performance is evaluated.
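To make the ranking mechanism concrete, the sketch below shows one common way to turn pairwise votes into Elo ratings. It is a minimal illustration assuming a simple online Elo update with a fixed K-factor; GenAI-Arena's actual estimator (for example, its tie handling or any bootstrapped/Bradley-Terry-style fitting) may differ, and the model names, K-factor, and vote data here are hypothetical.

```python
from collections import defaultdict

K = 32          # update step size (assumed, not from the paper)
INIT = 1000.0   # initial rating for every model (assumed)

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_elo(votes, k: float = K, init: float = INIT) -> dict:
    """votes: iterable of (model_a, model_b, outcome), where outcome is
    1.0 if A wins, 0.0 if B wins, and 0.5 for a tie."""
    ratings = defaultdict(lambda: init)
    for model_a, model_b, outcome in votes:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        ratings[model_a] += k * (outcome - e_a)
        ratings[model_b] += k * ((1.0 - outcome) - (1.0 - e_a))
    return dict(ratings)

# Hypothetical anonymized votes from side-by-side comparisons.
votes = [
    ("model_x", "model_y", 1.0),
    ("model_y", "model_z", 0.5),
    ("model_z", "model_x", 0.0),
]
leaderboard = sorted(update_elo(votes).items(), key=lambda kv: -kv[1])
print(leaderboard)
```

Because a sequential update like this is order-sensitive and noisy with few votes, arena-style leaderboards often average over many shuffled replays of the vote history or fit all votes jointly; the paper's observation that an imbalance between "easy games" and "hard games" can bias ratings applies to any such aggregation.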