24 Apr 2024 | Yifan Jiang, Jiarui Zhang, Kexuan Sun, Zhivar Sourati, Kian Ahrabian, Kaixin Ma, Filip Ilievski, Jay Pujara
MARVEL is a multidimensional abstract visual reasoning (AVR) benchmark designed to evaluate multimodal large language models (MLLMs) across diverse patterns, input shapes, and task configurations. It comprises 770 puzzles spanning six core knowledge patterns, geometric and abstract shapes, and five different task configurations. MARVEL also introduces a hierarchical evaluation framework that pairs perception questions with the AVR questions, testing whether models grasp the visual details and spatial relationships that abstract reasoning depends on.
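As a rough illustration of how such a benchmark item might be organized, the sketch below represents one puzzle as a record carrying its knowledge pattern, shape type, and task configuration alongside its perception and AVR questions. The field names are hypothetical and do not mirror the released dataset's schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical representation of one MARVEL puzzle; field names are
# illustrative only, not the dataset's actual schema.
@dataclass
class MarvelPuzzle:
    puzzle_id: str
    pattern: str                # one of the six core knowledge patterns
    shape_type: str             # "geometric" or "abstract"
    task_config: str            # one of the five task configurations
    image_path: str             # rendered puzzle panels
    choices: List[str]          # answer options for the AVR question
    avr_answer: int             # index of the correct option
    perception_questions: List[dict] = field(default_factory=list)
```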
The benchmark was developed to address the limitations of existing AVR benchmarks, which tend to cover only a narrow set of patterns and input shapes; by spanning a wider range of patterns and task configurations, MARVEL supports a more accurate assessment of models' reasoning abilities. Nine representative MLLMs were evaluated in zero-shot and few-shot settings. All of them performed near random chance on the AVR questions, with a significant performance gap (40%) compared to humans, and analysis of the perception questions showed that the models struggled to comprehend visual features and even to count the panels in a puzzle, which undermines the abstract reasoning built on top of that perception.
The hierarchical evaluation framework enables fine-grained diagnosis of model capabilities: its perception questions probe whether a model registers details such as the number of panels, the edges of shapes, and spatial relationships between elements. The experiments show that MLLMs often fail on exactly these details, and those failures track their poor performance on the AVR questions, underscoring how much abstract reasoning depends on reliable visual perception.
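One way to read this hierarchical setup is as paired scoring: a model's AVR answer is most credible when it also answers the perception questions about the same puzzle correctly. The sketch below, which assumes hypothetical per-puzzle result fields rather than the paper's exact metric, computes perception accuracy, AVR accuracy, and a combined score under that assumption.

```python
def hierarchical_scores(results):
    """Aggregate per-puzzle results into perception, AVR, and combined accuracy.

    `results` is a list of dicts with hypothetical keys:
      "perception_correct": list of bools, one per perception question
      "avr_correct": bool, whether the AVR answer was correct
    """
    n = len(results)
    perception_acc = sum(all(r["perception_correct"]) for r in results) / n
    avr_acc = sum(r["avr_correct"] for r in results) / n
    # Credit abstract reasoning only when the underlying perception also holds.
    consistent = sum(
        r["avr_correct"] and all(r["perception_correct"]) for r in results
    ) / n
    return {"perception": perception_acc, "avr": avr_acc, "consistent": consistent}
```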
Overall, the results indicate that current MLLMs have significant limitations in abstract visual reasoning, rooted largely in weak perception of visual details and spatial relationships. MARVEL provides a comprehensive way to measure this gap, and the findings suggest that future work should focus on strengthening MLLMs' visual perception as a prerequisite for better abstract reasoning.