SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark

6 Feb 2024 | Zhenwen Liang, Kehan Guo, Gang Liu, Taicheng Guo, Yujun Zhou, Tianyu Yang, Jiajun Jiao, Renjie Pi, Jipeng Zhang, and Xiangliang Zhang
SceMQA is a new benchmark for scientific multimodal question answering at the college entrance level. It addresses a critical educational phase often overlooked in existing benchmarks, spanning high school to pre-college levels. The benchmark focuses on the core science subjects of Mathematics, Physics, Chemistry, and Biology, and features a blend of multiple-choice and free-response formats, ensuring a comprehensive evaluation of AI models' abilities. In addition, SceMQA classifies each problem into a specific knowledge component and provides a detailed, human-verified explanation for each answer. It also uniquely presents problems with identical contexts but varied questions, enabling a more thorough and accurate assessment of reasoning capabilities.

The benchmark includes 1,045 problems, an average of roughly 261 per subject. Each problem contains one image that is essential for solving the corresponding question. The variety of questions based on the same image and context presents a suitable challenge for current AI systems.

In our experiments, we evaluate both open-source and closed-source state-of-the-art Multimodal Large Language Models (MLLMs) across various experimental settings. The results show that further research is needed to develop more capable MLLMs: even the strongest models achieve only 50% to 60% accuracy. Our benchmark and analysis will be available at https://scemqa.github.io/.
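For concreteness, the sketch below shows how a single SceMQA-style problem record and a simple multiple-choice accuracy check might look in Python. The field names, file path, and example values are hypothetical assumptions for illustration only; they are not the benchmark's published schema or evaluation code.

```python
# Hypothetical record layout for one SceMQA-style problem. Field names and
# values are illustrative assumptions, not the benchmark's actual schema.
example_problem = {
    "subject": "Physics",
    "format": "multiple-choice",            # or "free-response"
    "image": "problems/physics_0042.png",   # one image per problem
    "context": "A block slides down a frictionless 30-degree incline...",
    "question": "What is the acceleration of the block?",
    "options": {"A": "4.9 m/s^2", "B": "9.8 m/s^2", "C": "2.5 m/s^2", "D": "0 m/s^2"},
    "answer": "A",
    "knowledge_point": "Newton's second law on inclined planes",
    "explanation": "Along the incline, a = g * sin(30 deg) = 4.9 m/s^2.",
}


def accuracy(problems, predictions):
    """Fraction of problems whose predicted answer matches the gold answer."""
    correct = sum(1 for p, pred in zip(problems, predictions) if pred == p["answer"])
    return correct / len(problems)


if __name__ == "__main__":
    problems = [example_problem]
    predictions = ["A"]  # e.g., option letters returned by an MLLM under evaluation
    print(f"Accuracy: {accuracy(problems, predictions):.2%}")
```

Grouping several such records under a shared "context" and "image" would reflect the benchmark's design of asking varied questions about the same setting.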