SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark

6 Feb 2024 | Zhenwen Liang, Kehan Guo, Gang Liu, Taicheng Guo, Yujun Zhou, Tianyu Yang, Jiajun Jiao, Renjie Pi, Jipeng Zhang, Xiangliang Zhang
The paper introduces SceMQA, a novel benchmark for scientific multimodal question answering at the college entrance level. It targets a critical educational phase, the transition from high school to college, that existing benchmarks often overlook. SceMQA covers four core science subjects: Mathematics, Physics, Chemistry, and Biology. The benchmark mixes multiple-choice and free-response formats to enable a comprehensive evaluation of AI models' abilities. Each problem is accompanied by a detailed, human-verified explanation and is classified into a specific knowledge component, enabling fine-grained knowledge tracing for models. In addition, SceMQA includes problems that share an identical context (for example, the same figure) but pose different questions, allowing a more thorough and accurate assessment of reasoning capabilities.

The benchmark comprises 1,045 problems, an average of roughly 261 per subject, sourced from publicly available online materials designed for college entrance level tests in the four subjects.

Experiments with state-of-the-art models such as GPT-4V and Google's Gemini show significant potential for further improvement: even the most advanced models achieve only about 50% accuracy on SceMQA. The paper also presents a detailed error analysis of the models' performance, identifying specific failure patterns and weaknesses. These findings offer valuable insights and directions for future research aimed at enhancing the capabilities of multimodal large language models (MLLMs) in educational and research contexts, particularly in the domain of science.
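To make the benchmark's structure concrete, below is a minimal sketch of how a SceMQA-style problem record and the per-knowledge-component accuracy analysis might look. This is an illustration only: the class name SceMQAProblem and all field names (question_id, subject, knowledge_component, answer_format, image_path, answer) are hypothetical assumptions, not the dataset's actual released schema.

```python
# Hypothetical sketch of SceMQA-style records and knowledge-component scoring.
# Field names and schema are assumptions; the released dataset may differ.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class SceMQAProblem:
    question_id: str
    subject: str              # "Math" | "Physics" | "Chemistry" | "Biology"
    knowledge_component: str  # fine-grained topic tag, e.g. "Stoichiometry"
    answer_format: str        # "multiple_choice" | "free_response"
    question: str
    image_path: str           # the multimodal context (figure/diagram)
    answer: str               # gold answer, human-verified

def accuracy_by_knowledge_component(problems, predictions):
    """Exact-match accuracy aggregated per knowledge component --
    the grouping that makes knowledge tracing for models possible."""
    hits, totals = defaultdict(int), defaultdict(int)
    for p in problems:
        kc = p.knowledge_component
        totals[kc] += 1
        if predictions.get(p.question_id, "").strip() == p.answer.strip():
            hits[kc] += 1
    return {kc: hits[kc] / totals[kc] for kc in totals}

# Toy usage: two questions sharing one context image (same figure,
# different questions), as SceMQA's paired-problem design describes.
problems = [
    SceMQAProblem("q1", "Chemistry", "Stoichiometry", "multiple_choice",
                  "Which mole ratio ...?", "fig1.png", "B"),
    SceMQAProblem("q2", "Chemistry", "Stoichiometry", "multiple_choice",
                  "Using the same figure, what mass ...?", "fig1.png", "C"),
]
print(accuracy_by_knowledge_component(problems, {"q1": "B", "q2": "A"}))
# -> {'Stoichiometry': 0.5}
```

Grouping results by knowledge component in this way is what lets an evaluation move beyond a single aggregate score and reveal which specific topics a model handles well or poorly.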