**OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM**
This paper introduces OmniMedVQA, a large-scale, comprehensive Visual Question Answering (VQA) benchmark designed for medical applications. The benchmark is constructed from 73 distinct medical datasets, covering 12 imaging modalities and more than 20 anatomical regions, ensuring a diverse and realistic evaluation environment. The images in OmniMedVQA are sourced from authentic medical scenarios, aligning closely with real-world clinical needs.
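The summary does not spell out the benchmark's record format, but a minimal sketch of what one multiple-choice medical VQA item might look like is given below. All field names here are hypothetical illustrations, not OmniMedVQA's actual schema:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MedVQAItem:
    """One hypothetical multiple-choice medical VQA item.

    Field names are illustrative; they are not taken from the
    OmniMedVQA release.
    """
    image_path: str          # path to the source medical image
    modality: str            # e.g. "CT", "MRI", "X-Ray"
    anatomical_region: str   # e.g. "chest", "abdomen"
    question: str            # natural-language question about the image
    options: List[str]       # candidate answers presented to the model
    answer: str              # ground-truth option
```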
The evaluation of 12 representative Large Vision-Language Models (LVLMs) reveals that existing LVLMs struggle to address medical VQA problems effectively, with medical-specialized models often performing worse than general-domain models. Although medical LVLMs do comparatively well on specific modalities such as CT, MRI, and X-ray, they fail to outperform general models consistently across all modalities. These findings highlight the need for more versatile and robust LVLMs in the biomedical field.
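As a minimal sketch of how such an evaluation could be scored, the snippet below computes overall and per-modality accuracy over items like the one sketched above. The `model.predict` interface is an assumption for illustration, not the paper's actual evaluation harness:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def evaluate(model, items: List[MedVQAItem]) -> Tuple[float, Dict[str, float]]:
    """Score a model on multiple-choice VQA items, overall and per modality.

    `model.predict(image_path, question, options)` is a hypothetical
    interface assumed to return one of the given options.
    """
    correct, total = 0, 0
    per_mod = defaultdict(lambda: [0, 0])  # modality -> [correct, total]
    for item in items:
        pred = model.predict(item.image_path, item.question, item.options)
        hit = int(pred == item.answer)
        correct += hit
        total += 1
        per_mod[item.modality][0] += hit
        per_mod[item.modality][1] += 1
    overall = correct / max(total, 1)
    by_modality = {m: c / n for m, (c, n) in per_mod.items()}
    return overall, by_modality
```

Reporting accuracy per modality, as in this sketch, is what makes it possible to see the pattern described above: a medical model may lead on CT or X-ray while trailing general models elsewhere.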
The main contributions of this paper include:
1. The introduction of OmniMedVQA, a comprehensive VQA benchmark for medical applications.
2. A thorough evaluation of 12 LVLMs, including both general-domain and specialized medical models.
3. Insights into the limitations of current LVLMs and suggestions for future improvements.
The paper concludes by emphasizing the need for more high-quality, diverse, and realistic datasets to further advance the development of LVLMs in the medical domain.