4 Jul 2024 | Asma Alkhaldi, Raneem Alnajim, Layan Alabdullatef, Rawan Alyahya, Jun Chen, Deyao Zhu, Ahmed Alsinan, Mohamed Elhoseiny
MiniGPT-Med is a specialized multi-modal model designed for radiology diagnosis. It builds on large language models to handle medical vision-language tasks, including medical report generation, disease detection, and visual question answering (VQA), across imaging modalities such as X-rays, CT scans, and MRIs. It integrates image and textual clinical data to improve diagnostic accuracy. Empirical evaluation shows strong performance on disease grounding, medical report generation, and VQA benchmarks, including state-of-the-art results in medical report generation with a 19% accuracy improvement over the previous best model. The architecture consists of a visual backbone, a linear projection layer, and a large language model, with task-specific tokens that improve its ability to perform diverse tasks. Extensive experiments show that MiniGPT-Med outperforms both specialized and generalist models, making it a promising tool for radiology diagnostics. However, limited training data and hallucinations remain open challenges that require further work.
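The data flow implied by this architecture (visual features, a linear projection into the language model's embedding space, and a task token in the prompt) can be illustrated with a toy sketch. This is not the authors' implementation: the task-token strings, the `<image>` placeholder, the function names, and the tiny dimensions are all assumptions chosen for illustration, and the visual backbone and language model themselves are left out.

```python
# Toy illustration only: names, token strings, and sizes are hypothetical.

# Hypothetical task identifiers; the summary only says "task-specific tokens".
TASK_TOKENS = {
    "report_generation": "[caption]",
    "disease_detection": "[detection]",
    "vqa": "[vqa]",
}


def linear_project(features, weight, bias):
    """Map each visual feature vector (length d_in) to the language model's
    embedding size (d_out) with a plain affine map: out = W @ x + b."""
    projected = []
    for vec in features:
        out = [
            sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)
        ]
        projected.append(out)
    return projected


def build_prompt(task, instruction):
    """Prefix the instruction with an image placeholder and a task token so
    the language model can tell which task is being requested."""
    if task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {task}")
    return f"<image> {TASK_TOKENS[task]} {instruction}"


# Two image patches with 3 features each, standing in for backbone output.
patch_features = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
# A 2x3 weight matrix and bias, standing in for the learned projection layer.
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
bias = [0.5, -1.0]

image_embeddings = linear_project(patch_features, weight, bias)
prompt = build_prompt("vqa", "Is there a pleural effusion?")

print(image_embeddings)  # [[1.5, 4.0], [0.5, 0.0]]
print(prompt)            # <image> [vqa] Is there a pleural effusion?
```

In a real system the projected image embeddings would replace the image placeholder in the language model's input sequence; here they are only printed to show the shapes involved.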