MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis


4 Jul 2024 | Asma Alkhaldi, Raneem Alnajim, Layan Alabdullatef, Rawan Alyahya, Jun Chen, Deyao Zhu, Ahmed Alsinan, Mohamed Elhoseiny
MiniGPT-Med is a vision-language model designed for medical applications, particularly radiology diagnosis. Derived from large-scale language models, it performs tasks such as medical report generation, visual question answering (VQA), and disease identification in medical imaging. The model integrates image and textual clinical data, improving diagnostic accuracy.

Evaluated on various benchmarks, MiniGPT-Med achieves state-of-the-art performance in medical report generation, surpassing previous models by 19% in BERT-Sim and 5.2% in CheXbert-Sim. It is also effective in disease detection and VQA, demonstrating strong performance across a range of medical vision-language tasks. The model is trained on a comprehensive collection of radiological images, including X-rays, CT scans, and MRIs, and has been tested on multiple datasets, including MIMIC, NLST, SLAKE, RSNA, and RadVQA. Performance was measured with BERT-Sim, CheXbert-Sim, and Intersection over Union (IoU), and the results show that MiniGPT-Med outperforms both specialized and generalist models on most tasks. Radiologists who reviewed the model's generated reports judged 76% of them to be of high quality.

However, the model has limitations, including the need for more diverse, high-quality training data and the potential for hallucination in medical reports. Despite these challenges, MiniGPT-Med shows significant potential as a general interface for radiology diagnosis, improving diagnostic efficiency across medical imaging applications. The model and code are publicly available on GitHub.
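For reference, the detection results above are scored with IoU, which measures how much a predicted bounding box overlaps a ground-truth box. The sketch below is a minimal, illustrative Python implementation assuming axis-aligned boxes in (x1, y1, x2, y2) format; the function name and box format are assumptions for illustration, not taken from the paper's released code.

    def iou(box_a, box_b):
        """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2).

        Illustrative implementation; the paper's exact evaluation code may differ.
        """
        # Coordinates of the intersection rectangle.
        ix1 = max(box_a[0], box_b[0])
        iy1 = max(box_a[1], box_b[1])
        ix2 = min(box_a[2], box_b[2])
        iy2 = min(box_a[3], box_b[3])

        # Clamp to zero when the boxes do not overlap at all.
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    # Example: partially overlapping boxes -> 400 / 2800, about 0.143
    print(iou((10, 10, 50, 50), (30, 30, 70, 70)))

An IoU of 1.0 means the predicted region matches the ground truth exactly, while 0.0 means no overlap; detection benchmarks typically average this score over all predictions or threshold it to count a prediction as correct.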