21 Feb 2024 | Minh-Hao Van, Prateek Verma, Xintao Wu
This paper explores the application of large visual language models (VLMs) in medical imaging analysis, focusing on their zero-shot and few-shot robustness. The authors evaluate five VLMs—BiomedCLIP, OpenCLIP, OpenFlamingo, LLaVA, and ChatGPT-4—on three medical imaging datasets: Brain Tumor Detection (BTD), Acute Lymphoblastic Leukemia Image Database (ALL-IDB2), and COVID Chest X-ray (CX-Ray). The study aims to demonstrate the effectiveness of VLMs in analyzing biomedical images such as brain MRIs, microscopic images of blood cells, and chest X-rays without the need for retraining or fine-tuning.
The authors find that while CNN-based methods achieve the best performance on all datasets, VLMs still show impressive results, particularly in terms of efficiency and ease of use. BiomedCLIP, ChatGPT, and OpenFlamingo perform well on BTD, ALL-IDB2, and CX-Ray, respectively. The study also highlights the importance of prompt engineering to optimize VLMs for medical imaging tasks, showing that few-shot prompting can improve accuracy in most cases.
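The zero-shot setup used by CLIP-style models such as BiomedCLIP and OpenCLIP can be illustrated with a minimal sketch: an image embedding is compared against text embeddings of candidate class prompts, and the most similar prompt wins. The embeddings and prompts below are toy stand-ins for encoder outputs, not the paper's actual pipeline or datasets.

```python
import numpy as np

def zero_shot_classify(image_emb: np.ndarray, text_embs: np.ndarray) -> int:
    """Return the index of the class prompt most similar to the image.

    image_emb: 1-D image embedding; text_embs: one row per class prompt.
    """
    img = image_emb / np.linalg.norm(image_emb)                    # L2-normalize image
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                                               # cosine similarities
    return int(np.argmax(sims))

# Hypothetical prompts for a brain-tumor detection task; random vectors
# stand in for the text encoder's outputs.
prompts = ["an MRI scan with a brain tumor", "an MRI scan of a healthy brain"]
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(2, 8))
image_emb = text_embs[0] + 0.1 * rng.normal(size=8)                # near class 0
print(prompts[zero_shot_classify(image_emb, text_embs)])
```

Few-shot prompting, as evaluated for the generative models (OpenFlamingo, LLaVA, ChatGPT-4), instead prepends labeled example images to the prompt rather than changing this similarity computation.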
The paper concludes by discussing the limitations of VLMs in medical applications, including data quality, safety, and privacy concerns. Despite these limitations, the authors suggest that VLMs can serve as valuable chat assistants for pre-diagnosis and provide insights for future research in medical imaging analysis.