In-context learning enables multimodal large language models to classify cancer pathology images

12 Mar 2024 | Dyke Ferber, Georg Wölflein, Isabella C. Wiest, Marta Ligero, Srividhya Sainath, Narmin Ghaffari Laleh, Omar S.M. El Nahhas, Gustav Müller-Franzes, Dirk Jäger, Daniel Truhn, Jakob Nikolas Kather
This study evaluates GPT-4V, a vision-language model, on three histopathology tasks: colorectal cancer tissue subtype classification, colon polyp subtyping, and breast tumor detection in lymph node sections. The results show that in-context learning can match or outperform specialized neural networks while requiring only a minimal number of sample images. The study demonstrates that large vision-language models trained on non-domain-specific data can be applied directly to medical image processing tasks in histopathology, democratizing access to generalist AI models for medical experts without a technical background, especially in areas where annotated data is scarce.

GPT-4V is compared with four image classifiers (ResNet-18, ResNet-50, ViT-Tiny, and ViT-Small) on three histopathology benchmark datasets: CRC100K, PatchCamelyon, and MHIST. GPT-4V's performance improved through in-context learning, reaching results comparable to these specialist computer vision models. The study also shows that kNN-based sampling of few-shot examples improves accuracy over random sampling, especially as the number of few-shot examples grows. In a multilabel setting, GPT-4V improved notably with more few-shot samples, although it did not reach the performance of the specialist models in all cases.

These findings highlight the potential of in-context learning for medical image classification: vision-language models can achieve performance on par with specialist vision classifiers, making in-context learning a viable alternative to traditional deep learning models in scenarios where annotated data is scarce. The study also emphasizes the importance of multimodal learning in medical image analysis, as vision-language models can integrate textual and visual information to improve classification accuracy.
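The kNN-based sampling strategy mentioned above can be sketched in a few lines: for each query image, pick the k training images whose embeddings lie nearest to the query's embedding and use those as the in-context examples. The 2-D vectors and class labels below are toy stand-ins for real image features and annotations, not the paper's actual pipeline.

```python
import numpy as np

def knn_select_few_shot(query_emb, train_embs, train_labels, k=5):
    """Pick the k training images nearest to the query in embedding
    space; these serve as the in-context (few-shot) examples."""
    # Euclidean distance from the query to every candidate embedding
    dists = np.linalg.norm(train_embs - query_emb, axis=1)
    nearest = np.argsort(dists)[:k]
    return list(nearest), [train_labels[i] for i in nearest]

# Toy demonstration: 2-D embeddings standing in for image features
train_embs = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.1, 0.2]])
train_labels = ["tumor", "stroma", "mucosa", "tumor"]
idx, labels = knn_select_few_shot(np.array([0.0, 0.1]),
                                  train_embs, train_labels, k=2)
print(idx, labels)  # the two nearest patches and their labels
```

In contrast, random sampling would draw the k examples uniformly from the training pool; the paper's result is that similarity-based selection yields more informative context for the model.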
The findings indicate that in-context learning with GPT-4V can be a powerful tool for medical image classification, offering a more efficient and accessible approach to AI in healthcare.
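As a rough illustration of how few-shot image prompting works in practice, the sketch below interleaves labeled example images with the query image in a chat-style message list. The message schema follows OpenAI's public vision chat format (base64 data URLs in `image_url` content parts); the prompt wording, helper names, and fake image bytes are illustrative assumptions, not the paper's exact prompt. No API call is made here.

```python
import base64

def encode_image(image_bytes: bytes) -> str:
    """Base64-encode raw image bytes for embedding in a chat message."""
    return base64.b64encode(image_bytes).decode("utf-8")

def build_few_shot_messages(examples, query_bytes, class_names):
    """Build a single user message whose content alternates example
    images and their labels, ending with the unlabeled query image."""
    content = [{"type": "text",
                "text": "Classify each histopathology patch as one of: "
                        + ", ".join(class_names) + "."}]
    for img_bytes, label in examples:
        content.append({"type": "image_url", "image_url": {
            "url": "data:image/png;base64," + encode_image(img_bytes)}})
        content.append({"type": "text", "text": f"Label: {label}"})
    # The query image comes last, with its label left for the model
    content.append({"type": "image_url", "image_url": {
        "url": "data:image/png;base64," + encode_image(query_bytes)}})
    content.append({"type": "text", "text": "Label:"})
    return [{"role": "user", "content": content}]

messages = build_few_shot_messages(
    examples=[(b"fake-img-1", "tumor"), (b"fake-img-2", "stroma")],
    query_bytes=b"fake-query",
    class_names=["tumor", "stroma", "mucosa"],
)
print(len(messages[0]["content"]))  # 1 instruction + 4 example parts + 2 query parts = 7
```

The resulting `messages` list could be passed to a chat-completions endpoint of a vision-capable model; combining this with the kNN selection of examples gives the overall few-shot setup the study describes.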