Advancing Multimodal Medical Capabilities of Gemini


2024-5-7 | Google Research and Google DeepMind
Google Research and Google DeepMind have developed the Med-Gemini family of models, optimized for medical use through fine-tuning on 2D and 3D radiology, histopathology, ophthalmology, dermatology, and genomic data.

Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation, outperforming previous models by 1% and 12% on two datasets. Med-Gemini-3D performs the first large multimodal model-based report generation for 3D computed tomography (CT) volumes, with 53% of its AI-generated reports judged clinically acceptable. Med-Gemini-2D also excels at CXR visual question answering (VQA), CXR classification, and radiology VQA, surpassing previous models on 17 of 20 tasks, and it exceeds baselines on 18 of 20 histopathology, ophthalmology, and dermatology image-classification tasks.

Med-Gemini-Polygenic outperforms standard linear polygenic risk score (PRS) approaches for disease risk prediction and generalizes to genetically correlated diseases. For reference, the linear PRS baseline it is compared against is sketched below.
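The following is a minimal sketch of that linear PRS baseline: a weighted sum of per-variant risk-allele counts. The effect sizes, genotype encoding, and toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def linear_prs(genotypes: np.ndarray, effect_sizes: np.ndarray) -> np.ndarray:
    """Standard linear polygenic risk score.

    genotypes: (n_individuals, n_variants) risk-allele counts (0, 1, or 2).
    effect_sizes: (n_variants,) per-variant weights, e.g. GWAS log odds ratios.
    Returns one score per individual: PRS_j = sum_i beta_i * g_ij.
    """
    return genotypes @ effect_sizes

# Illustrative toy data (not from the paper): 3 individuals, 4 variants.
genotypes = np.array([[0, 1, 2, 0],
                      [1, 1, 0, 2],
                      [2, 0, 1, 1]])
effect_sizes = np.array([0.12, -0.05, 0.30, 0.08])
print(linear_prs(genotypes, effect_sizes))  # one risk score per individual
```

A linear PRS of this form cannot capture interactions between variants or non-genetic context, which is the kind of limitation a multimodal model such as Med-Gemini-Polygenic aims to address.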
The Med-Gemini family builds on Gemini's multimodal capabilities and is fine-tuned on roughly 7 million samples drawn from 3.7 million medical images and cases. The models are evaluated on a range of clinically relevant tasks spanning 2D and 3D radiology images, histopathology patches, ophthalmology images, dermatology images, and genetic risk scoring. Evaluation covers both open benchmark datasets and curated datasets, with expert human review for CXR and CT report generation and for open-ended VQA questions from VQA-Rad.

Training draws on MIMIC-CXR, PAD-UFES-20, NLST, Slake-VQA, PathVQA, VQA-Med, UK Biobank, and PMC-OA, along with private datasets of histopathology patches, fundus images, and CT volumes. The models are fine-tuned on a combination of captioning and VQA tasks, with instruction tuning to strengthen instruction-following; a sketch of what such a mixed training set can look like follows.
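As a rough illustration of a mixed captioning-plus-VQA instruction-tuning set, the sketch below formats both task types as prompt/response pairs. The field names, prompt templates, and example data are hypothetical; the paper does not publish its exact formats.

```python
from dataclasses import dataclass

@dataclass
class TrainingExample:
    image_path: str   # reference to the medical image (or 3D volume)
    prompt: str       # instruction-style input text
    response: str     # target output text

def make_captioning_example(image_path: str, report: str) -> TrainingExample:
    # Captioning task: the model learns to produce the report from the image.
    return TrainingExample(image_path, "Describe the findings in this image.", report)

def make_vqa_example(image_path: str, question: str, answer: str) -> TrainingExample:
    # VQA task: the model learns to answer a question about the image.
    return TrainingExample(image_path, question, answer)

# Hypothetical examples mixing both task types into one instruction-tuning set.
mixed_dataset = [
    make_captioning_example("cxr_0001.png", "No acute cardiopulmonary abnormality."),
    make_vqa_example("cxr_0002.png", "Is cardiomegaly present?", "Yes."),
]
```

Mixing task types in one instruction-tuned set is what lets a single checkpoint serve report generation, classification, and VQA without per-task heads.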
Across these tasks, the Med-Gemini models demonstrate strong performance in medical image classification, VQA, report generation, and genomic risk prediction. Med-Gemini-2D outperforms Gemini Ultra on most labels in the in-distribution MIMIC-CXR dataset and excels at specific tasks such as cardiomegaly detection on CheXpert. It also performs well in histopathology image classification, skin lesion classification, and fundus image classification, achieving results competitive with specialized models such as Derm Foundation; a minimal per-label comparison is sketched below. Together, these results highlight the potential of multimodal foundation models in the medical field.
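One hedged reading of a per-label comparison like the MIMIC-CXR result above is sketched here; the choice of F1 as the metric, the label set, and the toy predictions are assumptions for illustration only.

```python
from sklearn.metrics import f1_score

# Hypothetical per-label predictions for a multi-label CXR classifier.
labels = ["cardiomegaly", "pleural_effusion", "pneumothorax"]
y_true = {"cardiomegaly": [1, 0, 1, 1],
          "pleural_effusion": [0, 0, 1, 0],
          "pneumothorax": [0, 1, 0, 0]}
y_pred = {"cardiomegaly": [1, 0, 1, 0],
          "pleural_effusion": [0, 0, 1, 1],
          "pneumothorax": [0, 1, 0, 0]}

for label in labels:
    score = f1_score(y_true[label], y_pred[label])
    print(f"{label}: F1 = {score:.2f}")  # compare per label across models
```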