Understanding "Advancing Multimodal Medical Capabilities of Gemini"

The report from Google Research and Google DeepMind details the development and evaluation of Med-Gemini, a family of multimodal AI models optimized for medical tasks. Med-Gemini builds on Gemini, a large multimodal model, and is fine-tuned for specific medical data types such as 2D and 3D radiology, histopathology, ophthalmology, dermatology, and genomics. Key contributions include:

1. **Model Development**: Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation, exceeding the previous best results on two separate datasets by absolute margins of 1% and 12%. Med-Gemini-3D demonstrates the first report generation by a large multimodal model for 3D CT volumes, with 53% of its AI-generated reports judged clinically acceptable. Med-Gemini-2D also excels in CXR visual question answering (VQA) and classification, and approaches the performance of task-specific models in histopathology, ophthalmology, and dermatology image classification.
2. **Genomic Risk Prediction**: Med-Gemini-Polygenic outperforms the standard linear polygenic risk score (PRS) approach for disease risk prediction and generalizes to genetically correlated diseases.
3. **Dataset and Evaluation**: The evaluation spans 22 datasets across five tasks and six medical image modalities, with a focus on clinically relevant benchmarks. The tasks include image classification, VQA, report generation, and genomic risk prediction. Performance is assessed with both expert human evaluation and automated metrics.
4. **Model Architecture and Training**: Med-Gemini is trained on large-scale Google TPUv4 accelerator pods and leverages Gemini's video understanding capabilities to handle 3D medical data. The model is fine-tuned on captioning and VQA tasks, followed by an additional instruction-tuning phase to improve instruction following.
5. **Performance Highlights**:
   - **Chest X-ray Classification**: Med-Gemini-2D outperforms Gemini Ultra on most labels of the in-distribution MIMIC-CXR dataset, but its performance varies on out-of-distribution datasets.
   - **Histopathology and Dermatology Image Classification**: Med-Gemini-2D achieves competitive performance in skin lesion classification and histopathology patch classification, approaching the performance of specialized models.
   - **Ophthalmology Image Classification**: Med-Gemini-2D shows a significant improvement over Gemini Ultra on ophthalmology classification tasks.

Overall, the report highlights the potential of Med-Gemini for advancing multimodal medical capabilities, with promising results across a range of clinical tasks.
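
For context on the baseline that Med-Gemini-Polygenic is compared against: a standard linear PRS scores each person as a weighted sum of their effect-allele counts, with one weight (effect size) per genetic variant. The sketch below is a minimal illustration of that idea only; the function names `linear_prs` and `standardize`, the toy effect sizes, and the toy genotypes are hypothetical, and the code does not reproduce the report's actual baseline pipeline or Med-Gemini-Polygenic itself.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def linear_prs(dosages: Sequence[float], effect_sizes: Sequence[float]) -> float:
    """Linear PRS for one person: the sum over variants of beta_i * g_i.

    dosages: effect-allele count at each variant (0, 1 or 2).
    effect_sizes: per-allele weight beta_i for each variant, e.g. a GWAS
        log odds ratio (illustrative values only in this sketch).
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(g * beta for g, beta in zip(dosages, effect_sizes))


def standardize(scores: Sequence[float]) -> List[float]:
    """Convert raw cohort scores to z-scores (mean 0, population std 1)."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        raise ValueError("scores have zero variance; cannot standardize")
    return [(s - mu) / sigma for s in scores]


if __name__ == "__main__":
    # Hypothetical toy data: 3 variants, 3 individuals.
    betas = [0.2, -0.1, 0.05]
    cohort = [[0, 1, 2], [2, 0, 1], [1, 1, 1]]
    raw = [linear_prs(genotype, betas) for genotype in cohort]
    z_scores = standardize(raw)
    # A higher z-score means higher predicted genetic risk relative to the cohort.
    print([round(z, 2) for z in z_scores])
```

Because the score is a plain weighted sum, each variant contributes independently and additively; the report compares Med-Gemini-Polygenic against this kind of additive baseline.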

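The report says Med-Gemini-3D leverages Gemini's video understanding capabilities to handle 3D data. One way to picture this is to treat the stacked slices of a CT volume as the frames of a short video. The sketch below assumes that framing; the function names `volume_to_frames` and `window_to_uint8`, the evenly spaced slice sampling, and the soft-tissue display window (level 40 HU, width 400 HU) are illustrative choices, not the report's documented preprocessing.

```python
from typing import List

import numpy as np


def volume_to_frames(volume: np.ndarray, num_frames: int) -> List[np.ndarray]:
    """Hypothetical: view a CT volume shaped (depth, height, width) as a short
    video by taking `num_frames` evenly spaced axial slices, in order."""
    if volume.ndim != 3:
        raise ValueError("expected a 3D volume shaped (depth, height, width)")
    depth = volume.shape[0]
    if not 1 <= num_frames <= depth:
        raise ValueError("num_frames must be between 1 and the number of slices")
    # Evenly spaced slice indices from the first slice to the last slice.
    indices = np.linspace(0, depth - 1, num_frames).round().astype(int)
    return [volume[i] for i in indices]


def window_to_uint8(slice_hu: np.ndarray, level: float = 40.0,
                    width: float = 400.0) -> np.ndarray:
    """Map Hounsfield units to 8-bit pixels using a display window
    (assumed soft-tissue defaults), like an ordinary video frame."""
    lo, hi = level - width / 2, level + width / 2
    clipped = np.clip(slice_hu, lo, hi)
    return ((clipped - lo) / (hi - lo) * 255).round().astype(np.uint8)


if __name__ == "__main__":
    # Hypothetical synthetic volume: 64 slices of 128x128 values in HU.
    rng = np.random.default_rng(0)
    ct = rng.uniform(-1000, 1000, size=(64, 128, 128))
    frames = [window_to_uint8(s) for s in volume_to_frames(ct, 16)]
    print(len(frames), frames[0].shape, frames[0].dtype)  # 16 (128, 128) uint8
```

Sampling a fixed number of slices keeps the frame count bounded regardless of scan depth, at the cost of skipping slices in between; whether and how Med-Gemini-3D subsamples is not stated in this summary.
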
Advancing Multimodal Medical Capabilities of Gemini

2024-5-7 | Google Research and Google DeepMind