1 May 2024 | Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G.T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, Nenad Tomasev, Jan Freyberg, Charles Lau, Jonas Kemp, Jeremy Lai, Shekoofeh Azizi, Kimberly Kanada, SiWai Man, Kavita Kulkarni, Ruoxi Sun, Siamak Shakeri, Luheng He, Ben Caine, Albert Webson, Natasha Latysheva, Melvin Johnson, Philip Mansfield, Jian Lu, Ehud Rivlin, Jesper Anderson, Bradley Green, Renee Wong, Jonathan Krause, Jonathon Shlens, Ewa Dominowska, S. M. Ali Eslami, Katherine Chou, Claire Cui, Oriol Vinyals, Koray Kavukcuoglu, James Manyika, Jeff Dean, Demis Hassabis, Yossi Matias, Dale Webster, Joelle Barral, Greg Corrado, Christopher Semturs, S. Sara Mahdavi, Juraj Gottweis, Alan Karthikesalingam, Vivek Natarajan
The paper introduces Med-Gemini, a family of highly capable multimodal models specialized for medical applications. Building on the strengths of Gemini models, Med-Gemini enhances clinical reasoning, multimodal understanding, and long-context processing capabilities. The models are fine-tuned to integrate web search for up-to-date information and can be customized for novel medical modalities using custom encoders. Med-Gemini achieves state-of-the-art (SoTA) performance on 10 out of 14 medical benchmarks, surpassing GPT-4 in direct comparisons. Notably, it outperforms prior models on the MedQA (USMLE) benchmark with an accuracy of 91.1%, a significant improvement of 4.6% over Med-PaLM 2. The models also demonstrate real-world utility by outperforming human experts in tasks such as medical text summarization and referral letter generation. Med-Gemini's long-context capabilities are further validated through tasks like needle-in-a-haystack retrieval from de-identified health records and medical video question answering. The paper highlights the potential of Med-Gemini in various medical applications, emphasizing the need for rigorous validation before real-world deployment.
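The web-search integration described above can be pictured as an uncertainty-gated retrieval loop: sample the model several times, and only issue a search query when the samples disagree. This is a minimal sketch, not the paper's actual training or inference pipeline; `sample_answers` and `web_search` are hypothetical stand-ins for a model sampler and a search backend.

```python
import collections
from typing import Callable, List


def answer_with_search_fallback(
    question: str,
    sample_answers: Callable[[str], List[str]],  # hypothetical model sampler
    web_search: Callable[[str], str],            # hypothetical search backend
    agreement_threshold: float = 0.7,
) -> str:
    """Toy uncertainty-gated search loop (illustrative only).

    Sample the model several times; if the majority answer's vote share
    falls below `agreement_threshold`, append a retrieved snippet to the
    prompt and re-sample once with that added context.
    """
    answers = sample_answers(question)
    best, votes = collections.Counter(answers).most_common(1)[0]
    if votes / len(answers) >= agreement_threshold:
        # Samples agree: answer directly, no retrieval needed.
        return best
    # Low agreement signals uncertainty: retrieve and retry.
    snippet = web_search(question)
    augmented = f"{question}\nContext: {snippet}"
    retried = sample_answers(augmented)
    return collections.Counter(retried).most_common(1)[0][0]
```

The design point is that retrieval cost is paid only on uncertain questions, while confident majority-vote answers are returned immediately.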