2024 | Clément Christophe, Praveen K Kanithi, Prateek Munjal, Tathagata Raha, Nasir Hayat, Ronnie Rajan, Ahmed Al-Mahrooqi, Avani Gupta, Muhammad Umar Salman, Gurpreet Gosai, Bhargav Kanakiya, Charles Chen, Natalia Vassilieva, Boulbaba Ben Amor, Marco AF Pimentel, Shadab Khan
This study evaluates two primary fine-tuning methodologies, full-parameter fine-tuning and parameter-efficient tuning, in the context of medical Large Language Models (LLMs). The authors developed a series of LLMs based on the Llama-2 architecture to improve medical knowledge retrieval, reasoning, and question answering, and systematically assessed the two tuning strategies across well-known medical benchmarks. Notably, their medical LLM, Med42, achieved 72% accuracy on USMLE datasets, setting a new standard for open-source medical LLMs. The study aims to identify the most effective and efficient method for fine-tuning LLMs in the medical domain, contributing to the advancement of AI-driven healthcare applications. The research finds that full-parameter fine-tuning generally outperforms parameter-efficient tuning, but that parameter-efficient methods such as LoRA yield results close to full-parameter tuning, making them viable alternatives in resource-constrained scenarios. The study also addresses test set contamination by implementing a decontamination pipeline, supporting the robustness and integrity of the evaluation results. Overall, the findings underscore the importance of comprehensive fine-tuning strategies and the potential of parameter-efficient methods in enhancing the performance of LLMs for medical applications.
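To make the comparison concrete, below is a minimal sketch of LoRA-style parameter-efficient tuning on a Llama-2 base model, assuming the Hugging Face transformers and peft libraries; the rank, scaling factor, and target modules are illustrative assumptions, not the hyperparameters reported in the paper.

```python
# Minimal LoRA setup sketch (assumed hyperparameters, not the paper's settings).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,                                  # low-rank dimension of the adapter matrices (assumption)
    lora_alpha=32,                         # scaling factor applied to the adapter output (assumption)
    target_modules=["q_proj", "v_proj"],   # attention projections in the Llama-2 architecture
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapters train; base weights stay frozen
```

The trade-off the paper quantifies follows directly from this setup: only the low-rank adapter matrices receive gradients, so memory and compute drop sharply relative to full-parameter fine-tuning, at the cost of a modest accuracy gap.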
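The paper's decontamination pipeline is not detailed in this summary, so the following is only an illustrative n-gram overlap filter of the kind commonly used to remove benchmark leakage from training data; the 8-gram window and whitespace tokenization are assumptions, not the authors' procedure.

```python
# Illustrative test-set decontamination via n-gram overlap (assumed approach).
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """All lowercase whitespace-token n-grams of the text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def decontaminate(train_examples: list[str], test_examples: list[str], n: int = 8) -> list[str]:
    """Drop any training example sharing an n-gram with the benchmark test set."""
    test_grams: set[tuple[str, ...]] = set()
    for example in test_examples:
        test_grams |= ngrams(example, n)
    return [ex for ex in train_examples if not (ngrams(ex, n) & test_grams)]
```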