Med42 - Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches

23 Apr 2024 | Clément Christophe, Praveen K Kanithi, Prateek Munjal, Tathagata Raha, Nasir Hayat, Ronnie Rajan, Ahmed Al-Mahrooqi, Avani Gupta, Muhammad Umar Salman, Gurpreet Gosal, Bhargav Kanakiya, Charles Chen, Natalia Vassilieva, Boulbaba Ben Amor, Marco AF Pimentel, Shadab Khan
This study evaluates two fine-tuning strategies for medical large language models (LLMs): full-parameter fine-tuning and parameter-efficient tuning. The research develops and refines a series of LLMs based on the Llama-2 architecture to enhance medical knowledge retrieval, reasoning, and question-answering capabilities. The models are tested on various medical benchmarks, with Med42 achieving 72% accuracy on the US Medical Licensing Examination (USMLE) datasets, setting a new standard for publicly available medical LLMs.

The study compares full-parameter fine-tuning, which adjusts all model parameters and requires significant computational resources, with parameter-efficient methods like LoRA (Low-Rank Adaptation), which modifies only a subset of parameters. The results show that full-parameter fine-tuning outperforms LoRA on most medical benchmarks, although LoRA achieves results close to full-parameter tuning. The study also addresses the issue of test set contamination, implementing a decontamination pipeline to ensure the integrity of evaluation results.

The research highlights the importance of comprehensive fine-tuning strategies in improving the performance of language models on medical tasks. It also emphasizes the need for large, well-structured instruction-tuning datasets to enhance the effectiveness of open-source LLMs. The study contributes to the advancement of AI-driven healthcare applications by identifying effective and efficient fine-tuning methods for medical LLMs. The model Med42 is released on HuggingFace for further research and development.
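To make the contrast between the two strategies concrete, the following is a minimal NumPy sketch of the LoRA idea: the pretrained weight matrix is frozen, and only a low-rank update B @ A is trained, which is why LoRA touches far fewer parameters than full fine-tuning. The dimensions and scaling factor here are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative LoRA (Low-Rank Adaptation) sketch.
# Full fine-tuning updates the entire weight W (d_out x d_in);
# LoRA freezes W and learns a low-rank update B @ A, with
# B (d_out x r) and A (r x d_in), where r << min(d_out, d_in).

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 128, 4  # hypothetical layer sizes and rank

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable, small init
B = np.zeros((d_out, r))                    # trainable, zero init

def lora_forward(x, alpha=16):
    # Effective weight is W + (alpha / r) * B @ A.
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.standard_normal((2, d_in))
# With B initialized to zero, the adapted layer exactly matches
# the base layer, so training starts from the pretrained model.
assert np.allclose(lora_forward(x), x @ W.T)

# Trainable-parameter comparison for this single layer:
full_params = W.size            # 64 * 128 = 8192
lora_params = A.size + B.size   # 4 * 128 + 64 * 4 = 768
print(full_params, lora_params)
```

Even in this toy layer, LoRA trains roughly a tenth of the parameters of full fine-tuning; at LLM scale the gap is what makes parameter-efficient tuning attractive despite the small benchmark gap reported in the study.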