BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains

17 Jul 2024 | Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickaël Rouvier, Richard Dufour
BioMistral is an open-source large language model (LLM) tailored to the biomedical domain, built on the Mistral foundation model and further pre-trained on PubMed Central. It is evaluated on a benchmark of 10 established medical question-answering (QA) tasks in English, and lightweight versions are produced through quantization and model merging. BioMistral outperforms existing open-source medical models and is competitive with proprietary ones. The benchmark is also translated into 7 other languages, making this the first large-scale multilingual evaluation of LLMs in the medical domain. All datasets, benchmarks, scripts, and models are freely released on HuggingFace and GitHub under the Apache 2.0 license.

The training corpus consists of medical research papers from PubMed Central, restricted to the Commercial Use Allowed subset, with preprocessing applied to optimize the dataset for training efficiency. The continued pre-training covers about 3 billion tokens, predominantly English documents, with some non-English texts retained to keep the training set diverse, and uses the AdamW optimizer with a cosine learning-rate scheduler. The architecture follows the standard transformer design inherited from Mistral, including Grouped-Query Attention and Sliding Window Attention.
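As a rough illustration of this setup, the sketch below shows a minimal continual pre-training loop with the HuggingFace Trainer, using AdamW and a cosine schedule as described above. The base checkpoint, hyperparameter values, and the tiny placeholder corpus are illustrative assumptions, not the authors' actual training script.

```python
import torch
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Assumed base checkpoint (Mistral 7B Instruct); verify against the paper/model card.
base_model = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizer has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# Tiny placeholder corpus standing in for the PMC Open Access (commercial-use) subset.
texts = [
    "Aspirin irreversibly inhibits cyclooxygenase-1, reducing thromboxane synthesis.",
    "Metformin lowers hepatic gluconeogenesis and is first-line therapy for type 2 diabetes.",
]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

train_dataset = Dataset.from_dict({"text": texts}).map(
    tokenize, batched=True, remove_columns=["text"]
)

# Plain next-token prediction (causal LM), no masked-LM objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="biomistral-cpt",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,          # illustrative value, not taken from the paper
    lr_scheduler_type="cosine",  # cosine learning-rate schedule, as stated above
    warmup_ratio=0.03,
    optim="adamw_torch",         # AdamW optimizer, as stated above
    num_train_epochs=1,
    bf16=True,
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=collator,
)
trainer.train()
```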
To obtain lighter and more general models, two complementary directions are explored: model merging techniques such as TIES, DARE, and SLERP to enhance performance and generalization, and quantization methods such as AWQ and BitsAndBytes to reduce model size and improve inference efficiency.

Evaluation covers a range of medical QA tasks, including MedQA, MedMCQA, PubMedQA, and the medical subsets of MMLU, under both supervised fine-tuning (SFT) and few-shot settings, with BioMistral outperforming the other open-source models on most tasks. Calibration and truthfulness are also assessed, and the model's confidence estimates are generally found to be reliable.

Performance is further measured on the translated benchmarks, demonstrating multilingual capability, although results in non-English languages are limited by the scarcity of medical training data in those languages. Overall, BioMistral is a state-of-the-art open model for the biomedical domain, performing strongly across tasks and languages. It is released for research and development, and further improvements to its multilingual and chat capabilities are planned.
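Since the checkpoints are published on HuggingFace, a minimal usage sketch for loading the 7B model with 4-bit bitsandbytes quantization (one of the compression routes discussed above) might look as follows. The repository id, quantization settings, and prompt are assumptions and should be checked against the official model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "BioMistral/BioMistral-7B"  # assumed Hub repository id

# 4-bit NF4 quantization via bitsandbytes to fit the model on a single consumer GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Simple greedy generation on an illustrative medical question.
prompt = "Question: Which vitamin deficiency causes scurvy?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```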