Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

12 Oct 2024 | Xidong Wang, Nuo Chen, Junying Chen, Yidong Wang, Guorui Zhen, Chunxian Zhang, Xiangbo Wu, Yan Hu, Anningzhe Gao, Xiang Wan, Haizhou Li, Benyou Wang
Apollo is a family of lightweight multilingual medical large language models (LLMs) aimed at democratizing medical AI for the roughly 6.1 billion people who speak the world's six most widely used languages. The work contributes two resources alongside the models: the ApolloCorpora training corpus and the XMedBench evaluation benchmark. ApolloCorpora covers six major languages: English, Chinese, Hindi, Spanish, French, and Arabic, and spans a wide range of medical text, including books, clinical guidelines, encyclopedias, papers, and online forums. The corpus contains about 2.5 billion tokens and is carefully curated for quality and filtered to avoid data leakage into the evaluation set.

The Apollo models range from 2B to 7B parameters and are trained on ApolloCorpora. On XMedBench, they achieve the best results among models of comparable size, and Apollo-7B is the strongest multilingual medical LLM among models with up to 70B parameters. XMedBench assesses medical knowledge with multiple-choice questions in English, Chinese, Spanish, French, Arabic, and Hindi, measuring how well models understand and answer medical questions across languages.
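As a rough illustration of this style of evaluation, the sketch below scores multiple-choice questions and reports per-language accuracy. The `MCQuestion` container and the `choose_fn` scoring hook are hypothetical names chosen for this example; they are not the paper's released evaluation code.

```python
from dataclasses import dataclass

@dataclass
class MCQuestion:
    language: str       # e.g. "en", "zh", "hi", "es", "fr", "ar"
    question: str
    options: list[str]  # answer choices
    answer_idx: int     # index of the correct option

def accuracy_by_language(questions, choose_fn):
    """Score a list of MCQuestion with choose_fn(question, options) -> predicted index,
    and return accuracy broken down by language."""
    totals, correct = {}, {}
    for q in questions:
        totals[q.language] = totals.get(q.language, 0) + 1
        if choose_fn(q.question, q.options) == q.answer_idx:
            correct[q.language] = correct.get(q.language, 0) + 1
    return {lang: correct.get(lang, 0) / totals[lang] for lang in totals}

# Usage example with a trivial baseline that always picks the first option.
qs = [MCQuestion("en", "Which vitamin deficiency causes scurvy?",
                 ["Vitamin C", "Vitamin D", "Vitamin K", "Vitamin B12"], 0)]
print(accuracy_by_language(qs, lambda question, options: 0))  # {'en': 1.0}
```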
The study also examines the effect of multilingual training, finding that mixing medical data across languages substantially improves performance. It notes potential risks as well, such as conflicts between medical knowledge expressed in different languages, which are left as directions for future work.

Beyond serving as standalone models, the Apollo models can be used for proxy-tuning: their output distributions steer a larger general-purpose model at decoding time, improving its multilingual medical capability without fine-tuning the large model directly on sensitive medical data. This lets small, efficiently trained models lift the performance of much larger ones, making medical AI accessible to a broader population. The training corpora, code, model weights, and evaluation benchmark are all open-sourced so the research community can build on the work.
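A minimal sketch of the logit-arithmetic idea behind proxy-tuning is shown below: at each decoding step, the large base model's next-token logits are shifted by the difference between a small tuned expert and its untuned base. It assumes all three models share the same tokenizer and vocabulary; the checkpoint names are placeholders, not the paper's released models or code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint names (hypothetical): a large base model plus a small
# (base, medical-tuned) pair that share the same tokenizer/vocabulary.
LARGE_BASE = "large-base-model"
SMALL_BASE = "small-base-model"
SMALL_TUNED = "small-medical-tuned"

tok = AutoTokenizer.from_pretrained(LARGE_BASE)
large = AutoModelForCausalLM.from_pretrained(LARGE_BASE).eval()
small_base = AutoModelForCausalLM.from_pretrained(SMALL_BASE).eval()
small_tuned = AutoModelForCausalLM.from_pretrained(SMALL_TUNED).eval()

@torch.no_grad()
def proxy_tuned_generate(prompt: str, max_new_tokens: int = 64) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        # Next-token logits from each model on the same prefix.
        l_large = large(ids).logits[:, -1, :]
        l_small_base = small_base(ids).logits[:, -1, :]
        l_small_tuned = small_tuned(ids).logits[:, -1, :]
        # Core proxy-tuning step: shift the large model's distribution by the
        # small expert / anti-expert logit difference.
        shifted = l_large + (l_small_tuned - l_small_base)
        next_id = shifted.argmax(dim=-1, keepdim=True)  # greedy decoding for simplicity
        ids = torch.cat([ids, next_id], dim=-1)
        if tok.eos_token_id is not None and next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)

print(proxy_tuned_generate("What are the first-line treatments for type 2 diabetes?"))
```

The large model is never fine-tuned; only the small pair encodes the medical adaptation, which is why this route avoids training the large model on sensitive medical data.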