Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

12 Oct 2024 | Xidong Wang, Nuo Chen, Junying Chen, Yidong Wang, Guorui Zhen, Chunxian Zhang, Xiangbo Wu, Yan Hu, Anningzhe Gao, Xiang Wan, Haizhou Li, Benyou Wang
Apollo is a family of lightweight multilingual medical large language models (LLMs) aimed at democratizing medical AI for the roughly 6.1 billion people who speak the world's six most widely used languages. The work contributes two resources alongside the models: the ApolloCorpora training corpus and the XMedBench evaluation benchmark. ApolloCorpora covers six major languages: English, Chinese, Hindi, Spanish, French, and Arabic, and spans a wide range of medical text, including books, clinical guidelines, encyclopedias, papers, and online forums. The corpus contains about 2.5 billion tokens and is carefully curated for quality and filtered to avoid data leakage into the evaluation set.

The Apollo models range from 2B to 7B parameters and are trained on ApolloCorpora. On XMedBench, they achieve the best results among models of comparable size, and Apollo-7B is the strongest multilingual medical LLM among models with up to 70B parameters. XMedBench assesses medical knowledge with multiple-choice questions in English, Chinese, Spanish, French, Arabic, and Hindi, measuring how well models understand and answer medical questions across languages.
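As a rough illustration of this style of evaluation, the sketch below scores multiple-choice questions and reports per-language accuracy. The `MCQuestion` container and the `choose_fn` scoring hook are hypothetical names chosen for this example; they are not the paper's released evaluation code.

```python
from dataclasses import dataclass

@dataclass
class MCQuestion:
    language: str       # e.g. "en", "zh", "hi", "es", "fr", "ar"
    question: str
    options: list[str]  # answer choices
    answer_idx: int     # index of the correct option

def accuracy_by_language(questions, choose_fn):
    """Score a list of MCQuestion with choose_fn(question, options) -> predicted index,
    and return accuracy broken down by language."""
    totals, correct = {}, {}
    for q in questions:
        totals[q.language] = totals.get(q.language, 0) + 1
        if choose_fn(q.question, q.options) == q.answer_idx:
            correct[q.language] = correct.get(q.language, 0) + 1
    return {lang: correct.get(lang, 0) / totals[lang] for lang in totals}

# Usage example with a trivial baseline that always picks the first option.
qs = [MCQuestion("en", "Which vitamin deficiency causes scurvy?",
                 ["Vitamin C", "Vitamin D", "Vitamin K", "Vitamin B12"], 0)]
print(accuracy_by_language(qs, lambda question, options: 0))  # {'en': 1.0}
```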
The study also examines the effect of multilingual training, finding that mixing medical data across languages substantially improves performance. It notes potential risks as well, such as conflicts between medical knowledge expressed in different languages, which are left as directions for future work.

Beyond serving as standalone models, the Apollo models can be used for proxy-tuning: their output distributions steer a larger general-purpose model at decoding time, improving its multilingual medical capability without fine-tuning the large model directly on sensitive medical data. This lets small, efficiently trained models lift the performance of much larger ones, making medical AI accessible to a broader population. The training corpora, code, model weights, and evaluation benchmark are all open-sourced so the research community can build on the work.
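A minimal sketch of the logit-arithmetic idea behind proxy-tuning is shown below: at each decoding step, the large base model's next-token logits are shifted by the difference between a small tuned expert and its untuned base. It assumes all three models share the same tokenizer and vocabulary; the checkpoint names are placeholders, not the paper's released models or code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint names (hypothetical): a large base model plus a small
# (base, medical-tuned) pair that share the same tokenizer/vocabulary.
LARGE_BASE = "large-base-model"
SMALL_BASE = "small-base-model"
SMALL_TUNED = "small-medical-tuned"

tok = AutoTokenizer.from_pretrained(LARGE_BASE)
large = AutoModelForCausalLM.from_pretrained(LARGE_BASE).eval()
small_base = AutoModelForCausalLM.from_pretrained(SMALL_BASE).eval()
small_tuned = AutoModelForCausalLM.from_pretrained(SMALL_TUNED).eval()

@torch.no_grad()
def proxy_tuned_generate(prompt: str, max_new_tokens: int = 64) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        # Next-token logits from each model on the same prefix.
        l_large = large(ids).logits[:, -1, :]
        l_small_base = small_base(ids).logits[:, -1, :]
        l_small_tuned = small_tuned(ids).logits[:, -1, :]
        # Core proxy-tuning step: shift the large model's distribution by the
        # small expert / anti-expert logit difference.
        shifted = l_large + (l_small_tuned - l_small_base)
        next_id = shifted.argmax(dim=-1, keepdim=True)  # greedy decoding for simplicity
        ids = torch.cat([ids, next_id], dim=-1)
        if tok.eos_token_id is not None and next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)

print(proxy_tuned_generate("What are the first-line treatments for type 2 diabetes?"))
```

The large model is never fine-tuned; only the small pair encodes the medical adaptation, which is why this route avoids training the large model on sensitive medical data.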