Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks

30 Jun 2024 | Hyunjae Kim, Hyeon Hwang, Jiwoo Lee, Sihyeon Park, Dain Kim, Taewhoo Lee, Chanwoong Yoon, Jiwon Sohn, Donghee Choi, Jaewoo Kang
The paper introduces Meerkat, a new family of medical AI models whose reasoning skills are enhanced with synthetic data derived from medical textbooks. The models range from 7 to 70 billion parameters. They were trained on a synthetic dataset of high-quality chain-of-thought (CoT) reasoning paths drawn from 18 medical textbooks, combined with diverse instruction-following datasets. Meerkat models achieved substantial accuracy gains across six medical benchmarks and surpassed previous best models such as MediTron and BioMistral. The largest model, Meerkat-70B, even outperformed GPT-4 by an average of 1.3%. Notably, Meerkat-7B became the first 7B-parameter model to exceed the USMLE passing threshold, and Meerkat-70B correctly diagnosed 21 of 38 complex clinical cases, closely matching GPT-4's performance. The models also gave more detailed and comprehensive responses to clinical queries than existing small models, narrowing the performance gap with large commercial models. The study highlights Meerkat's effectiveness on complex medical challenges and the importance of open-source models in advancing medical AI.