UltraMedical: Building Specialized Generalists in Biomedicine
UltraMedical is a high-quality biomedical dataset of 410,000 medical instructions, combining manually curated and synthetic data. It carries preference annotations over completions from multiple advanced LLMs, enabling the fine-tuning of specialized medical models based on the Llama-3 series. UltraMedical is mixed with open-domain datasets such as UltraChat to explore how professional and general skills can be fused, and the Llama-3 family is fine-tuned on this mixture to produce competitive medical models. Additionally, a reward model trained on UltraMedical preferences and earlier feedback datasets achieves strong results on both annotated medical benchmarks and RewardBench. Guided by the preferences of this reward model, the UltraMedical LMs are further optimized through a self-generated response strategy, yielding more powerful models.
The UltraMedical dataset is constructed by combining manual and synthetic biomedical instructions, covering medical exam problems, PubMed literature research, and open-ended questions. About 100,000 instructions are annotated with preferences over completions from advanced medical and general models, supporting fine-tuning, reward modeling, and preference learning. Completions are collected from various LLMs, including GPT-4, and preference annotations are produced by ranking the completions according to their scores and accompanying explanations. The dataset is also used to train a reward model, which in turn labels responses from the UltraMedical LMs themselves, providing "on-policy" completion pairs for preference learning.
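The ranking-based preference annotation described above can be sketched as follows. This is a minimal illustration, not the paper's actual schema: the field names (`score`, `text`) and the heuristic of pairing the highest- and lowest-scored completions are assumptions.

```python
# Sketch: turning scored completions into a (chosen, rejected) preference pair.
# Field names and the pairing heuristic are illustrative, not the actual pipeline.

def build_preference_pair(instruction, completions):
    """Rank completions by annotated score; pair the best against the worst."""
    ranked = sorted(completions, key=lambda c: c["score"], reverse=True)
    return {
        "prompt": instruction,
        "chosen": ranked[0]["text"],     # highest-scored completion
        "rejected": ranked[-1]["text"],  # lowest-scored completion
    }

pair = build_preference_pair(
    "What is the first-line pharmacologic treatment for type 2 diabetes?",
    [
        {"text": "Metformin, alongside lifestyle modification.", "score": 9},
        {"text": "Insulin is always given first.", "score": 3},
        {"text": "Lifestyle changes alone, in all cases.", "score": 5},
    ],
)
print(pair["chosen"])  # -> Metformin, alongside lifestyle modification.
```

Pairs of this shape are exactly what preference-learning objectives such as DPO consume downstream.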
The UltraMedical LMs are developed in four steps: supervised fine-tuning, preference learning, reward modeling, and iterative preference learning. Evaluated on a range of medical and general benchmarks, the models outperform comparable models, and their strong performance on medical benchmarks demonstrates the effectiveness of the UltraMedical instruction and preference datasets; the results indicate that the UltraMedical collections can narrow the gap between open-source and proprietary models. The UltraMedical reward models are also evaluated in the general domain, showing competitive performance on both medical and general reward benchmarks, and they prove effective for online/iterative preference learning methods such as DPO and KTO. For re-ranking candidates, the reward models outperform self-consistency ensembles with 8B models, though they are less effective at supervising 70B models. These results suggest that reward models are a critical component of model self-evolution, and future research could focus on developing more robust reward models. The UltraMedical datasets and models are publicly released on GitHub and Hugging Face, aiming to foster collaboration and accelerate progress in biomedical generative AI.
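The two candidate-selection strategies compared above can be sketched side by side. This is a simplified illustration under stated assumptions: `reward` stands in for the reward model's scalar score, and candidates are reduced to their final answers; real usage would score full completions.

```python
# Sketch: reward-model re-ranking (best-of-N) vs. self-consistency voting
# over N sampled completions. reward() is a stand-in for a trained reward model.
from collections import Counter

def best_of_n(candidates, reward):
    """Reward-model re-ranking: keep the candidate with the highest reward."""
    return max(candidates, key=reward)

def self_consistency(answers):
    """Majority vote over the final answers of the sampled completions."""
    return Counter(answers).most_common(1)[0][0]

answers = ["B", "B", "C", "B", "A"]
print(self_consistency(answers))                                # -> B
print(best_of_n(answers, {"A": 0.2, "B": 0.7, "C": 0.9}.get))   # -> C
```

The toy example shows why the two can disagree: the majority answer wins the vote, while a sufficiently discriminative reward model can promote a minority answer it scores higher.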