21 March 2024 | Vipina K. Keloth, Yan Hu, Qianqian Xie, Xueqing Peng, Yan Wang, Andrew Zheng, Melih Selek, Kalpana Raja, Chih-Hsuan Wei, Qiao Jin, Zhiyong Lu, Qingyu Chen, Hua Xu
This paper introduces BioNER-LLaMA, an instruction-tuned large language model (LLM) for biomedical named entity recognition (NER). The study demonstrates that a general-domain LLM can achieve performance comparable to domain-specific models such as PubMedBERT and PMC-LLaMA that are fine-tuned for biomedical NER. The approach recasts NER from a sequence labeling task into a generation task by converting existing biomedical NER datasets into instruction examples (a minimal sketch of this conversion appears after the summary). This enables end-to-end training and evaluation, and BioNER-LLaMA is shown to outperform GPT-4 on several biomedical NER datasets, with F1 scores 5% to 30% higher (a sketch of entity-level F1 also follows below).

The results indicate that instruction tuning can effectively enhance LLM performance on biomedical NER without extensive domain-specific fine-tuning, and the study highlights the influence of instruction dataset size and prompt structure on the performance of instruction-following NER models. BioNER-LLaMA was evaluated on three widely recognized biomedical NER datasets and achieved strong performance across entity types, including diseases, chemicals, and genes. The authors further show that the approach can be applied to other biomedical NLP tasks, offering a generalizable framework for NER.

The study also discusses the limitations of LLMs, including high computational requirements and challenges with entity disambiguation. The authors conclude that the proposed instruction-tuning paradigm has the potential to transform how biomedical NLP systems are developed.
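
The core technical move is reformulating NER as text generation: each annotated sentence becomes an instruction-input-output triple that a decoder-only LLM can be trained on end to end. Below is a minimal sketch of that data conversion, assuming a BIO-tagged input corpus; the instruction wording, field names, and output format are illustrative assumptions, not the paper's actual template.

```python
# Illustrative sketch: converting a BIO-tagged NER example into an
# instruction-tuning example, so that NER becomes a generation task.
# The prompt template and output format are hypothetical.
import json

def bio_to_instruction(tokens, tags, entity_type="disease"):
    """Turn a (tokens, BIO-tags) pair into a generation-style example."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{entity_type}":
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == f"I-{entity_type}" and current:
            current.append(token)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))

    return {
        # Hypothetical instruction wording.
        "instruction": f"Extract all {entity_type} entities from the sentence.",
        "input": " ".join(tokens),
        # The target is plain text, which is what makes this a generation task.
        "output": "; ".join(entities) if entities else "None",
    }

example = bio_to_instruction(
    ["Mutations", "in", "BRCA1", "cause", "breast", "cancer", "."],
    ["O", "O", "O", "O", "B-disease", "I-disease", "O"],
)
print(json.dumps(example, indent=2))
```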
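
The reported comparison against GPT-4 is in F1, which for NER is conventionally computed at the entity level. The following is a minimal sketch of strict (exact-match) entity-level F1 over sets of (mention, type) pairs; it illustrates the standard metric, not the paper's evaluation code, and assumes model output has already been parsed into such pairs.

```python
# Minimal sketch of strict entity-level F1, assuming both model output and
# gold annotations have been normalized to sets of (mention, type) pairs.
def entity_f1(gold: set, predicted: set) -> float:
    """Micro F1 over exact-match entity mentions."""
    if not gold or not predicted:
        return 0.0
    tp = len(gold & predicted)          # correctly predicted entities
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {("breast cancer", "disease"), ("tamoxifen", "chemical")}
pred = {("breast cancer", "disease"), ("BRCA1", "gene")}
print(f"F1 = {entity_f1(gold, pred):.2f}")  # 1 TP, 1 FP, 1 FN -> F1 = 0.50
```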