21 March 2024 | Vipina K. Keloth, Yan Hu, Qianqian Xie, Xueqing Peng, Yan Wang, Andrew Zheng, Melih Selek, Kalpana Raja, Chih-Hsuan Wei, Qiao Jin, Zhiyong Lu, Qingyu Chen, Hua Xu
This paper introduces BioNER-LLaMA, an instruction-tuned large language model (LLM) for biomedical named entity recognition (NER). The study demonstrates that a general-domain LLM can achieve performance comparable to domain-specific models such as PubMedBERT and PMC-LLaMA that are fine-tuned for biomedical NER. The approach recasts NER from a sequence labeling task into a generation task by converting existing biomedical NER datasets into instruction examples (a minimal sketch of this conversion appears after the summary). This enables end-to-end training and evaluation, and BioNER-LLaMA is shown to outperform GPT-4 on several biomedical NER datasets, with F1 scores 5% to 30% higher (a sketch of entity-level F1 also follows below).

The results indicate that instruction tuning can effectively enhance LLM performance on biomedical NER without extensive domain-specific fine-tuning, and the study highlights the influence of instruction dataset size and prompt structure on the performance of instruction-following NER models. BioNER-LLaMA was evaluated on three widely recognized biomedical NER datasets and achieved strong performance across entity types, including diseases, chemicals, and genes. The authors further show that the approach can be applied to other biomedical NLP tasks, offering a generalizable framework for NER.

The study also discusses the limitations of LLMs, including high computational requirements and challenges with entity disambiguation. The authors conclude that the proposed instruction-tuning paradigm has the potential to transform how biomedical NLP systems are developed.
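
The core technical move is reformulating NER as text generation: each annotated sentence becomes an instruction-input-output triple that a decoder-only LLM can be trained on end to end. Below is a minimal sketch of that data conversion, assuming a BIO-tagged input corpus; the instruction wording, field names, and output format are illustrative assumptions, not the paper's actual template.

```python
# Illustrative sketch: converting a BIO-tagged NER example into an
# instruction-tuning example, so that NER becomes a generation task.
# The prompt template and output format are hypothetical.
import json

def bio_to_instruction(tokens, tags, entity_type="disease"):
    """Turn a (tokens, BIO-tags) pair into a generation-style example."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == f"B-{entity_type}":
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == f"I-{entity_type}" and current:
            current.append(token)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))

    return {
        # Hypothetical instruction wording.
        "instruction": f"Extract all {entity_type} entities from the sentence.",
        "input": " ".join(tokens),
        # The target is plain text, which is what makes this a generation task.
        "output": "; ".join(entities) if entities else "None",
    }

example = bio_to_instruction(
    ["Mutations", "in", "BRCA1", "cause", "breast", "cancer", "."],
    ["O", "O", "O", "O", "B-disease", "I-disease", "O"],
)
print(json.dumps(example, indent=2))
```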
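
The reported comparison against GPT-4 is in F1, which for NER is conventionally computed at the entity level. The following is a minimal sketch of strict (exact-match) entity-level F1 over sets of (mention, type) pairs; it illustrates the standard metric, not the paper's evaluation code, and assumes model output has already been parsed into such pairs.

```python
# Minimal sketch of strict entity-level F1, assuming both model output and
# gold annotations have been normalized to sets of (mention, type) pairs.
def entity_f1(gold: set, predicted: set) -> float:
    """Micro F1 over exact-match entity mentions."""
    if not gold or not predicted:
        return 0.0
    tp = len(gold & predicted)          # correctly predicted entities
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {("breast cancer", "disease"), ("tamoxifen", "chemical")}
pred = {("breast cancer", "disease"), ("BRCA1", "gene")}
print(f"F1 = {entity_f1(gold, pred):.2f}")  # 1 TP, 1 FP, 1 FN -> F1 = 0.50
```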