LLMs in Biomedical: A Study on Named Entity Recognition

11 Jul 2024 | Masoud Monajatipoor, Jiaxin Yang, Joel Stremmel, Melika Emami, Fazlollah Mohaghegh, Mozhdeh Rouhshedaghat, Kai-Wei Chang
This paper explores the application of Large Language Models (LLMs) in biomedical Named Entity Recognition (NER) and investigates strategies to enhance their performance. LLMs face challenges in biomedical NER due to the complexity of medical language and limited high-quality biomedical data. The study highlights the importance of carefully designed prompts and the strategic selection of in-context examples, which can significantly improve F1 scores by up to 15–20% across benchmark datasets. Integrating external biomedical knowledge through prompting strategies enhances the proficiency of general-purpose LLMs for specialized biomedical NER tasks. The proposed method, DiRAG, inspired by Retrieval-Augmented Generation (RAG), leverages a medical knowledge base such as UMLS to boost zero-shot F1 scores for biomedical NER.

The paper compares two input-output formats, TANL and DICE, for biomedical NER. TANL is a text-to-text format that directly tags entity types within the text, while DICE adds descriptions for each entity type in a template. The effectiveness of each format varies based on dataset complexity and model size. The TANL format is chosen for consistency in subsequent experiments due to its simpler pattern.

In-Context Learning (ICL) is shown to benefit from strategic selection of examples, with KATE (K-nearest Neighbor Augmented Example Selection) outperforming random selection. BioClinicalRoBERTa, a model pretrained on biomedical text, achieves the best results among the example encoders tested. The study also compares ICL and fine-tuning for biomedical NER: while fine-tuning Llama2-7B is cost-effective for NCBI-disease, GPT-4 with KATE using a biomedical encoder performs better on the I2B2 and BC2GM datasets. The proposed DiRAG method uses UMLS as an external knowledge base to augment input data, significantly improving zero-shot NER performance for the I2B2 and NCBI-disease datasets. However, it is less effective for BC2GM due to the nature of the UMLS knowledge base.
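The KATE-style example selection described above can be sketched as a simple nearest-neighbor search: encode the test input and every candidate demonstration, then keep the k candidates most similar to the input. The sketch below is illustrative only; the toy bag-of-characters embedding stands in for a real encoder such as BioClinicalRoBERTa, and the bracketed labels merely mimic TANL-style inline tagging (the example sentences and labels are invented for illustration).

```python
import math

def embed(text):
    """Toy bag-of-characters embedding; a stand-in for a real sentence
    encoder such as BioClinicalRoBERTa."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_examples(query, pool, k=2):
    """KATE-style selection: rank the candidate pool by embedding
    similarity to the test input and keep the k nearest as ICL demos."""
    q = embed(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, embed(ex["text"])),
                    reverse=True)
    return ranked[:k]

# Hypothetical labeled pool; labels mimic TANL-style inline tags.
pool = [
    {"text": "Aspirin reduces the risk of myocardial infarction.",
     "labels": "[ myocardial infarction | disease ]"},
    {"text": "The BRCA1 gene is linked to breast cancer.",
     "labels": "[ BRCA1 | gene ] [ breast cancer | disease ]"},
    {"text": "Patients received 5 mg of warfarin daily.",
     "labels": "[ warfarin | drug ]"},
]

demos = select_examples("Mutations in TP53 are common in lung cancer.",
                        pool, k=2)
```

In practice the selected demonstrations are prepended to the prompt before the test sentence, which is what allows similarity-based selection to outperform random selection.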
The study concludes that customizing prompting techniques is crucial for biomedical NER. Strategic ICL example selection and data augmentation using external knowledge enhance LLM performance. The research highlights the importance of using biomedical text-pretrained models and the potential of integrating external knowledge to improve zero-shot NER capabilities.
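The knowledge-augmentation idea behind DiRAG can be sketched as follows: look up candidate terms from the input in an external knowledge base and prepend any retrieved definitions to the zero-shot NER prompt. This is a minimal illustrative sketch, not the paper's implementation; the dictionary below is a hypothetical stand-in for UMLS, and the term matching is deliberately naive.

```python
# Hypothetical UMLS-like entries, invented for illustration.
KNOWLEDGE_BASE = {
    "hypertension": "a disorder characterized by elevated arterial blood pressure",
    "metformin": "a biguanide drug used to treat type 2 diabetes",
}

def retrieve_definitions(sentence, kb):
    """Return (term, definition) pairs for KB terms found in the sentence.
    A real system would use concept linking rather than substring match."""
    lowered = sentence.lower()
    return [(term, desc) for term, desc in kb.items() if term in lowered]

def build_prompt(sentence, kb):
    """Assemble a zero-shot NER prompt augmented with retrieved knowledge."""
    context = "\n".join(f"- {t}: {d}"
                        for t, d in retrieve_definitions(sentence, kb))
    header = f"Background knowledge:\n{context}\n\n" if context else ""
    return (header
            + "Tag each disease and drug entity in the sentence below.\n"
            + f"Sentence: {sentence}")

prompt = build_prompt(
    "The patient with hypertension was started on metformin.",
    KNOWLEDGE_BASE,
)
```

Because the retrieved definitions describe diseases and drugs well but cover genes less thoroughly, a sketch like this also suggests why such augmentation would help on I2B2 and NCBI-disease more than on BC2GM.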
Understanding LLMs in Biomedicine: A study on clinical Named Entity Recognition