2019 | Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, Jaewoo Kang
BioBERT is a pre-trained language model designed for biomedical text mining. It addresses the difficulty of applying general-purpose NLP models such as BERT to biomedical text, whose word distributions differ significantly from those of general-domain corpora. To capture this domain-specific language, BioBERT is pre-trained on large-scale biomedical corpora, including PubMed abstracts and PMC full-text articles.
The model outperforms BERT and previous state-of-the-art models on three key biomedical text mining tasks: biomedical named entity recognition (NER), biomedical relation extraction (RE), and biomedical question answering (QA). Over the previous state of the art, BioBERT achieves a 0.62% F1 score improvement in NER, a 2.80% F1 score improvement in RE, and a 12.24% MRR improvement in QA. These results demonstrate that pre-training BERT on biomedical corpora significantly enhances its performance in biomedical text mining.
BioBERT shares BERT's architecture but is adapted for the biomedical domain: it is initialized from BERT's pre-trained weights and then further pre-trained on biomedical texts to capture domain-specific knowledge. It uses WordPiece tokenization, which splits out-of-vocabulary words into known subword units. The model is then fine-tuned on three major biomedical text mining tasks: NER, RE, and QA. The pre-trained weights and source code for fine-tuning are publicly available.
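To make the tokenization step concrete, here is a minimal sketch of the greedy longest-match-first segmentation that WordPiece uses to split an out-of-vocabulary word into known subwords. The `vocab` below is a tiny hypothetical vocabulary for illustration only, not BERT's actual 30k-entry WordPiece vocabulary.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword segmentation (WordPiece-style).

    Non-initial subwords carry the "##" continuation prefix, as in BERT.
    Returns [unk] if the word cannot be segmented with the given vocab.
    """
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        # Try the longest possible substring first, shrinking from the right.
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = "##" + sub  # continuation-piece marker
            if sub in vocab:
                piece = sub
                break
            end -= 1
        if piece is None:
            return [unk]  # no segmentation found
        tokens.append(piece)
        start = end
    return tokens

# Toy vocabulary (hypothetical): the whole word "immunoglobulin" is
# out-of-vocabulary, but it splits into two known subword pieces.
vocab = {"immuno", "##globulin", "gene"}
print(wordpiece_tokenize("immunoglobulin", vocab))  # → ['immuno', '##globulin']
```

This subword splitting is why BioBERT can represent rare biomedical terms without a domain-specific vocabulary: unseen words decompose into pieces the model has already learned.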
The study shows that pre-training BERT on biomedical corpora is crucial for effective biomedical text mining. BioBERT requires minimal architectural modifications to perform well on various biomedical NLP tasks. The results indicate that BioBERT significantly improves performance on biomedical NER, RE, and QA compared to previous models. The pre-trained weights and source code are available for further research and application.