BioLORD-2023: semantic textual representations fusing large language models and clinical knowledge graph insights

2024 | François Remy, Kris Demuynck, Thomas Demeester
This study introduces BioLORD-2023, a state-of-the-art model for semantic textual similarity (STS) and biomedical concept representation (BCR) in the clinical domain. The research aims to leverage large language models (LLMs) to complement biomedical knowledge graphs (BKGs) in training semantic models. The proposed approach consists of three main steps: an improved contrastive learning phase, a novel self-distillation phase, and a weight averaging phase. The results demonstrate significant improvements over previous models in various downstream tasks, including STS, BCR, and named entity linking across 15+ datasets. Additionally, a multilingual model compatible with 50+ languages is released, enabling broader applicability. The study highlights the benefits of integrating LLMs and BKGs, particularly in enhancing the biomedical expertise of semantic models and reducing the trade-off between biomedical knowledge and general language understanding. The authors discuss the practical implications of their findings and outline future directions, including the integration of recent biomedical literature and larger STS models.
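To make the three training phases more concrete, the sketch below shows, in PyTorch, one generic way each phase could be realized: an in-batch InfoNCE contrastive loss over concept-name/description pairs, a self-distillation loss against a frozen snapshot of the same model, and element-wise checkpoint averaging. This is a minimal illustration under simplifying assumptions, not the authors' implementation; the specific loss formulations and refinements of BioLORD-2023 are described in the paper itself.

```python
# Minimal sketch of the three training phases (NOT the authors' code).
# The loss formulations below are generic stand-ins chosen for brevity.
import copy
import torch
import torch.nn.functional as F


def contrastive_loss(anchor_emb, positive_emb, temperature=0.05):
    """InfoNCE-style loss: each concept name should be closest to its own
    description among all descriptions in the batch (in-batch negatives)."""
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    logits = anchor @ positive.T / temperature  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


def self_distillation_loss(student_emb, teacher_emb):
    """Self-distillation: pull the student's embeddings toward those of a
    frozen snapshot of the same model (the 'teacher')."""
    return F.mse_loss(F.normalize(student_emb, dim=-1),
                      F.normalize(teacher_emb, dim=-1))


def average_weights(state_dicts):
    """Weight averaging ('model soup'): element-wise mean of checkpoints."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(0)
    return avg
```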
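For downstream tasks such as STS or concept matching, the released models follow the standard sentence-embedding workflow. The example below assumes the sentence-transformers library and a checkpoint identifier of FremyCompany/BioLORD-2023, which matches the authors' Hugging Face organization but should be verified against the official release.

```python
# Hypothetical usage sketch; verify the checkpoint name on the Hugging Face hub.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("FremyCompany/BioLORD-2023")
sentences = ["myocardial infarction", "heart attack", "fractured femur"]

# Encode and L2-normalize, so dot products equal cosine similarities.
embeddings = model.encode(sentences, normalize_embeddings=True)
print(util.cos_sim(embeddings, embeddings))
```

Cosine similarity between the normalized embeddings then serves directly as the STS score, and nearest-neighbor search over concept embeddings supports BCR and entity-linking use cases.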