ClinicalMamba: A Generative Clinical Language Model on Longitudinal Clinical Notes

2024 | Zhichao Yang, Avijit Mitra, Sunjae Kwon, Hong Yu
ClinicalMamba is a specialized version of the Mamba language model pretrained on longitudinal clinical notes to address the distinct linguistic characteristics and information-processing needs of the medical domain. Released in two sizes, with 130 million and 2.8 billion parameters, it models clinical language across extended text lengths better than Mamba and clinical Llama baselines. It is the first clinical autoregressive language model with a 16k maximum token length, and it achieves notable benchmarks in speed and performance, outperforming existing clinical language models and large language models such as GPT-4 on longitudinal clinical tasks.

Pretrained on MIMIC-III clinical notes, ClinicalMamba performs strongly on clinical information extraction tasks such as cohort selection for clinical trials and ICD coding, where it outperforms baselines including Hierarchical-ClinicalRoberta and ClinicalLongformer. It also offers a favorable trade-off between language-modeling ability and inference speed, running significantly faster than ClinicalLlama-7b.

The study highlights the importance of long-context models in clinical NLP and ClinicalMamba's potential to improve clinical data processing. Its limitations include reliance on ICU data from a single hospital and a focus on English-language clinical notes. The authors emphasize the need for further work on multimodal and parameter-efficient models to broaden the applicability of clinical NLP across medical fields.
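To make the long-context setup concrete, the sketch below shows how a Mamba-style clinical checkpoint could be loaded and prompted with concatenated longitudinal notes using the Hugging Face transformers library. This is an illustration, not the authors' released code: the checkpoint identifier and the example prompt are assumptions, and only the generic MambaForCausalLM and AutoTokenizer APIs are relied on.

```python
# A minimal sketch, not from the paper: running a Mamba-style causal LM on a
# long clinical prompt. The checkpoint name below is a hypothetical
# placeholder; substitute the actual released ClinicalMamba weights.
import torch
from transformers import AutoTokenizer, MambaForCausalLM

MODEL_ID = "whaleloops/clinicalmamba-2.8b-hf"  # hypothetical model ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = MambaForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
model.eval()

# Longitudinal notes from multiple visits can be concatenated into a single
# prompt; Mamba's selective state-space recurrence scales linearly with
# sequence length, which is what makes a 16k-token context practical.
notes = "Admission note: ... Progress note: ... Discharge summary: ..."
inputs = tokenizer(notes, return_tensors="pt", truncation=True, max_length=16384)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)

# Decode only the newly generated continuation, not the echoed prompt.
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Because the state-space recurrence avoids the quadratic cost of self-attention, the same loop remains tractable as the concatenated note history grows toward the model's 16k-token pretraining limit.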