This paper introduces HEAL, a 13B-parameter LLaMA2-based large language model (LLM) designed specifically for medical note generation. HEAL achieves 78.4% accuracy on PubMedQA, outperforming GPT-4 and PMC-LLaMA on that benchmark; it also matches GPT-4 in generating medical notes and surpasses GPT-4 and Med-PaLM 2 in identifying correct medical concepts. HEAL is trained on a diverse dataset spanning medical and general web corpora, GPT-4 task instructions, and electronic health records (EHRs), enabling it to produce physician-approved medical SOAP notes.
HEAL is trained using continued pretraining on diverse data followed by explanation tuning. The training data combines non-medical public datasets, medical public datasets, and proprietary medical datasets, totaling 14.89B tokens, with a focus on improving medical note generation and understanding. Training ran on 32 A100 80 GB GPUs using Fully Sharded Data Parallel (FSDP) and FlashAttention-2.
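The training setup described above can be summarized as a configuration fragment. This is a minimal illustrative sketch: the field names and structure are assumptions for clarity, not the paper's actual configuration format; only the values (model size, token count, hardware, parallelism, and attention implementation) come from the summary.

```python
# Hypothetical configuration sketch of HEAL's training setup.
# Field names are illustrative assumptions; values are from the paper summary.
training_config = {
    "base_model": "llama-2-13b",                     # 13B LLaMA2 base model
    "method": ["continued_pretraining", "explanation_tuning"],
    "total_tokens": int(14.89e9),                    # 14.89B training tokens
    "data_mix": [                                    # three data sources
        "non_medical_public",
        "medical_public",
        "proprietary_medical",
    ],
    "hardware": {"gpu_count": 32, "gpu_type": "A100-80GB"},
    "parallelism": "fsdp",                           # Fully Sharded Data Parallel
    "attention_impl": "flash_attention_2",
}
```

Laying the setup out this way makes the scale explicit: roughly 14.89B tokens sharded across 32 GPUs, with FlashAttention-2 reducing the memory cost of long-context attention during continued pretraining.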
HEAL was evaluated on several tasks, including medical note generation, PubMedQA, and MedQA. On note generation it outperformed other models and human scribes in completeness, correctness, and conciseness. On MedQA it achieved 47.2% accuracy, surpassing the LLaMA2 13B base model but falling short of PMC-LLaMA; on PubMedQA it reached 78.4%, surpassing both GPT-4 and PMC-LLaMA.
HEAL is designed for medical note summarization and produces accurate, complete medical notes. It supports internal medical tasks such as summarization, transcription-based Q&A, and note review. Training on proprietary medical instruction data sharpens the model's medical understanding and improves the accuracy of the notes it generates.
The paper concludes that HEAL is a promising development in healthcare documentation and other medical areas. It demonstrates that continued pretraining of smaller LLMs can achieve impressive results. Future work could explore using more sophisticated base models, curating higher quality data, and scaling up the experiments.