The paper introduces HEAL, a 13B-parameter LLaMA2-based LLM designed for medical conversations and automated scribing. HEAL outperforms GPT-4 and PMC-LLaMA on PubMedQA with an accuracy of 78.4% and matches GPT-4 in generating medical notes. It surpasses GPT-4 and Med-PaLM 2 in identifying correct medical concepts and exceeds human scribes and other models in correctness and completeness. The model is trained on a combination of non-medical and medical public datasets as well as proprietary medical data, including doctor-patient conversations, EHRs, and SOAP notes. Training involves continued pretraining on this diverse data followed by explanation tuning to deepen the model's understanding of medical contexts.
Evaluations show that HEAL improves consistently on long- and medium-form text generation as well as multiple-choice classification tasks. In medical note generation, HEAL outperforms other models and human scribes in completeness, correctness, and conciseness. On public benchmarks such as PubMedQA and MedQA, HEAL achieves high accuracy, demonstrating its effectiveness on medical tasks. The paper concludes that small-scale continued pretraining can yield impressive gains and suggests that larger-scale training could bring further improvements.