4 Jan 2024 | YUNKUN ZHANG, JIN GAO, ZHELING TAN, LINGFENG ZHOU, KEXIN DING, MU ZHOU, SHAOTING ZHANG, DEQUAN WANG
This survey explores data-centric foundation models (FMs) in computational healthcare, emphasizing their role in improving healthcare workflows through better data characterization, quality, and scale. FMs, which are trained on large-scale data, have shown promise in handling diverse clinical data, including medical conversations, patient health profiling, and treatment planning. The survey discusses key aspects of AI security, assessment, and alignment with human values, highlighting the importance of data-centric approaches in healthcare. It provides an updated list of healthcare-related FMs and datasets at https://github.com/Yunkun-Zhang/Data-Centric-FM-Healthcare.
Foundation models are trained on extensive data to achieve high performance on downstream tasks, differing from traditional deep learning models in both scale and the breadth of their training data. The Transformer architecture is central to FM development, enabling efficient and scalable training through parallel processing and self-attention. FMs are pre-trained on large-scale, multi-modal data, both labeled and unlabeled, allowing them to learn concepts and the relationships among them. Large language models (LLMs) are a concrete example of FMs, pre-trained on internet-scale corpora for semantic understanding and text generation.
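To make the self-attention mechanism concrete, below is a minimal NumPy sketch of single-head scaled dot-product attention, the core operation that lets a Transformer relate every token to every other token in parallel. The array shapes and random toy inputs are illustrative assumptions, not taken from any particular FM.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project tokens to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise similarities, scaled for stability
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                               # each token aggregates information from all tokens

# toy usage: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # shape (4, 8)
```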
The survey discusses essential concepts and techniques in FM development, including large-scale pre-training, fine-tuning, and in-context learning. Pre-training exposes a model to vast amounts of data so that it learns broadly transferable representations, while fine-tuning adapts the pre-trained model to a specific downstream task. In-context learning allows an FM to produce the desired output without any parameter updates, simply by conditioning on instructions and examples supplied in the input query. These techniques are crucial for building healthcare-focused FMs.
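As an illustration of in-context learning, the sketch below assembles a few-shot prompt for a hypothetical clinical triage task. The label set, prompt wording, and example notes are invented for illustration and are not drawn from the survey.

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: the FM sees labeled demonstrations in its
    context window and is asked to label a new case, with no weight updates."""
    lines = ["Classify each clinical note as URGENT or ROUTINE.", ""]
    for note, label in examples:
        lines.append(f"Note: {note}\nLabel: {label}\n")
    lines.append(f"Note: {query}\nLabel:")
    return "\n".join(lines)

# hypothetical demonstrations; a real deployment would use vetted clinical text
demos = [
    ("Chest pain radiating to the left arm, onset 20 minutes ago.", "URGENT"),
    ("Requesting a refill of a long-standing allergy prescription.", "ROUTINE"),
]
prompt = build_few_shot_prompt(demos, "Sudden slurred speech and facial droop.")
# `prompt` would be sent to an LLM; the model's completion serves as the predicted label.
```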
The survey also addresses challenges in healthcare data, including multi-modal data fusion, limited data volume, annotation burden, and patient privacy. Multi-modal FMs offer scalable data fusion strategies for various data formats, while self-supervised learning helps address data scarcity and privacy concerns. The survey highlights the importance of AI-human alignment in healthcare, emphasizing the need for ethical, equitable, and socially responsible AI solutions.
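One widely used self-supervised recipe for fusing paired modalities (for example, a medical image and its report) is a CLIP-style contrastive objective. The NumPy sketch below is a simplified version: the symmetric loss and temperature scaling are standard practice, but the specific values and batch construction are assumptions rather than the survey's own method.

```python
import numpy as np

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """CLIP-style symmetric contrastive loss: matched image-report pairs are
    pulled together and mismatched pairs pushed apart, with no manual labels."""
    # L2-normalize so the dot product becomes a cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(logits))                 # the i-th image matches the i-th text

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

# toy usage: 4 paired image/report embeddings of dimension 16
rng = np.random.default_rng(0)
img, txt = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
loss = contrastive_loss(img, txt)
```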
In healthcare, FMs are applied to tasks such as medical imaging, text annotation, and diagnosis, and show potential to improve patient outcomes and clinical workflows. The survey discusses how multi-modal FMs, data augmentation, and data-efficient training address challenges of data quantity and annotation burden. It also highlights the role of LLMs in simplifying healthcare text annotation and of segmentation FMs such as SAM in improving medical image segmentation. Overall, the survey provides a comprehensive overview of data-centric FMs in healthcare, emphasizing their potential to enhance healthcare workflows and AI-human alignment.
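As a concrete example of prompt-driven segmentation with a model like SAM, the sketch below assumes the open-source segment_anything package and a locally downloaded checkpoint; the checkpoint file name, image size, and click coordinates are placeholders, and a zero-filled array stands in for a real CT or MRI slice.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor  # assumes the segment-anything package is installed

# load a SAM checkpoint (model size and file path are placeholders)
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

# stand-in for a 2D grayscale slice from a CT or MRI volume
slice_2d = np.zeros((256, 256), dtype=np.uint8)
# SAM expects an HxWx3 uint8 image, so replicate the grayscale channel
image = np.stack([slice_2d] * 3, axis=-1)
predictor.set_image(image)

# prompt the model with a single foreground click near the structure of interest
masks, scores, _ = predictor.predict(
    point_coords=np.array([[128, 96]]),  # (x, y) pixel location of the click
    point_labels=np.array([1]),          # 1 marks the click as a foreground point
    multimask_output=True,               # return several candidate masks
)
best_mask = masks[np.argmax(scores)]     # keep the highest-scoring candidate
```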