5 Aug 2024 | Yingxue Xu, Yihui Wang, Fengtao Zhou, Jiabo Ma, Shu Yang, Huangjing Lin, Xin Wang, Jiguang Wang, Li Liang, Anjia Han, Ronald Cheong Kin Chan, Hao Chen
A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model
This paper introduces a whole-slide pretraining paradigm called Multimodal Self-Taught PRetraining (mSTAR), which enhances pathology foundation models (FMs) by injecting multimodal knowledge at the slide level. The method leverages a large multimodal dataset of H&E diagnostic whole slide images (WSIs), pathology reports, and RNA-Seq data from 10,275 patients across 32 cancer types, yielding 26,169 slide-level modality pairs. mSTAR proceeds in two stages: 1) Slide-level Contrastive Learning, which injects multimodal knowledge into a slide aggregator, and 2) Patch-level Self-Taught Training, which transfers that knowledge to the patch extractor. Together, these stages give the pathology FM whole-slide context, expanding its modeling from unimodal to multimodal knowledge and from patch-level to slide-level representation.
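The summary does not spell out the training objectives, but the two stages can be pictured with a minimal PyTorch sketch. It assumes a CLIP-style symmetric InfoNCE objective for the slide-level contrastive stage and a cosine-alignment distillation loss for the patch-level self-taught stage; both loss choices and all names (`info_nce`, `stage1_loss`, `stage2_loss`) are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the two mSTAR stages. Loss choices, names, and shapes
# are assumptions for illustration, not the paper's actual implementation.
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over paired embeddings a[i] <-> b[i], each of shape (B, D)."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / tau                          # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def stage1_loss(slide_emb, report_emb, rnaseq_emb):
    """Stage 1: pull each slide embedding toward its paired report and RNA-Seq embeddings."""
    return info_nce(slide_emb, report_emb) + info_nce(slide_emb, rnaseq_emb)

def stage2_loss(patch_extractor, aggregator, patches, teacher_slide_emb):
    """Stage 2: the frozen Stage-1 slide embedding supervises the patch extractor."""
    feats = patch_extractor(patches)                  # (N_patches, D) patch features
    student_slide_emb = aggregator(feats)             # re-aggregate to slide level
    return 1 - F.cosine_similarity(
        student_slide_emb, teacher_slide_emb.detach(), dim=-1).mean()
```

In this reading, Stage 1 teaches the slide aggregator a multimodal slide embedding, and Stage 2 reuses that embedding as a self-supervision target so the patch extractor inherits slide-level, multimodal context.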
The mSTAR model was evaluated on 7 diverse types of tasks spanning 43 subtasks, where it consistently and statistically significantly outperformed state-of-the-art FMs in both unimodal and multimodal applications. Its advantage was most pronounced in multimodal settings, reflecting the multimodal knowledge absorbed during pretraining.
The evaluated tasks cover a range of clinical applications, including slide classification, survival analysis, and pathological report generation. By integrating multimodal knowledge at the slide level, mSTAR captures more comprehensive information than patch-level, unimodal pretraining, which translates into better performance on tasks such as cancer survival prediction and molecular prediction.
The study highlights the importance of incorporating multimodal knowledge into pathology FMs. The mSTAR paradigm offers a new approach to pretraining that leverages multimodal data, enabling FMs to better handle complex clinical scenarios, and the results position mSTAR as a promising foundation model for computational pathology.