2024 | Andrew H. Song, Richard J. Chen, Guillaume Jaume, Anurag Vaidya, Alexander S. Baras, Faisal Mahmood
This paper introduces a multimodal prototyping framework (MMP) for cancer survival prediction, combining histology whole-slide images (WSIs) and transcriptomic profiles. The framework reduces the number of tokens by summarizing histology using morphological prototypes and transcriptomics using biological pathway prototypes, enabling efficient multimodal fusion. Instead of using all patches from WSIs (which can exceed 10^4 patches), MMP condenses them into a small set of prototypes, achieving over 300× compression. Similarly, transcriptomic data is summarized into 50 Cancer Hallmark pathway prototypes. This significantly reduces the number of tokens, allowing for efficient processing with a Transformer or optimal transport cross-alignment. The resulting multimodal tokens are then fused to predict survival outcomes.
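The histology summarization step can be illustrated with a minimal sketch. It assumes hard nearest-prototype assignment followed by mean pooling, which is a simplification chosen for illustration and not necessarily the aggregation used in the paper. The function name `summarize_with_prototypes`, the embedding dimension, and the choice of 32 prototypes are all hypothetical; the prototypes are simply sampled from the patches instead of being learned.

```python
import numpy as np


def summarize_with_prototypes(patches: np.ndarray, prototypes: np.ndarray) -> np.ndarray:
    """Pool patch embeddings into one token per prototype (hard assignment).

    Illustrative assumption: each patch is assigned to its nearest prototype
    and the patches in each group are averaged.
    """
    # Squared Euclidean distances between all patches and prototypes: shape (N, C)
    d2 = (
        (patches ** 2).sum(axis=1)[:, None]
        - 2.0 * patches @ prototypes.T
        + (prototypes ** 2).sum(axis=1)[None, :]
    )
    assign = d2.argmin(axis=1)  # index of the nearest prototype for each patch

    # Start from the prototypes so that an empty group keeps its prototype vector
    tokens = prototypes.copy()
    for c in range(prototypes.shape[0]):
        members = patches[assign == c]
        if len(members) > 0:
            tokens[c] = members.mean(axis=0)
    return tokens


rng = np.random.default_rng(0)
n_patches, n_prototypes, dim = 10_000, 32, 16  # illustrative sizes only
patches = rng.normal(size=(n_patches, dim))    # stand-in for WSI patch embeddings
prototypes = patches[rng.choice(n_patches, n_prototypes, replace=False)]

tokens = summarize_with_prototypes(patches, prototypes)
compression = n_patches / tokens.shape[0]
print(tokens.shape, compression)
```

With these assumed sizes, 10^4 patch embeddings collapse to 32 tokens, a ratio of 312.5, which is in line with the "over 300×" compression described above.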
Extensive evaluation on six cancer types from The Cancer Genome Atlas (TCGA) shows that MMP outperforms state-of-the-art methods with much less computation, while enabling new interpretability analyses. The framework allows for visualization of bi-directional interactions between morphological and pathway prototypes, which is not possible with previous multimodal frameworks that rely on uni-directional interpretation.
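One way to picture the bi-directional interpretation is to read the two off-diagonal blocks of a joint attention matrix over the concatenated prototype tokens. The sketch below is only an illustration under strong assumptions: it uses a single attention head with identity projections (no learned weights), random stand-in tokens, and a hypothetical count of 16 morphological prototypes alongside the 50 pathway prototypes. The function name `cross_modal_blocks` is likewise hypothetical, and the optimal transport variant is not shown.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_blocks(hist_tokens: np.ndarray, path_tokens: np.ndarray):
    """Return the joint attention matrix and its two cross-modal blocks."""
    tokens = np.concatenate([hist_tokens, path_tokens], axis=0)
    d = tokens.shape[1]
    # Scaled dot-product attention with identity projections (an assumption)
    attn = softmax(tokens @ tokens.T / np.sqrt(d), axis=-1)
    n_h = hist_tokens.shape[0]
    hist_to_path = attn[:n_h, n_h:]  # rows: morphological prototypes, columns: pathways
    path_to_hist = attn[n_h:, :n_h]  # rows: pathways, columns: morphological prototypes
    return attn, hist_to_path, path_to_hist


rng = np.random.default_rng(0)
hist_tokens = rng.normal(size=(16, 32))  # hypothetical morphological prototype tokens
path_tokens = rng.normal(size=(50, 32))  # 50 pathway prototype tokens
attn, h2p, p2h = cross_modal_blocks(hist_tokens, path_tokens)
print(attn.shape, h2p.shape, p2h.shape)
```

Because the token set is small, the full matrix is cheap to compute and inspect, so both the histology-to-pathway and the pathway-to-histology blocks can be visualized side by side.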
MMP introduces a prototype-based tokenization method that effectively reduces the number of tokens and computational complexity in multimodal fusion. This reduction leads to improved prognostic performance and allows for bidirectional concept-based interpretation of how morphology and transcriptomes interact. The framework is designed for research applications and not yet intended for clinical use. Future work includes extending the framework to different outcomes and rare diseases, leveraging advances in single-cell foundation models, and validating the framework with larger external cohorts.