Transcriptomics-guided Slide Representation Learning in Computational Pathology

19 May 2024 | Guillaume Jaume, Lukas Oldenburg, Anurag Vaidya, Richard J. Chen, Drew F.K. Williamson, Thomas Peeters, Andrew H. Song, Faisal Mahmood
This paper introduces TANGLE, a transcriptomics-guided slide representation learning framework that uses multimodal pre-training to learn slide embeddings from whole-slide images (WSIs), with gene expression profiles as the guiding signal. TANGLE is trained on large-scale datasets of histology slides paired with gene expression data from three organs (liver, breast, and lung) across two species (Homo sapiens and Rattus norvegicus). Modality-specific encoders extract slide and gene expression embeddings, which are then aligned via contrastive learning. The resulting slide embeddings capture task-agnostic features that can be reused across downstream tasks.

The framework is evaluated on multiple downstream tasks, including liver lesion classification, breast and lung cancer subtyping, and slide retrieval. TANGLE outperforms existing self-supervised learning (SSL) and supervised baselines in few-shot classification, prototype-based classification, and slide retrieval. It also shows strong performance in interpreting gene expression data and aligning it with histology slide features. The results indicate that combining gene expression data with histology images can significantly improve slide representation learning, especially for cancer subtyping and lesion detection. They highlight the potential of slide and expression (S+E) pre-training in computational pathology and pave the way for further developments in this area.
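
To make the alignment step concrete, the sketch below shows one common way to align paired embeddings with a contrastive objective: a symmetric, CLIP-style cross-entropy over cosine similarities. This is an illustration under stated assumptions and not the authors' implementation. The summary only says that slide and expression embeddings are "aligned via contrastive learning"; the symmetric form, the temperature value, the function names, and the toy two-dimensional embeddings are all hypothetical. Pure Python is used so the example runs without dependencies.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(slide_emb, expr_emb, temperature=0.1):
    """Symmetric contrastive loss over a batch of paired embeddings.

    slide_emb[i] and expr_emb[i] are assumed to come from the same sample.
    The temperature is an illustrative value, not one taken from the paper.
    """
    slides = [l2_normalize(v) for v in slide_emb]
    exprs = [l2_normalize(v) for v in expr_emb]
    # logits[i][j]: scaled cosine similarity between slide i and expression j
    logits = [
        [sum(a * b for a, b in zip(s, e)) / temperature for e in exprs]
        for s in slides
    ]
    logits_transposed = [list(column) for column in zip(*logits)]
    slide_to_expr = diagonal_cross_entropy(logits)
    expr_to_slide = diagonal_cross_entropy(logits_transposed)
    return 0.5 * (slide_to_expr + expr_to_slide)


# Toy batch: two slides with matching and mismatched expression embeddings.
toy_slides = [[1.0, 0.0], [0.0, 1.0]]
toy_expr_matched = [[0.9, 0.1], [0.1, 0.9]]
toy_expr_mismatched = [toy_expr_matched[1], toy_expr_matched[0]]

loss_matched = symmetric_contrastive_loss(toy_slides, toy_expr_matched)
loss_mismatched = symmetric_contrastive_loss(toy_slides, toy_expr_mismatched)
```

In this toy batch the loss is lower when each slide is paired with its own expression profile than when the pairs are swapped, which is the behavior a contrastive alignment objective is meant to reward. In practice the embeddings would come from the modality-specific encoders and the loss would be minimized with a deep learning framework.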