22 May 2024 | George Shaikovski, Adam Casson, Kristen Severson, Eric Zimmermann, Yi Kan Wang, Jeremy D. Kunz, Juan A. Retamero, Gerard Oakley, David Klimstra, Christopher Kanan, Matthew Hanna, Michal Zelechowski, Julian Viret, Neil Tenenholtz, James Hall, Nicolò Fusi, Razik Yousfi, Peter Hamilton, William A. Moye, Eugene Vorontsov, Siqi Liu, Thomas J. Fuchs
PRISM is a multi-modal generative foundation model designed for slide-level histopathology, leveraging clinical report text for pre-training. The model addresses the mismatch between clinical analysis, which operates at the level of whole slide images (WSIs), and existing foundation models that process individual image tiles separately. By aggregating tile embeddings into a single slide embedding, PRISM can generate clinical reports and perform zero-shot cancer detection and sub-typing with performance approaching or surpassing supervised aggregator models. Additionally, fine-tuning PRISM's slide encoder yields label-efficient training for biomarker prediction, even with limited training data. The model is pre-trained using 587,196 WSIs and 195,344 associated clinical text reports, demonstrating its effectiveness in various downstream tasks such as cancer detection, tissue sub-typing, and biomarker prediction. PRISM's capabilities include generating text-based diagnosis reports, zero-shot prediction, and slide-level linear classification, making it a versatile tool for computational pathology.
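The zero-shot prediction described above follows the usual contrastive vision-language recipe: a slide embedding is compared against text-prompt embeddings for each candidate class, and the most similar prompt wins. As a minimal sketch, the snippet below implements that cosine-similarity step with random stand-in vectors; `zero_shot_classify` is a hypothetical helper, and in practice the embeddings would come from PRISM's slide encoder and its paired text encoder rather than from a random generator.

```python
import numpy as np

def zero_shot_classify(slide_embedding, prompt_embeddings):
    """Return the index of the class whose text-prompt embedding has the
    highest cosine similarity to the slide embedding, plus all scores."""
    slide = slide_embedding / np.linalg.norm(slide_embedding)
    prompts = prompt_embeddings / np.linalg.norm(
        prompt_embeddings, axis=1, keepdims=True
    )
    scores = prompts @ slide  # cosine similarities, one per class prompt
    return int(np.argmax(scores)), scores

# Toy example with random stand-in embeddings (assumed dimensionality).
rng = np.random.default_rng(0)
dim = 8
# e.g. embeddings of prompts like "benign tissue" vs. "carcinoma"
prompts = rng.normal(size=(2, dim))
# Construct a slide embedding that lies close to class 1.
slide = prompts[1] + 0.1 * rng.normal(size=dim)
label, scores = zero_shot_classify(slide, prompts)
```

Because the toy slide embedding is built near the second prompt, the classifier selects class 1; with real PRISM embeddings the same argmax-over-similarities logic yields the zero-shot cancer detection and sub-typing results the abstract reports.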