PATHALIGN: A vision-language model for whole slide images in histopathology

27 Jun 2024 | Faruk Ahmed, Andrew Sellergren, Lin Yang, Shawn Xu, Boris Babenko, Abbi Ward, Niels Olson, Arash Mohtashamian, Yossi Matias, Greg S. Corrado, Quang Duong, Dale R. Webster, Shravya Shetty, Daniel Golden, Yun Liu, David F. Steiner, and Ellery Wulczyn
PATHALIGN is a vision-language model designed for whole slide images (WSIs) in histopathology. It addresses the challenge of aligning WSIs with diagnostic text: the model is trained on WSIs paired with curated text from pathology reports, using a de-identified dataset of over 350,000 WSI and diagnostic text pairs that spans a wide range of diagnoses, procedures, and tissue types. This alignment enables text and image retrieval, and integration with a frozen large language model (LLM) for WSI-based text generation. The model builds on a patch-level foundation model and uses the frozen LLM to generate text and support visual question answering, enabling applications such as automatic report generation and case-level visual question answering.
Pathologists evaluated the model's text generation and retrieval capabilities and rated 78% of WSIs as accurate, without clinically significant errors or omissions. The model also showed promising results in WSI classification and workflow prioritization. Classification performance was evaluated on tasks including non-small cell lung cancer (NSCLC) subtyping, renal cell carcinoma (RCC) subtyping, breast cancer (BRCA) subtyping, and procedure type classification. The study highlights the potential of language-aligned WSI embeddings for computational pathology.
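To make the retrieval use case concrete, here is a minimal sketch of how language-aligned embeddings can support text-to-image retrieval: a text query and each WSI are represented as vectors in a shared space, and slides are ranked by cosine similarity to the query. This is an illustration under stated assumptions, not the PATHALIGN implementation. The function names, slide identifiers, and three-dimensional vectors are all hypothetical; real embeddings would come from the trained model and have far more dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Rank candidates by similarity to the query and return the best top_k."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings, made up for illustration only.
wsi_embeddings = {
    "slide_a": [0.9, 0.1, 0.0],
    "slide_b": [0.1, 0.9, 0.1],
    "slide_c": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of a text query in the same shared space.
text_query_embedding = [0.0, 1.0, 0.0]

top = retrieve(text_query_embedding, wsi_embeddings, top_k=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

The same ranking works in the other direction (image-to-text retrieval) by treating a WSI embedding as the query and a set of text embeddings as the candidates.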