18 Jun 2024 | Zhengrui Guo, Jiabo Ma, Yingxue Xu, Yihui Wang, Liansheng Wang, and Hao Chen
**HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction**
**Authors:** Zhengrui Guo, Jiabo Ma, Yingxue Xu, Yihui Wang, Liansheng Wang, and Hao Chen
**Abstract:**
Histopathology is crucial in cancer diagnosis, and clinical reports play a vital role in interpreting results and guiding treatment. Automating histopathology report generation with deep learning can significantly enhance clinical efficiency and reduce the burden on pathologists. This work presents HistGen, a multiple instance learning (MIL)-empowered framework for histopathology report generation, together with the first benchmark dataset for evaluating this task. Inspired by diagnostic and report-writing workflows, HistGen features two modules: a local-global hierarchical encoder and a cross-modal context module. The local-global encoder efficiently aggregates visual features from a region-to-slide perspective, while the cross-modal context module facilitates alignment and interaction between visual sequences and corresponding reports. Experimental results show that HistGen outperforms state-of-the-art (SOTA) models in WSI report generation and demonstrates superior transfer learning capabilities in cancer subtyping and survival analysis tasks.
**Keywords:** Histopathology Report Generation, Multiple Instance Learning, Cross-Modal Alignment
**Introduction:**
Histopathology tissue analysis is essential for cancer diagnosis and prognosis. Computational pathology (CPath) has advanced this field, but the labor-intensive and time-consuming task of writing reports remains a challenge. HistGen addresses this by leveraging MIL and cross-modal interactions to generate whole slide image (WSI) reports. The framework comprises a local-global hierarchical encoder and a cross-modal context module, built on a feature extractor pre-trained on a large collection of WSIs. Extensive experiments validate the model's superior performance in WSI report generation and its strong transfer learning capabilities on downstream tasks.
**Method:**
- **WSI-Report Dataset Curation:** A benchmark dataset of 7,800 WSI-report pairs curated from the TCGA platform.
- **Local-Global Hierarchical Encoder (LGH):** Efficiently encodes and aggregates the extensive patch features of a WSI from a region-to-slide perspective.
- **Cross-Modal Context Module (CMC):** Enables alignment and interaction between visual encoding and textual decoding.
- **Pre-trained Feature Extractor:** A general-purpose MIL feature extractor pre-trained on over 55,000 WSIs.
- **Loss Function:** Trains the model to maximize the conditional probability of a report given its WSI, i.e., to minimize the negative log-likelihood of the report tokens.
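The two core ideas above can be sketched in simplified form: pool patch features within each region, pool the resulting region embeddings into a slide embedding, and train by minimizing the negative log-likelihood of the report tokens. This is a minimal illustration only; the function names, the plain dot-product attention scoring, and the fixed region partitioning are assumptions for clarity, not the paper's actual architecture.

```python
import math

def attention_pool(feats, w):
    """Softmax-weighted pooling: score each feature vector by its dot product with w."""
    scores = [sum(fi * wi for fi, wi in zip(f, w)) for f in feats]
    m = max(scores)                                  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]                   # attention weights sum to 1
    dim = len(feats[0])
    return [sum(a * f[d] for a, f in zip(alphas, feats)) for d in range(dim)]

def local_global_encode(patch_feats, region_size, w_local, w_global):
    """Region-to-slide encoding: pool patches per region, then pool region embeddings."""
    regions = [patch_feats[i:i + region_size]
               for i in range(0, len(patch_feats), region_size)]
    region_embs = [attention_pool(r, w_local) for r in regions]   # local step
    return attention_pool(region_embs, w_global)                  # global step

def report_nll(step_probs, target_tokens):
    """Negative log-likelihood of the report: minimizing this maximizes P(report | WSI).

    step_probs[t] is the decoder's probability distribution at step t (conditioned on
    the WSI encoding and previous tokens); target_tokens[t] is the ground-truth token id.
    """
    return -sum(math.log(p[t]) for p, t in zip(step_probs, target_tokens))
```

With zero weight vectors, `attention_pool` reduces to mean pooling, so the hierarchy degenerates to a mean over region means; in the real model, learned attention lets informative regions dominate the slide embedding.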
**Experiments:**
- **WSI Report Generation:** HistGen outperforms SOTA models in NLG metrics.
- **Ablation Studies:** Confirm the effectiveness of each proposed module.
- **Transfer Learning for Cancer Diagnosis and Prognosis:** HistGen achieves superior performance on cancer subtyping and survival analysis tasks.
**Conclusion:**
HistGen is a MIL-empowered framework for automated histopathology report generation, demonstrating strong performance and transfer learning capabilities. Future work will expand to other fields like radiology and ophthalmology.