11 Jun 2024 | Jonas Dippel, Barbara Feulner, Tobias Winterhoff, Timo Milbich, Stephan Tietz, Simon Schallenberg, Gabriel Dernbach, Andreas Kunft, Simon Heinke, Marie-Lisa Eichl, Julika Ribbat-Idel, Rosemarie Krupar, Philipp Anders, Niklas Preinfl, Philipp Jurmeister, David Horst, Lukas Ruff, Klaus-Robert Müller, Frederick Klauschen, Maximilian Alber
**RudolfV: A Foundation Model by Pathologists for Pathologists**
This study introduces RudolfV, a novel foundation model designed to enhance computational pathology by incorporating pathologist expertise and semi-automated data curation. The model is trained on a diverse dataset comprising 134k slides from 34k cases, covering 58 tissue types, 129 staining modalities, and 6 scanner types from over 15 laboratories. Key aspects of the approach include:
1. **Data Curation**: Pathologists and computational scientists jointly grouped similar slides and clustered similar tissue patches. These groupings then determined how training data were sampled.
2. **AI Training**: The model was trained with the self-supervised DINOv2 framework. Training data were drawn from sampling distributions derived from the slide groups and tissue clusters, so that infrequent diseases are not underrepresented relative to frequent ones.
3. **Applications**: The model was evaluated on benchmarks covering tumor microenvironment characterization, immunohistochemistry biomarker evaluation, and reference case search.
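The summary does not specify the exact sampling distributions used for RudolfV. As a minimal sketch, assuming each training patch already carries a slide-group label and a tissue-cluster label from the curation step, the balancing idea in steps 1 and 2 can be illustrated with hierarchical, frequency-tempered sampling: first pick a slide group, then a tissue cluster within it, then a patch. The function names and the tempering exponent `alpha` are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def build_index(patches):
    """Nest patch IDs as {slide_group: {tissue_cluster: [patch_id, ...]}}.

    `patches` is an iterable of (patch_id, slide_group, tissue_cluster) tuples;
    the group and cluster labels are assumed to come from the curation step.
    """
    index = defaultdict(lambda: defaultdict(list))
    for patch_id, group, cluster in patches:
        index[group][cluster].append(patch_id)
    return index


def _size(bucket):
    """Number of patches in a bucket (a list, or a dict of nested buckets)."""
    if isinstance(bucket, dict):
        return sum(_size(b) for b in bucket.values())
    return len(bucket)


def _tempered_choice(rng, buckets, alpha):
    """Pick a key with probability proportional to (bucket size) ** alpha.

    alpha = 1 reproduces the natural frequencies; alpha = 0 gives every
    bucket the same weight, so rare groups/clusters are drawn as often as
    common ones.
    """
    keys = sorted(buckets)
    weights = [_size(buckets[k]) ** alpha for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


def sample_patches(index, n, alpha=0.5, rng=None):
    """Hierarchically sample n patch IDs: slide group -> tissue cluster -> patch."""
    rng = rng or random.Random()
    samples = []
    for _ in range(n):
        group = _tempered_choice(rng, index, alpha)
        cluster = _tempered_choice(rng, index[group], alpha)
        samples.append(rng.choice(index[group][cluster]))
    return samples
```

With `alpha=1.0`, a slide group holding 10 of 1,010 patches is drawn about 1% of the time; with `alpha=0.0` it is drawn about as often as the large group. Intermediate values trade off between matching the data distribution and oversampling rare tissue.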
**Results**:
- **Tumor Microenvironment Characterization**: RudolfV outperformed state-of-the-art models on 10 out of 12 benchmarks and 28 out of 31 datasets.
- **Immunohistochemistry Biomarker Evaluation**: The model improved on existing models in cell type classification and biomarker scoring.
- **Reference Case Search**: RudolfV retrieved histologically similar cases for rare diseases, indicating its potential utility in clinical practice.
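This summary does not describe how RudolfV performs reference case search. A common way to build such a search on top of a foundation model is nearest-neighbour retrieval over embeddings. The minimal sketch that follows assumes one embedding per case (here, the mean of its patch embeddings) and ranks archived cases by cosine similarity to a query. The function names and the mean-pooling choice are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_embedding(vectors):
    """Average a list of equal-length patch embeddings into one case embedding."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def search_cases(query, case_embeddings, k=3):
    """Return the k case IDs whose embeddings are most similar to `query`.

    `case_embeddings` maps a case ID to one embedding vector, e.g. the
    mean of the foundation-model embeddings of that case's tissue patches.
    """
    scored = [(cosine(query, emb), case_id) for case_id, emb in case_embeddings.items()]
    # Highest similarity first; ties keep insertion order (stable sort).
    scored.sort(key=lambda t: t[0], reverse=True)
    return [case_id for _, case_id in scored[:k]]
```

Mean pooling is the simplest way to turn many patch embeddings into one case-level vector, and the linear scan here is only practical for small archives; larger collections would normally use an approximate nearest-neighbour index instead.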
**Discussion**:
- The study highlights the importance of domain-specific knowledge and data diversity in improving foundation model performance.
- Future research should explore the impact of larger datasets and more advanced pretraining methods on foundation models.
**Conclusion**:
RudolfV demonstrates the potential of integrating pathologist expertise into foundation model design, leading to improved performance and broader clinical applications in computational pathology.