20 Aug 2024 | Dmitry Nechaev, Alexey Pchelnikov, and Ekaterina Ivanova (HistAI)
The paper introduces the Hibou family of foundational vision transformers for pathology, which leverages the DINOv2 framework to pretrain two model variants, Hibou-B and Hibou-L, on a proprietary dataset of over 1 million whole slide images (WSIs). These models are designed to enhance diagnostic accuracy, consistency, and efficiency in digital pathology by enabling automated image analysis and large-scale data processing. Pretraining is self-supervised, allowing the models to learn from vast amounts of unannotated data and making them robust and generalizable.
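As a concrete illustration of how such a pretrained backbone is typically used, the sketch below extracts a frozen embedding for a single histopathology patch. It assumes the Hibou-B checkpoint is published on the Hugging Face Hub under an id like `histai/hibou-b` and exposes a standard image processor; the repo id and interface details are assumptions, not taken from this summary.

```python
# Minimal sketch: use a pretrained Hibou-style ViT as a frozen patch feature extractor.
# The repo id below is an assumption; adjust it to the actually released checkpoint.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

REPO_ID = "histai/hibou-b"  # hypothetical Hub id for the open-sourced Hibou-B weights

processor = AutoImageProcessor.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(REPO_ID, trust_remote_code=True).eval()

patch = Image.open("patch.png").convert("RGB")  # e.g. a 224x224 H&E tile cut from a WSI
inputs = processor(images=patch, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Take the CLS token as the patch embedding (a common convention for ViT backbones).
embedding = outputs.last_hidden_state[:, 0]  # shape: (1, hidden_dim)
print(embedding.shape)
```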
The models demonstrate superior performance on both patch-level and slide-level benchmarks, with Hibou-L achieving the highest average accuracy across multiple datasets. The paper also discusses the methodology, including data preparation, data augmentations, and training details. The results show that Hibou-L outperforms existing state-of-the-art methods, while Hibou-B, despite having fewer parameters, surpasses Prov-GigaPath in two out of three benchmarks.
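Patch-level benchmarks of this kind are commonly run as linear probes on frozen embeddings: a simple classifier is fit on top of the backbone's features and accuracy is reported per dataset. The paper's exact evaluation protocol is not reproduced here, so the following is only an illustrative sketch over precomputed embeddings, with hypothetical file names.

```python
# Illustrative linear-probe evaluation on frozen patch embeddings (protocol assumed,
# not the paper's exact setup): fit a logistic-regression head and report accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# X_*: (N, hidden_dim) embeddings from the frozen backbone; y_*: patch-level class labels.
# The .npy paths are placeholders for features exported with the snippet above.
X_train, y_train = np.load("train_emb.npy"), np.load("train_labels.npy")
X_test, y_test = np.load("test_emb.npy"), np.load("test_labels.npy")

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("linear-probe accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```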
Future work will focus on expanding evaluation benchmarks, investigating slide-level pretraining, and integrating Hibou models into Large Vision-Language Models (LVLMs) to enhance diagnostic accuracy and streamline workflows. The Hibou-B model is open-sourced to support further research and development in the community.
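Slide-level benchmarks, and the slide-level pretraining mentioned above, build on patch embeddings aggregated across an entire WSI. One common aggregation choice is attention-based multiple-instance learning (ABMIL); the PyTorch sketch below is a generic example of such an aggregator, not the specific slide-level architecture used in the paper.

```python
# Sketch of an ABMIL-style aggregator: a bag of frozen patch embeddings from one slide
# is pooled with learned attention weights into a single slide-level prediction.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256, num_classes: int = 2):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # patch_embeddings: (num_patches, embed_dim) for one slide
        weights = torch.softmax(self.attention(patch_embeddings), dim=0)  # (num_patches, 1)
        slide_embedding = (weights * patch_embeddings).sum(dim=0)         # (embed_dim,)
        return self.classifier(slide_embedding)                           # (num_classes,)

# Example: 500 patch embeddings of dimension 768 (a typical ViT-B hidden size).
model = AttentionMIL()
logits = model(torch.randn(500, 768))
print(logits.shape)  # torch.Size([2])
```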