Hibou: A Family of Foundational Vision Transformers for Pathology


20 Aug 2024 | Dmitry Nechaev, Alexey Pchelnikov, and Ekaterina Ivanova
This paper introduces Hibou, a family of foundational vision transformers for pathology. Leveraging the DINOv2 framework, two models, Hibou-B and Hibou-L, are pretrained on a proprietary dataset of over 1 million whole slide images (WSIs) representing diverse tissue types and staining techniques. The pretrained models perform strongly on both patch-level and slide-level benchmarks, with Hibou-L achieving the highest average accuracy across multiple benchmark datasets. The Hibou models are open-sourced to support further research and application in the field.

Pathology, the microscopic examination of diseased tissue, is critical for diagnosing many medical conditions, particularly cancers. Traditional methods are labor-intensive and prone to human error. Digital pathology, which converts glass slides into high-resolution digital images for analysis by computer algorithms, has transformed the field by improving diagnostic accuracy, consistency, and efficiency through automated image analysis and large-scale data processing.

Vision Transformers (ViTs) have shown great promise in digital pathology because of their ability to model long-range dependencies in images. Foundational pretraining techniques, including self-supervised learning, enable models to learn robust features from unlabeled data, which is especially valuable in histopathology, where annotated datasets are often limited. Recent work on ViT pretraining for histopathology has predominantly used frameworks such as iBOT and DINOv2.

The Hibou models are pretrained on the proprietary dataset of over 1 million WSIs with data augmentations that include random angle rotation, random flips, and RandStainNA stain augmentation. Training follows the DINOv2 framework, with different GPU configurations for Hibou-B and Hibou-L.
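The paper names these augmentations but not their implementation. Below is a minimal sketch, assuming a PyTorch/torchvision pipeline; RandStainNA (Shen et al., 2022) has its own reference implementation, so it appears here only as an identity placeholder.

```python
from torchvision import transforms


class RandStainNAPlaceholder:
    """Stand-in for RandStainNA stain augmentation (Shen et al., 2022).

    The real transform samples a stain style per patch from statistics
    fitted on the training data; see the authors' reference code. An
    identity transform keeps this sketch runnable without that package.
    """

    def __call__(self, img):
        return img  # replace with the real RandStainNA transform


# Hypothetical pipeline combining the augmentations named in the paper:
# random angle rotation, random flips, and RandStainNA, applied before
# the usual DINOv2 crop-and-normalize steps (omitted here).
train_transform = transforms.Compose([
    transforms.RandomRotation(degrees=180),  # any angle in [-180, 180]
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    RandStainNAPlaceholder(),
    transforms.ToTensor(),
])
```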
Evaluated on a range of public datasets, the models show strong performance on both patch-level and slide-level tasks. Hibou-L achieves the highest AUC across three slide-level datasets, while Hibou-B surpasses ProvGigaPath on two of three benchmarks despite having 13 times fewer parameters. On segmentation tasks, the Hibou models likewise outperform the compared models.

Hibou-B is open-sourced under the Apache 2.0 license to support further research and development in the field. The models are expected to contribute to accurate and efficient histopathological analysis, with potential applications in clinical settings.
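Because Hibou-B is released under Apache 2.0, it can be loaded directly for patch feature extraction. The sketch below assumes the checkpoint is published on Hugging Face under the id histai/hibou-b with custom model code (hence trust_remote_code); adjust if the release lives elsewhere.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

REPO_ID = "histai/hibou-b"  # assumed repository id; adjust if needed

processor = AutoImageProcessor.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(REPO_ID, trust_remote_code=True)
model.eval()

# One H&E patch; in practice patches are tiled from a whole slide image.
patch = Image.open("patch.png").convert("RGB")  # hypothetical file
inputs = processor(images=patch, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Take the first ([CLS]-style) token of the last hidden state as the
# patch embedding; pooling conventions vary between checkpoints.
features = outputs.last_hidden_state[:, 0]
print(features.shape)  # (1, hidden_dim); 768 for a ViT-B backbone
```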
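The summary does not spell out the patch-level protocol; a common way to benchmark frozen foundation-model features is a linear probe, i.e. a single linear classifier trained on the embeddings while the backbone stays frozen. A sketch with scikit-learn, assuming embeddings extracted as above were saved to the (hypothetical) .npy files below:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

# Frozen patch embeddings and labels (hypothetical filenames).
X_train, y_train = np.load("train_feats.npy"), np.load("train_labels.npy")
X_test, y_test = np.load("test_feats.npy"), np.load("test_labels.npy")

# Linear probe: only this classifier is trained; the backbone is frozen.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

pred = probe.predict(X_test)
score = probe.predict_proba(X_test)[:, 1]  # positive-class scores (binary)
print("accuracy:", accuracy_score(y_test, pred))
print("AUC:", roc_auc_score(y_test, score))
```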
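Slide-level benchmarks require aggregating many patch embeddings into one slide prediction. The summary does not name the aggregation head used in Hibou's evaluation; attention-based multiple-instance learning (Ilse et al., 2018) is a standard choice, sketched here:

```python
import torch
import torch.nn as nn


class AttentionMIL(nn.Module):
    """Attention-based MIL head: scores each patch embedding, softmax-
    normalizes the scores over the slide, and classifies the
    attention-weighted average embedding (Ilse et al., 2018)."""

    def __init__(self, dim: int, hidden: int = 256, n_classes: int = 2):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.head = nn.Linear(dim, n_classes)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (n_patches, dim) -- all patch embeddings from one slide
        weights = torch.softmax(self.attn(feats), dim=0)  # (n_patches, 1)
        slide_embedding = (weights * feats).sum(dim=0)    # (dim,)
        return self.head(slide_embedding)                 # (n_classes,)


# Toy usage: 500 patch embeddings of width 768 (a ViT-B hidden size).
mil = AttentionMIL(dim=768)
logits = mil(torch.randn(500, 768))
print(logits.shape)  # torch.Size([2])
```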