HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis

23 Jun 2024 | Guillaume Jaume, Paul Doucet, Andrew H. Song, Ming Y. Lu, Cristina Almagro-Pérez, Sophia J. Wagner, Anurag J. Vaidya, Richard J. Chen, Drew F.K. Williamson, Ahrong Kim, Faisal Mahmood
HEST-1k is a dataset that pairs spatial transcriptomics (ST) with H&E-stained whole-slide images (WSIs) from 131 public and internal cohorts, covering 25 organs, two species (Homo sapiens and Mus musculus), and 320 cancer samples from 25 subtypes. The dataset includes 1.1 million expression-morphology pairs and 60 million detected nuclei. HEST-1k is designed to address the limitations of existing computational methods in ST by providing rich, multimodal data for benchmarking, biomarker discovery, and multimodal representation learning. The dataset is accompanied by the HEST-Library, a Python package for querying and processing HEST-1k data, and the HEST-Benchmark, a set of ten tasks for gene expression prediction from histology, evaluated on ten state-of-the-art models. The HEST-Benchmark offers insight into the predictive capabilities of foundation models for histology and highlights the need for diverse and challenging benchmarks. The dataset and tools are freely available for research purposes.