Nicheformer: a foundation model for single-cell and spatial omics

April 17, 2024 | Anna C. Schaar, Alejandro Tejada-Lapuerta, Giovanni Palla, Robert Gutgesell, Lennard Halle, Maria Minaeva, Larsen Vornholz, Leander Dony, Francesca Drummer, Mojtaba Bahrami, Fabian J. Theis
Nicheformer is a transformer-based foundation model for spatial omics that combines dissociated single-cell and spatial transcriptomics data to learn a unified cellular representation. It is pretrained on more than 57 million dissociated and 53 million spatially resolved cells from 73 human and mouse tissues, which enables predictions for spatially dependent tasks even when only limited data are available. Contextual information supplied through modality, organism, and assay tokens strengthens the joint representation of single-cell and spatial genomics that the model learns. Nicheformer is then fine-tuned on spatial omics data to decode spatially resolved cellular information.
The model performs well in both zero-shot-like and fine-tuning scenarios on spatially relevant downstream tasks such as spatial density prediction and niche and region label prediction. Linear probing of its embeddings already surpasses existing baselines, fine-tuning refines the representation further, and Nicheformer outperforms established embedding methods such as scVI and PCA across tasks, reflecting the greater model capacity of the underlying transformer. Because it can predict the spatial context of dissociated cells, Nicheformer accurately transfers spatial context identified in spatial transcriptomics onto dissociated cells, enriching classical single-cell RNA-seq (scRNA-seq) datasets with spatial information. The large-scale resource assembled for pretraining, over 110 million cells of which 53 million are spatially resolved, together with the novel spatial learning tasks, is meant to pave the way for the next generation of machine-learning models for spatial single-cell analysis.
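The role of the modality, organism, and assay tokens can be illustrated with a small tokenizer. This is a hypothetical sketch: the token names, their integer ids, the function name `tokenize_cell`, and the rank-based gene ordering (genes sorted by expression divided by a technology-specific mean, highest first) are assumptions made for illustration, not Nicheformer's published vocabulary or code. It shows one way a cell could become a sequence whose first three positions state its context, so that the transformer can condition on them.

```python
import numpy as np

# Hypothetical special-token vocabulary: names and ids are illustrative
# assumptions, not Nicheformer's actual vocabulary.
CONTEXT_TOKENS = {
    "modality": {"dissociated": 0, "spatial": 1},
    "organism": {"human": 2, "mouse": 3},
    "assay": {"10x_3prime": 4, "merfish": 5, "xenium": 6},
}
N_SPECIAL = 7  # gene tokens are numbered after the special tokens


def tokenize_cell(expression, tech_mean, modality, organism, assay, max_len=8):
    """Build one cell's input: three context tokens, then ranked gene tokens.

    Assumption: expressed genes are ordered by expression divided by a
    positive technology-specific mean (highest first), and the sequence
    is truncated to max_len tokens (max_len >= 3 keeps all context tokens).
    """
    expression = np.asarray(expression, dtype=float)
    scaled = expression / np.asarray(tech_mean, dtype=float)
    expressed = np.flatnonzero(expression > 0)
    # Sort expressed gene indices by descending scaled expression.
    order = expressed[np.argsort(-scaled[expressed], kind="stable")]
    gene_tokens = (order + N_SPECIAL).tolist()
    context = [
        CONTEXT_TOKENS["modality"][modality],
        CONTEXT_TOKENS["organism"][organism],
        CONTEXT_TOKENS["assay"][assay],
    ]
    return (context + gene_tokens)[:max_len]


# Gene 3 has the highest raw count but the lowest scaled value.
print(tokenize_cell([0, 5, 3, 8], [1.0, 1.0, 0.5, 4.0], "spatial", "mouse", "merfish"))
# [1, 3, 5, 9, 8, 10]
```

In this toy cell, gene 3 is placed last despite its raw count of 8 because its technology-specific mean is high; scaling by a per-technology mean is one way to keep gene rankings comparable across assays with different sensitivities.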
Nicheformer enables the direct transfer of spatially aware annotations from spatial to dissociated single-cell data, opening new possibilities for analyzing single-cell data across modalities. It also predicts neighborhood compositions in both spatial and dissociated single-cell data, accurately relating changes in gene expression to differences in neighborhood composition. Evaluated across multiple organs and technologies, the model predicts neighborhood density and other spatial features effectively, and its embeddings capture variation in neighborhood density from transcriptome information alone better than the baselines do, highlighting its potential for spatial single-cell analysis.
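The neighborhood density and neighborhood composition targets mentioned above are derived from the spatial coordinates of cells. The sketch below shows one way to compute such per-cell labels, assuming a fixed-radius neighborhood in a 2D tissue section; the function name `neighborhood_labels` and the radius criterion are illustrative assumptions, and the exact neighborhood definition used for Nicheformer may differ (k nearest neighbors is a common alternative).

```python
import numpy as np


def neighborhood_labels(coords, cell_types, n_types, radius):
    """Per-cell neighborhood density and composition from spatial coordinates.

    density[i]     = number of other cells within `radius` of cell i
    composition[i] = fraction of those neighbors belonging to each cell type
    The fixed-radius neighborhood is an illustrative assumption.
    """
    coords = np.asarray(coords, dtype=float)
    cell_types = np.asarray(cell_types)
    # Dense pairwise Euclidean distances: fine for small examples; a spatial
    # index such as a k-d tree scales better to whole tissue sections.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # A cell is not its own neighbor.
    neighbors = (dist <= radius) & ~np.eye(len(coords), dtype=bool)

    density = neighbors.sum(axis=1)
    # One-hot encode cell types, then count neighbor types for each cell.
    one_hot = np.eye(n_types)[cell_types]
    counts = neighbors.astype(float) @ one_hot
    # Isolated cells (density 0) keep an all-zero composition.
    composition = counts / np.maximum(density, 1)[:, None]
    return density, composition


density, composition = neighborhood_labels(
    [[0, 0], [1, 0], [0, 1], [10, 10]], [0, 1, 1, 0], n_types=2, radius=1.5
)
print(density)  # [2 2 2 0]
```

Labels like these, computed on spatial data, can serve as regression targets for a prediction head on top of the cell embeddings; applying the trained head to dissociated cells then yields a predicted neighborhood composition for cells that have no coordinates.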