February 2025 | Siyu He, Yinuo Jin, Achille Nazaret, Lingting Shi, Xueer Chen, Sham Rampersaud, Bahawar S. Dhillon, Izabella Valdez, Lauren E. Friend, Joy Linyue Fan, Cameron Y. Park, Rachel L. Mintz, Yeh-Hsing Lao, David Carrera, Kaylee W. Fang, Kaleem Mehdi, Madeline Rohde, José L. McFaline-Figueroa, David Blei, Kam W. Leong, Alexander Y. Rudensky, George Plitas & Elham Azizi
Starfysh is a computational tool that integrates spatial transcriptomic (ST) and histologic data to reveal heterogeneous tumor–immune hubs. It uses a deep generative model with archetypal analysis and known cell type markers to characterize cell states without requiring single-cell reference data. Starfysh improves the characterization of spatial dynamics in complex tissues using histology images and enables the comparison of niches as spatial hubs across tissues. Integrative analysis of primary estrogen receptor (ER)-positive breast cancer, triple-negative breast cancer (TNBC), and metaplastic breast cancer (MBC) tissues led to the identification of spatial hubs with patient- and disease-specific cell type compositions and revealed metabolic reprogramming shaping immunosuppressive hubs in aggressive MBC.
In multicellular organisms, the function of diverse cell types is strongly influenced by their surroundings. Uncovering the spatial organization and communication between cell types in tissues provides insight into their development, response to stimuli, adaptations to their microenvironment or transformation into malignant or diseased states. By sampling the entire transcriptome, ST has enabled unbiased gene expression mapping in a spatially resolved manner, providing an opportunity to study the spatial arrangement of cells and microenvironments. These technologies have been employed in diverse fields, including organ development, disease modeling and immunology.
However, sequencing-based ST methods have limited cellular resolution owing to technical constraints, including artifacts from lateral RNA diffusion. Hence, each capture location (spot) measures a mixture of multiple cells, which makes it analytically challenging to dissect the cellular disposition, particularly in complex cancerous tissues.
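To make the mixing problem concrete, the sketch below models the expected expression of one spot as a proportion-weighted sum of per-cell-type expression signatures. It is a toy illustration only, not the Starfysh model: the function name `mix_spot`, the three cell types, and all expression values are hypothetical, and the genes are used simply as familiar epithelial (EPCAM), T cell (CD3E) and macrophage (CD68) markers.

```python
# Toy, hypothetical signatures: mean expression of three marker genes per cell type.
# The numbers are invented for illustration and do not come from any dataset.
SIGNATURES = {
    "tumor":      {"EPCAM": 10.0, "CD3E": 0.0, "CD68": 0.0},
    "t_cell":     {"EPCAM": 0.0,  "CD3E": 8.0, "CD68": 0.0},
    "macrophage": {"EPCAM": 0.0,  "CD3E": 0.0, "CD68": 6.0},
}


def mix_spot(proportions, signatures=SIGNATURES):
    """Return the expected expression of one spot.

    `proportions` maps cell type -> fraction of the spot it occupies.
    The spot profile is the proportion-weighted sum of the signatures.
    """
    total = sum(proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    genes = next(iter(signatures.values())).keys()
    return {
        gene: sum(proportions[ct] * signatures[ct][gene] for ct in proportions)
        for gene in genes
    }


# A spot that is half tumor, a quarter T cells and a quarter macrophages.
spot = mix_spot({"tumor": 0.5, "t_cell": 0.25, "macrophage": 0.25})
print(spot)
```

The resulting profile matches none of the individual signatures, which is why spot-level measurements have to be deconvolved before cell states can be compared across tissues.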
Accurate characterization of cell types and refined states is critical for comparing their spatial organization and communication across tissues. This is essential, for example, when studying changes in cellular wiring during development or disease progression. In tumor tissues, the mixing of signals from patient-specific tumor cells and immune cells hinders the comparison of anti-tumor immune mechanisms between patients or disease subtypes.
Most existing computational methods for analyzing ST data require paired and annotated single-cell data as references to overcome this challenge and cannot integrate multiple tissue samples. The references, whether from the same tissue or public databases, can introduce biases when they do not account for sample or batch variation and variable cell density across spots. Indeed, using a single-cell atlas reference has been shown to increase deconvolution error compared to reference-free approaches.
Importantly, access to paired single-cell data may not be cost-effective or practical, especially in cases like clinical core biopsies. This limitation further motivates the development of reference-free methods capable of integrating prior knowledge of cell type markers and data from multiple tissues to improve statistical power. Reference-free methods including STdeconvolve, Smoother and CARD deconvolve spots into latent factors. However, some factors cannot be explicitly mapped to refined cell states in complex tissues. Additionally, these methods are not scalable and do not allow the integration of multiple ST datasets. Batch correction methods designed for single-cell RNA sequencing (scRNA-seq