August 2024 | Yanay Rosen, Maria Brbić, Yusuf Roohani, Kyle Swanson, Ziang Li & Jure Leskovec
SATURN is a deep learning method that integrates single-cell RNA-seq datasets across species by combining gene expression with protein embeddings from large protein language models. The resulting universal cell embeddings capture functional similarity between genes even when the species are evolutionarily distant. SATURN maps cross-species datasets into a shared space of functionally related genes, which makes it possible to identify functionally related genes that are coexpressed across species and to redefine differential expression for cross-species analysis. Applied to datasets from three species and to frog and zebrafish embryogenesis datasets, SATURN transfers annotations across species and identifies both homologous and species-specific cell types. It also reveals potential functional divergence of glaucoma-associated genes between humans and four other species.
SATURN addresses the challenge of cross-species integration by using protein embeddings to represent gene function, which allows for the identification of functionally related genes even when they lack one-to-one homologs. It integrates scRNA-seq datasets by mapping them to a joint low-dimensional embedding space using gene expression and protein representations. SATURN takes as input scRNA-seq count data, protein embeddings, and initial within-species cell annotations. It learns an interpretable feature space shared between multiple species, referred to as a macrogene space, which represents a joint space composed of genes inferred to be functionally related based on their protein embeddings.
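The idea of a macrogene space can be illustrated with a small sketch. This is not SATURN's implementation: the function names, the softmax weighting, the `temperature` parameter, and the randomly drawn centroids are all assumptions made for illustration (in practice the centroids might come from clustering the protein embeddings). The sketch only shows how genes from different species, each with its own gene set, can be projected onto one shared set of macrogenes through their protein embeddings.

```python
import numpy as np

def gene_to_macrogene_weights(protein_emb, centroids, temperature=1.0):
    """Soft-assign each gene to macrogenes by distance in protein-embedding space.

    Hypothetical helper: a softmax over negative Euclidean distances.
    """
    # Pairwise distances, shape (n_genes, n_macrogenes).
    dists = np.linalg.norm(protein_emb[:, None, :] - centroids[None, :, :], axis=-1)
    logits = -dists / temperature
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def macrogene_expression(counts, weights):
    """Project a cells x genes count matrix onto macrogenes (cells x macrogenes)."""
    return np.log1p(counts) @ weights

rng = np.random.default_rng(0)
emb_dim, n_macrogenes = 8, 2

# Assumed for the toy example: macrogene centroids shared by all species.
centroids = rng.normal(size=(n_macrogenes, emb_dim))

# Two species with different gene sets (4 and 3 genes) and toy protein embeddings.
human_emb = rng.normal(size=(4, emb_dim))
frog_emb = rng.normal(size=(3, emb_dim))
human_weights = gene_to_macrogene_weights(human_emb, centroids)
frog_weights = gene_to_macrogene_weights(frog_emb, centroids)

# Toy count matrices: 5 human cells and 6 frog cells.
human_counts = rng.poisson(2.0, size=(5, 4))
frog_counts = rng.poisson(2.0, size=(6, 3))

# Both species now live in the same 2-dimensional macrogene space.
human_macro = macrogene_expression(human_counts, human_weights)
frog_macro = macrogene_expression(frog_counts, frog_weights)
```

Because the weights depend only on protein embeddings, no one-to-one homolog table is needed to place the two species in the same feature space.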
SATURN enables multispecies differential expression analysis by performing differential expression on macrogenes, which are groups of genes with similar protein embeddings. This makes it possible to characterize cell-type-specific macrogenes across different datasets. Because it can identify differentially expressed genes that lack one-to-one homologs and yields natural gene modules for interpretation, SATURN offers capabilities that existing integration methods lack. It also reveals functional similarities between genes that sequence-based similarity tools do not consider homologs.
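A minimal sketch of differential expression on macrogenes follows. It assumes a cells x macrogenes activity matrix is already available and ranks macrogenes for one cell type with a simple Welch t-statistic; the statistic, the function names, the toy values, and the placeholder gene names are assumptions for illustration and are not taken from SATURN.

```python
import numpy as np

def rank_macrogenes(activity, labels, target):
    """Rank macrogenes by a Welch t-statistic: target cell type vs. all other cells."""
    labels = np.asarray(labels)
    in_group = activity[labels == target]
    rest = activity[labels != target]
    diff = in_group.mean(axis=0) - rest.mean(axis=0)
    se = np.sqrt(in_group.var(axis=0, ddof=1) / len(in_group)
                 + rest.var(axis=0, ddof=1) / len(rest))
    t_stat = diff / (se + 1e-12)  # small constant avoids division by zero
    return np.argsort(-t_stat), t_stat

def top_genes(weights, gene_names, macrogene, k=2):
    """Return the k genes with the largest weight on one macrogene."""
    idx = np.argsort(-weights[:, macrogene])[:k]
    return [gene_names[i] for i in idx]

# Toy activity matrix: 6 cells (pooled across species) x 3 macrogenes.
activity = np.array([
    [0.1, 2.0, 0.5],
    [0.2, 2.2, 0.4],
    [0.1, 1.9, 0.6],
    [0.2, 0.1, 0.5],
    [0.1, 0.2, 0.4],
    [0.3, 0.1, 0.6],
])
labels = ["neuron", "neuron", "neuron", "muscle", "muscle", "muscle"]

order, t_stat = rank_macrogenes(activity, labels, target="neuron")
best = int(order[0])  # macrogene most enriched in "neuron" cells

# Placeholder gene names from two species and toy gene-to-macrogene weights.
gene_names = ["human_geneA", "human_geneB", "frog_geneC", "frog_geneD"]
weights = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
])
module = top_genes(weights, gene_names, best)
```

Here the top-ranked macrogene pulls in one gene from each species, which is the sense in which a macrogene acts as a cross-species gene module even when the genes are not annotated homologs.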
SATURN outperforms other methods in cross-species integration, achieving high accuracy in transferring cell-type labels between species. It generates cell clusters that reflect conserved cell types across species and facilitates the analysis of protein embeddings by creating multispecies macrogenes. SATURN is scalable and applicable to large-scale cross-species cell atlas datasets. It has important implications for the development of new multi-omic machine learning methods, including those that incorporate protein assay information. However, SATURN requires a reference proteome, which may be missing for some species of interest, and it may not handle smaller cell clusters effectively. Overall, SATURN provides a powerful tool for understanding the conservation and diversification of cell types across species and for revealing fundamental evolutionary processes.