Fast, sensitive, and accurate integration of single cell data with Harmony

2019 December | Ilya Korsunsky, Nghia Millard, Jean Fan, Kamil Slowikowski, Fan Zhang, Kevin Wei, Yuriy Baglaenko, Michael Brenner, Po-ru Loh, Soumya Raychaudhuri
Harmony is an algorithm for integrating single-cell RNA sequencing (scRNAseq) data across multiple datasets. It addresses four key challenges in unsupervised integration: scalability, identification of both broad and fine-grained subpopulations, flexibility in experimental design, and cross-modality integration. Harmony projects cells into a shared embedding in which they group by cell type rather than by dataset-specific conditions, and it can account for multiple experimental and biological factors at once. At the time of publication, it was the only available algorithm that made integration of up to 10^6 cells feasible on a personal computer. Harmony was tested on several datasets, including PBMCs, pancreatic islet cells, mouse embryogenesis, and cross-modality spatial data. Its performance was evaluated with metrics such as the Local Inverse Simpson's Index (LISI), which quantifies both integration (how well datasets mix within local neighborhoods) and accuracy (how well distinct cell types stay separated). On these measures Harmony outperformed existing algorithms while requiring fewer computational resources. It integrates datasets with large experimental differences and identifies rare, previously unknown cell subtypes, such as ER stress cells in pancreatic islets. It also integrates dissociated scRNAseq data with spatially resolved datasets, even when there is limited overlap in gene expression between the modalities, which makes it possible to infer spatial expression patterns for genes the spatial assay did not measure. Finally, it models both terminal populations and transition states: in mouse hematopoiesis it preserved the structure of developmental trajectories, retaining smooth transitions and bifurcation events.
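To make the LISI metric more concrete, here is a minimal Python sketch of a simplified variant. For each cell it takes the k nearest neighbors in the embedding and computes the inverse Simpson's index 1/Σ p_b² of the label proportions p_b in that neighborhood, so the score runs from 1 (only one label present) up to the number of labels (a perfectly even mix). The function name `simple_lisi`, the brute-force distance computation, and the uniform neighbor weighting are assumptions for illustration: the published LISI instead weights neighbors with a Gaussian kernel calibrated to a fixed perplexity.

```python
import numpy as np


def simple_lisi(embedding: np.ndarray, labels: np.ndarray, k: int = 30) -> np.ndarray:
    """Per-cell inverse Simpson's index of `labels` among the k nearest neighbors.

    Simplification (assumption): all k neighbors are weighted equally, whereas the
    published LISI uses Gaussian kernel weights calibrated to a fixed perplexity.
    """
    n = embedding.shape[0]
    k = min(k, n - 1)
    # Brute-force pairwise squared Euclidean distances; adequate for small examples.
    sq_norms = np.sum(embedding ** 2, axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embedding @ embedding.T
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Map arbitrary labels (e.g. dataset names) to integer codes 0..C-1.
    categories, codes = np.unique(labels, return_inverse=True)
    scores = np.empty(n)
    for i in range(n):
        counts = np.bincount(codes[neighbors[i]], minlength=len(categories))
        p = counts / counts.sum()
        scores[i] = 1.0 / np.sum(p ** 2)  # inverse Simpson's index
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cells = rng.normal(size=(200, 2))
    dataset = np.repeat(np.array(["A", "B"]), 100)
    # Both datasets drawn from the same distribution: mean lies between 1 and 2,
    # approaching 2 as the mixing becomes more even.
    print("mixed datasets:", simple_lisi(cells, dataset).mean())
    shifted = cells.copy()
    shifted[100:] += 50.0  # move dataset B far away from dataset A
    print("separated datasets:", simple_lisi(shifted, dataset).mean())  # exactly 1.0
```

Computed with dataset labels, values near the number of datasets indicate good mixing (integration); computed with cell-type labels, values near 1 indicate that distinct cell types remain separated (accuracy).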
Harmony is also memory-efficient, requiring only 7.2 GB of memory to integrate 500,000 cells. It is available as an R package on GitHub and can be used either standalone or within a Seurat pipeline. It is designed to handle complex experimental designs and is robust to parameter choices, particularly the diversity penalty. It continues to identify rare populations and produce high-quality results when datasets are imbalanced or contain non-overlapping cell types. Harmony's ability to integrate data across modalities and technologies is important for studying complex biological interactions: it enables analysis of spatial gene expression patterns and the identification of new transcription factors associated with specific cell types. Harmony's framework also provides a foundation for future applications, including the modeling of gene counts and the mapping of cells to large reference datasets.
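The summary mentions a diversity penalty without spelling it out. The Python sketch below shows one way such a penalty can act on soft cluster assignments: each cell's responsibility for a cluster is scaled by (E/(O+1))^θ, where O is the soft count of the cell's batch in that cluster and E is the count expected if the cluster mirrored the overall batch composition. The function name `penalized_soft_assign`, the squared Euclidean distance, the single batch variable, the single non-iterative pass, and the illustrative defaults `sigma=0.1` and `theta=2.0` are all assumptions. It is modeled loosely on Harmony's published description and is not the package's implementation, which alternates clustering with a correction of the embedding.

```python
import numpy as np


def penalized_soft_assign(Z, Y, batch, sigma=0.1, theta=2.0):
    """Soft-assign cells to clusters with a diversity penalty (illustrative sketch).

    Z: (N, d) cell embedding; Y: (K, d) cluster centroids;
    batch: (N,) integer batch codes 0..B-1.
    Returns R: (K, N) responsibilities; each column sums to 1.
    """
    # Squared Euclidean distance from every centroid to every cell, shape (K, N).
    dist = ((Y[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    # Unpenalized soft k-means responsibilities; shifting each column by its
    # largest logit keeps exp() from underflowing without changing the result.
    logits = -dist / sigma
    base = np.exp(logits - logits.max(axis=0))
    base /= base.sum(axis=0)

    n_batches = int(batch.max()) + 1
    phi = np.eye(n_batches)[batch]          # (N, B) one-hot batch membership
    observed = base @ phi                   # (K, B) soft count of batch b in cluster k
    batch_frac = phi.mean(axis=0)           # (B,) share of all cells from each batch
    # (K, B) counts expected if every cluster mirrored the global batch composition.
    expected = base.sum(axis=1, keepdims=True) * batch_frac

    # Factor is < 1 when observed + 1 exceeds expected (batch over-represented
    # in the cluster) and at least 1 otherwise; theta sets the penalty strength.
    penalty = (expected / (observed + 1.0)) ** theta
    # Scale each cell's responsibilities by the column of its own batch, then renormalize.
    R = base * penalty[:, batch]
    return R / R.sum(axis=0)


if __name__ == "__main__":
    # Batch 0 sits on centroid 0 and batch 1 on centroid 1, so clusters start batch-pure.
    Z = np.vstack([np.zeros((50, 2)), np.tile([1.0, 0.0], (50, 1))])
    Y = np.array([[0.0, 0.0], [1.0, 0.0]])
    batch = np.repeat([0, 1], 50)
    plain = penalized_soft_assign(Z, Y, batch, sigma=1.0, theta=0.0)
    mixed = penalized_soft_assign(Z, Y, batch, sigma=1.0, theta=1.0)
    # The penalty lowers cell 0's weight on its batch-pure cluster.
    print(plain[0, 0], mixed[0, 0])
```

With `theta=0` the factor is 1 everywhere and the result reduces to ordinary soft k-means; larger `theta` pushes cells more strongly toward clusters that contain a mix of batches.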