Understanding SCANPY: large-scale single-cell gene expression data analysis

SCANPY is a scalable toolkit for analyzing large-scale single-cell gene expression data. It offers methods for preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing, and simulation of gene regulatory networks. Its Python-based implementation handles datasets of more than one million cells efficiently. Along with SCANPY, the authors introduce ANNDATA, a generic class for handling annotated data matrices.

SCANPY integrates the analysis methods of established R-based frameworks into a scalable and modular form, providing similar analysis capabilities with significantly improved speed and efficiency. In benchmarks against existing tools such as CELL RANGER, SCANPY shows speedups of 5 to 16 times on a dataset of 68,579 peripheral blood mononuclear cells (PBMCs). It can also analyze 1.3 million cells without subsampling in a few hours on a small computing server. SCANPY's modular design makes it easy to add new functionality and to interoperate with advanced machine-learning packages such as TENSORFLOW. The toolkit includes efficient implementations for graph analysis and for computing neighborhood relations, which makes it suitable for large-scale data analysis. The authors highlight the importance of this scalability for the growing need to handle larger datasets in various experimental setups, such as the Human Cell Atlas project.
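The idea of an annotated data matrix, as ANNDATA is described above, can be illustrated with a small sketch. The class below is a hypothetical toy written only for this summary: the name `ToyAnnotatedMatrix`, the method `select_cells`, and the gene and cell labels are assumptions, and none of it reproduces the actual ANNDATA API. It only shows the core design point, namely that the expression values and the per-cell and per-gene annotations stay aligned when a subset of cells is taken.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToyAnnotatedMatrix:
    """Hypothetical toy stand-in for an annotated data matrix (not the ANNDATA API)."""

    X: List[List[float]]       # expression values: rows are cells, columns are genes
    obs: Dict[str, List[str]]  # per-cell annotations, one entry per row of X
    var: Dict[str, List[str]]  # per-gene annotations, one entry per column of X

    def __post_init__(self) -> None:
        n_cells, n_genes = self.shape
        if any(len(row) != n_genes for row in self.X):
            raise ValueError("all rows of X must have the same length")
        if any(len(values) != n_cells for values in self.obs.values()):
            raise ValueError("each obs annotation needs one entry per cell")
        if any(len(values) != n_genes for values in self.var.values()):
            raise ValueError("each var annotation needs one entry per gene")

    @property
    def shape(self) -> Tuple[int, int]:
        n_cells = len(self.X)
        if self.X:
            n_genes = len(self.X[0])
        else:
            # With no cells left, fall back to the gene annotations for the width.
            n_genes = len(next(iter(self.var.values()), []))
        return (n_cells, n_genes)

    def select_cells(self, keep: List[int]) -> "ToyAnnotatedMatrix":
        """Return a new object with only the given rows, keeping annotations aligned."""
        return ToyAnnotatedMatrix(
            X=[list(self.X[i]) for i in keep],
            obs={key: [values[i] for i in keep] for key, values in self.obs.items()},
            var={key: list(values) for key, values in self.var.items()},
        )


# Made-up example data: 4 cells, 3 genes.
adata = ToyAnnotatedMatrix(
    X=[[0.0, 3.0, 1.0], [2.0, 0.0, 0.0], [5.0, 1.0, 0.0], [0.0, 0.0, 4.0]],
    obs={"cell_type": ["T cell", "B cell", "T cell", "monocyte"]},
    var={"gene_name": ["geneA", "geneB", "geneC"]},
)

# Subset to T cells; the cell annotations follow the rows automatically.
t_cell_rows = [i for i, label in enumerate(adata.obs["cell_type"]) if label == "T cell"]
t_cells = adata.select_cells(t_cell_rows)
print(t_cells.shape, t_cells.obs["cell_type"])
```

Keeping the matrix and its annotations in one object means a filtering step cannot silently leave labels pointing at the wrong rows, which is the kind of bookkeeping an annotated-matrix class is meant to take over.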

SCANPY: large-scale single-cell gene expression data analysis

2018 | F. Alexander Wolf, Philipp Angerer and Fabian J. Theis
