Current best practices in single-cell RNA-seq analysis: a tutorial

2019 | Malte D Luecken & Fabian J Theis
This review is a guide to best practices in single-cell RNA-seq (scRNA-seq) analysis. The authors detail the steps of a typical scRNA-seq workflow: pre-processing (quality control, normalization, data correction, feature selection, and dimensionality reduction) followed by downstream analysis at the cell and gene levels. Drawing on independent comparison studies, they formulate current best-practice recommendations for these steps and integrate them into a workflow applied to a public dataset. The tutorial serves as a guide for newcomers to the field and helps established users update their analysis pipelines.

Single-cell RNA-seq has enabled gene expression to be studied at unprecedented resolution. However, the field is still young and analysis is not yet standardized. The number of analysis tools has grown rapidly and datasets have become larger, which makes the landscape hard to navigate. The authors address challenges such as the diversity of programming languages in which analysis tools are written and the need for standardized workflows, and they outline best practices that are independent of programming language.

For quality control, the authors emphasize considering multiple QC covariates together and using permissive thresholds to avoid filtering out viable cells. For normalization, they discuss count depth scaling and more advanced techniques such as scran, and highlight the importance of log transformation for downstream analysis. Data correction and integration are discussed with a focus on regressing out biological and technical effects. The authors recommend ComBat for batch correction and stress the distinction between batch correction and data integration.
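The quality control and normalization steps described above can be illustrated with a minimal sketch in plain Python. It is not the authors' workflow: the toy count matrix, the gene names, the `MT-` prefix used to flag mitochondrial genes, the function names, and every threshold are assumptions chosen for illustration. In practice the thresholds would be set by inspecting the joint distributions of the covariates in the dataset at hand.

```python
import math

# Toy count matrix: rows are cells (barcodes), columns are genes.
# Gene names and counts are invented for illustration only.
genes = ["MT-CO1", "MT-ND1", "ACTB", "CD3E", "MS4A1"]
counts = [
    [2, 1, 30, 12, 0],  # plausible viable cell
    [1, 0, 25, 0, 9],   # plausible viable cell
    [8, 6, 2, 0, 0],    # mostly mitochondrial counts
    [0, 0, 1, 0, 0],    # nearly empty barcode
]


def qc_covariates(cell, gene_names):
    """Return (count depth, genes detected, mitochondrial fraction) for one cell."""
    depth = sum(cell)
    n_genes = sum(1 for c in cell if c > 0)
    mito = sum(c for c, g in zip(cell, gene_names) if g.startswith("MT-"))
    mito_fraction = mito / depth if depth else 0.0
    return depth, n_genes, mito_fraction


def passes_qc(cell, gene_names, min_depth=5, min_genes=2, max_mito=0.5):
    """Keep a cell only if all three covariates look acceptable together.

    The default thresholds are deliberately permissive placeholders.
    """
    depth, n_genes, mito_fraction = qc_covariates(cell, gene_names)
    return depth >= min_depth and n_genes >= min_genes and mito_fraction <= max_mito


def normalize_log(cell, target=10_000):
    """Count depth scaling to a common total, followed by log(x + 1)."""
    depth = sum(cell)
    return [math.log1p(c / depth * target) for c in cell]


kept = [cell for cell in counts if passes_qc(cell, genes)]
normalized = [normalize_log(cell) for cell in kept]
```

Here the third and fourth barcodes are dropped, and every retained cell is scaled to the same total count before the log transformation, so that differences in sequencing depth do not dominate downstream comparisons.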
They also address expression recovery, noting the challenges of denoising and the potential for false correlation signals. Feature selection, dimensionality reduction, and visualization are covered, with recommendations for methods like PCA, t-SNE, UMAP, and diffusion maps. The authors emphasize the importance of using appropriate methods for different downstream applications and highlight the benefits of UMAP for exploratory visualization. The tutorial concludes with a summary of the stages of pre-processed data, emphasizing the importance of measured, corrected, and reduced data layers for different downstream applications. The authors stress the need for careful consideration of data correction methods and the importance of visual and statistical comparisons on different data layers. Overall, the review provides a comprehensive guide to best practices in scRNA-seq analysis, helping users navigate the complexities of the field.
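As an illustration of the feature selection step mentioned above, the sketch below ranks genes by dispersion (variance divided by mean) across cells and keeps the most variable ones. This is one simple, commonly used heuristic and is an assumption here, not necessarily the method the authors recommend. The function name `top_variable_genes`, the gene names, and the expression values are hypothetical.

```python
from statistics import mean, pvariance


def top_variable_genes(matrix, gene_names, n_top):
    """Rank genes by dispersion (variance / mean) across cells; return the top n_top names."""
    scored = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in matrix]
        mu = mean(column)
        if mu == 0:
            continue  # gene never detected, so it cannot be ranked
        scored.append((pvariance(column) / mu, name))
    scored.sort(reverse=True)  # highest dispersion first
    return [name for _, name in scored[:n_top]]


# Invented normalized expression values: rows are cells, columns are genes.
genes = ["ACTB", "CD3E", "MS4A1", "GAPDH"]
expr = [
    [5.0, 4.0, 0.0, 6.0],
    [5.0, 0.0, 4.0, 6.0],
    [5.0, 4.0, 0.0, 6.0],
    [5.0, 0.0, 4.0, 6.0],
]

selected = top_variable_genes(expr, genes, n_top=2)
```

In this toy matrix, the two genes whose expression differs between cells are selected, while the uniformly expressed genes are discarded. The reduced gene set would then feed dimensionality reduction methods such as PCA.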