Bayesian approach to single-cell differential expression analysis

July 2014 | Peter V. Kharchenko, Lev Silberstein, and David T. Scadden
This paper introduces a Bayesian approach to differential expression analysis of single-cell RNA sequencing (RNA-seq) data. Single-cell RNA-seq enables large-scale analysis of the transcriptional states of individual cells, but the measurements are affected by high technical noise and biological variability. The authors propose a probabilistic model that accounts for distortions in expression magnitude, enabling more accurate detection of differential expression signatures and identification of cell subpopulations. The approach is shown to be particularly effective on datasets such as a 92-cell study of mouse embryonic fibroblasts (MEF) and embryonic stem (ES) cells, and on early mouse embryo data.

Single-cell RNA-seq measurements show higher variability than bulk RNA-seq measurements, with notable outliers and drop-out events (detection failures). Standard RNA-seq methods struggle with this variability and give inconsistent results. The authors address this by modeling each cell's measurement as a mixture of two probabilistic processes: one for successful amplification and detection, and one for drop-out events. The first process is modeled with a negative binomial distribution; the second with a low-magnitude Poisson process that accounts for background signal.

To fit the error model, the authors use a subset of genes with reliable expression estimates. They analyze pairs of cells from the same subpopulation to determine the expected expression magnitude of these genes. These estimates are then used to fit the parameters of the negative binomial distribution and the dependence of the drop-out rate on expression magnitude, which is approximated by logistic regression.
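As an illustration of the two-component error model described above, here is a minimal Python sketch. It is not the authors' implementation: the function names and every numeric value (the negative binomial dispersion `size`, the Poisson `background` rate, and the logistic `intercept` and `slope`) are assumptions chosen for illustration, whereas in the approach summarized here such parameters are fitted from the data. The sketch also simplifies by using the expected expression magnitude directly as the negative binomial mean.

```python
import math


def poisson_pmf(count, rate):
    """Poisson probability of observing `count` reads at mean `rate`."""
    return math.exp(count * math.log(rate) - rate - math.lgamma(count + 1))


def neg_binom_pmf(count, mean, size):
    """Negative binomial probability of `count`, parameterized by its mean
    and a dispersion `size` (variance = mean + mean**2 / size). Needs mean > 0."""
    p = size / (size + mean)
    log_pmf = (math.lgamma(count + size) - math.lgamma(size)
               - math.lgamma(count + 1)
               + size * math.log(p) + count * math.log1p(-p))
    return math.exp(log_pmf)


def dropout_prob(expected, intercept, slope):
    """Logistic dependence of the drop-out probability on expression
    magnitude (log10 scale). A negative slope means highly expressed
    genes drop out less often."""
    z = intercept + slope * math.log10(expected + 1.0)
    return 1.0 / (1.0 + math.exp(-z))


def mixture_likelihood(count, expected, size=2.0, background=0.1,
                       intercept=2.0, slope=-2.0):
    """Likelihood of an observed read count given the expected magnitude.

    Mixture of a drop-out component (low-magnitude Poisson background) and
    an amplification component (negative binomial). All default parameter
    values are illustrative assumptions, not fitted values.
    """
    d = dropout_prob(expected, intercept, slope)
    return (d * poisson_pmf(count, background)
            + (1.0 - d) * neg_binom_pmf(count, expected, size))


# A zero count for a gene expected at ~50 reads is explained mostly by the
# drop-out component rather than by the negative binomial component.
print(mixture_likelihood(0, 50.0), neg_binom_pmf(0, 50.0, 2.0))
```

Under these assumed parameters, a zero count for a well-expressed gene is far more probable under the mixture than under the negative binomial alone, which is why such observations do not automatically count as evidence of low expression.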
The Bayesian approach is then applied to differential expression analysis, estimating the likelihood of a gene's expression level in each subpopulation and of the fold change between subpopulations. The method is more sensitive than traditional RNA-seq methods, particularly for genes expressed at higher magnitudes in ES cells. The approach also improves the classification of cell types by accounting for drop-out events in similarity measures. The study highlights the importance of probabilistic models in handling the high variability of single-cell data. The methods are validated on benchmark datasets, where they show improved performance in detecting differentially expressed genes and classifying cell types.
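The Bayesian differential expression step described above can be sketched in the same spirit. The following Python example is a simplified illustration under stated assumptions, not the published procedure: it uses a flat prior over a fixed log2-spaced grid of expression magnitudes, treats cells as independent, reuses illustrative (not fitted) error-model parameters, and all function names are hypothetical.

```python
import math


def log_mixture_likelihood(count, expected, size=2.0, background=0.1,
                           intercept=2.0, slope=-2.0):
    """Log-likelihood of a read count under the drop-out / amplification
    mixture. All default parameter values are illustrative assumptions."""
    z = intercept + slope * math.log10(expected + 1.0)
    log_d = -math.log1p(math.exp(-z))      # log P(drop-out)
    log_not_d = -math.log1p(math.exp(z))   # log P(successful amplification)
    log_pois = (count * math.log(background) - background
                - math.lgamma(count + 1))
    p = size / (size + expected)
    log_nb = (math.lgamma(count + size) - math.lgamma(size)
              - math.lgamma(count + 1)
              + size * math.log(p) + count * math.log1p(-p))
    a, b = log_d + log_pois, log_not_d + log_nb
    hi = max(a, b)
    return hi + math.log(math.exp(a - hi) + math.exp(b - hi))  # log-sum-exp


STEP = 0.25                                   # grid spacing in log2 units
GRID = [2.0 ** (i * STEP) for i in range(49)]  # magnitudes from 1 to 4096


def expression_posterior(counts):
    """Posterior over GRID for one gene in one group of cells, assuming a
    flat prior on the grid and independent cells."""
    log_post = [sum(log_mixture_likelihood(c, x) for c in counts)
                for x in GRID]
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def log2_fold_change_posterior(post_a, post_b):
    """Distribution of log2(x_a / x_b), treating the two group posteriors
    as independent. Returns {log2 fold change: probability}."""
    fold = {}
    for i, pa in enumerate(post_a):
        for j, pb in enumerate(post_b):
            diff = (i - j) * STEP
            fold[diff] = fold.get(diff, 0.0) + pa * pb
    return dict(sorted(fold.items()))


# Made-up read counts for one gene in two groups of cells; the zeros in the
# first group stand in for drop-out events.
group_a = [120, 0, 95, 150, 0, 110]
group_b = [3, 0, 0, 5, 1, 0]
fold = log2_fold_change_posterior(expression_posterior(group_a),
                                  expression_posterior(group_b))
prob_up = sum(p for diff, p in fold.items() if diff > 0)
print(f"P(expression higher in group A) = {prob_up:.3f}")
```

Because the drop-out component absorbs the zero counts in the first group, they do not pull that group's posterior toward low expression. The log-sum-exp form keeps the computation stable when a count is very unlikely at a given grid point.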