
Scrublet is a computational method for identifying cell doublets in single-cell transcriptomic data. It simulates doublets from the observed data and uses a nearest-neighbor classifier to tell them apart from singlets, which also makes it possible to predict the impact of doublets on downstream analysis. Because the classifier is built from the relative densities of simulated doublets and observed transcriptomes, the approach needs neither expert knowledge nor cell clustering. The method is validated on datasets with known doublet information and is freely available for download.

The method distinguishes neotypic doublets, which generate new features in single-cell data, from embedded doublets, which are less impactful. It estimates the fraction of detectable doublets and assigns each transcriptome a doublet score, together with a standard error for that score and a binary label indicating whether the cell is a predicted neotypic doublet. Scrublet performs well on simulated and real datasets, accurately identifying doublets and improving the quality of downstream analyses by removing artifactual states. Its performance is sensitive to the structure of the single-cell state manifold, and it has limitations when its assumptions are violated; for example, doublets formed from cell aggregates cannot be detected. Overall, Scrublet is a useful tool for estimating the impact of doublets on downstream hypothesis generation and for identifying bona fide doublet states for exclusion.
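
The simulate-then-classify idea summarized above can be illustrated with a toy sketch. This is not Scrublet's implementation: the function names (`simulate_doublets`, `normalize`, `doublet_scores`), the three-gene toy data, and the scoring rule (the plain fraction of simulated doublets among each cell's k nearest neighbors) are all assumptions made for illustration, and the sketch omits whatever preprocessing and statistical calibration the real tool performs.

```python
import math
import random


def simulate_doublets(cells, n_doublets, rng):
    """Create synthetic doublets by summing the counts of random cell pairs."""
    doublets = []
    for _ in range(n_doublets):
        a, b = rng.sample(cells, 2)
        doublets.append([x + y for x, y in zip(a, b)])
    return doublets


def normalize(profile):
    """Scale a count vector to sum to 1 so total counts do not dominate distances."""
    total = sum(profile)
    return [x / total for x in profile] if total else list(profile)


def doublet_scores(cells, n_doublets=None, k=5, seed=0):
    """Score each observed cell by the fraction of simulated doublets
    among its k nearest neighbors (a simplified, illustrative score)."""
    rng = random.Random(seed)
    n_doublets = n_doublets or 2 * len(cells)
    observed = [normalize(c) for c in cells]
    simulated = [normalize(d) for d in simulate_doublets(cells, n_doublets, rng)]
    # Label 0 marks an observed transcriptome, label 1 a simulated doublet.
    pool = [(p, 0) for p in observed] + [(p, 1) for p in simulated]
    scores = []
    for i, cell in enumerate(observed):
        neighbors = sorted(
            (math.dist(cell, p), label)
            for j, (p, label) in enumerate(pool)
            if j != i  # a cell is not its own neighbor
        )[:k]
        scores.append(sum(label for _, label in neighbors) / k)
    return scores


# Toy data (hypothetical): two cell types over three genes, plus one
# observed doublet formed by summing one cell of each type.
data_rng = random.Random(1)
type_a = [[data_rng.randint(45, 55), data_rng.randint(0, 3), data_rng.randint(0, 3)]
          for _ in range(20)]
type_b = [[data_rng.randint(0, 3), data_rng.randint(45, 55), data_rng.randint(0, 3)]
          for _ in range(20)]
observed_doublet = [x + y for x, y in zip(type_a[0], type_b[0])]
cells = type_a + type_b + [observed_doublet]

scores = doublet_scores(cells, k=5, seed=0)
singlet_mean = sum(scores[:-1]) / len(scores[:-1])
print(f"mean singlet score: {singlet_mean:.2f}, doublet score: {scores[-1]:.2f}")
```

In this toy setting the mixed-type doublet sits in a region populated only by simulated doublets, so it receives the highest score, which mirrors the neotypic case. Simulated doublets built from two cells of the same type land inside the singlet clusters and raise singlet scores without being separable from them, which mirrors the embedded case.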

Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data

2019 April 24 | Samuel L. Wolock, Romain Lopez, and Allon M. Klein