April 11, 2016 | Aaron T. L. Lun, Karsten Bach and John C. Marioni
The supplementary materials provide a detailed exploration of the methods and simulations used to validate the performance of the deconvolution method for normalizing single-cell RNA sequencing data with many zero counts. The key points include:
1. **Justifying the Choice of Pooling Strategy**: The performance of the ring arrangement and sliding window methods is compared to random pools of cells. Simulations show that the ring arrangement provides a modest improvement in estimation precision, with a median absolute deviation (MAD) of 0.051 compared to 0.079 for random pools.
2. **Resolving Linear Dependencies**: The linear system for deconvolution is analyzed, showing that adding equations that relate each size factor to its direct estimate ensures identifiability. The approach remains effective with a large number of cells: the sum of errors approaches zero, so the size factors are estimated accurately.
3. **Implementation Details of the Clustering Approach**: The clustering method used for normalization is described, including the use of Spearman's rank correlation coefficient and hierarchical clustering. The baseline pseudo-cell is chosen based on mean library size, and the method is shown to be robust in high-coverage simulations.
4. **Assessing Normalization in High-Coverage Simulations**: High-coverage simulations are conducted to test the performance of deconvolution and existing methods. Results show that deconvolution outperforms existing methods, even in scenarios with high coverage and differentially expressed (DE) genes.
5. **Computational Complexity of the Deconvolution Method**: The computational complexity of deconvolution is discussed, showing that it is quadratic with respect to the number of cells and cubic with respect to the size of the linear system. Clustering helps mitigate these complexities.
6. **Comparing Normalization Accuracy on Real Data**: A framework is developed to compare the accuracy of different normalization methods using real data. The results show that deconvolution consistently outperforms existing methods in detecting DE genes, suggesting its effectiveness in real-world applications.
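The pooling and identifiability points above (items 1 and 2) can be illustrated with a small sketch. It is not the authors' implementation: the function names, the weight of `1e-3` on the direct-estimate equations, the noise-free pool estimates, and the simulated direct estimates are all assumptions made for illustration. Cells are placed on a ring so that neighbours have similar library sizes, a sliding window defines the pools, and the pooled equations are solved by least squares together with one low-weight equation per cell tying its size factor to a direct estimate.

```python
import numpy as np

def ring_order(lib_sizes):
    """Arrange cells in a ring so that neighbours have similar library sizes."""
    order = np.argsort(lib_sizes)
    # Walk up through every other cell, then back down through the rest.
    return np.concatenate([order[0::2], order[1::2][::-1]])

def pooling_matrix(ring, window):
    """One row per sliding-window pool; a 1 marks a cell belonging to that pool."""
    n = len(ring)
    A = np.zeros((n, n))
    for start in range(n):
        members = ring[(start + np.arange(window)) % n]
        A[start, members] = 1.0
    return A

def deconvolve(A, pool_factors, direct_factors, weight=1e-3):
    """Least-squares solve of the pooled system plus low-weight direct equations.

    The weight is a hypothetical choice: small enough that the pooled
    equations dominate, large enough to make the system full rank.
    """
    n = A.shape[1]
    design = np.vstack([A, weight * np.eye(n)])
    response = np.concatenate([pool_factors, weight * direct_factors])
    solution, *_ = np.linalg.lstsq(design, response, rcond=None)
    return solution, design

rng = np.random.default_rng(0)
n_cells, window = 20, 4
true_factors = rng.uniform(0.5, 2.0, size=n_cells)
lib_sizes = true_factors * 1e5            # library size stands in for the cell-specific bias
ring = ring_order(lib_sizes)
A = pooling_matrix(ring, window)

pool_factors = A @ true_factors           # idealised, noise-free pool-level estimates
direct_factors = true_factors * rng.uniform(0.8, 1.2, size=n_cells)  # imprecise per-cell estimates

estimated, design = deconvolve(A, pool_factors, direct_factors)
print("rank of pooled system alone:", np.linalg.matrix_rank(A))
print("rank with direct equations: ", np.linalg.matrix_rank(design))
print("error of direct estimates:  ", np.linalg.norm(direct_factors - true_factors))
print("error after deconvolution:  ", np.linalg.norm(estimated - true_factors))
```

With 20 cells and a window of 4, the window size shares a factor with the number of cells, so the pooled equations alone are rank-deficient. The extra equations restore full rank, and the imprecise direct estimates only determine the part of the solution that the pools leave unconstrained.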
Overall, the supplementary materials provide a comprehensive validation of the deconvolution method, demonstrating its robustness and accuracy in handling zero counts and high-coverage data.