A comparison of methods for differential expression analysis of RNA-seq data

2013 | Charlotte Soneson and Mauro Delorenzi
This study compares eleven methods for differential expression analysis of RNA-seq data, evaluating them on both simulated and real data sets. Small sample sizes, which are common in RNA-seq experiments, pose problems for all of the methods, and results obtained from them should be interpreted with caution. The study also highlights the importance of normalization and shows how different experimental designs affect performance. The choice of method depends on the experimental conditions, and no single method is optimal under all circumstances.
For larger sample sizes, methods that combine a variance-stabilizing transformation with limma (voom+limma and vst+limma) perform well under many conditions, are relatively unaffected by outliers, and are computationally fast. However, they require at least 3 samples per condition to have sufficient power to detect differentially expressed genes. The nonparametric SAMseq method also performs well for large sample sizes, but requires at least 4-5 samples per condition to have sufficient power. For highly expressed genes, SAMseq requires a lower fold change for statistical significance than many other methods, which may compromise the biological significance of some of the genes it calls differentially expressed.
The study also shows that results can vary considerably depending on the parameters used, and that the recommended parameter settings are well chosen and often give the best results.
The study emphasizes the need for caution when interpreting results from small sample sizes and the importance of considering the true false discovery rate, not only the nominal one.
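The distinction between a nominal and a true false discovery rate can be made concrete with a small sketch. The code below is not taken from the study: it assumes a simulated setting in which the truly differentially expressed genes are known, applies the standard Benjamini-Hochberg adjustment to a list of p-values, and then computes the fraction of called genes that are false positives. The function names `benjamini_hochberg` and `true_fdr`, the toy p-values, and the 0.05 threshold are all illustrative assumptions.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the genes, sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def true_fdr(adjusted: Sequence[float],
             is_de: Sequence[bool],
             threshold: float = 0.05) -> float:
    """Fraction of called genes that are not truly differentially expressed.

    Only computable when the truth is known, e.g. for simulated data.
    """
    called = [truth for q, truth in zip(adjusted, is_de) if q <= threshold]
    if not called:
        return 0.0
    false_positives = sum(1 for truth in called if not truth)
    return false_positives / len(called)


# Hypothetical toy data: eight genes, three of them truly DE.
pvalues = [0.001, 0.004, 0.012, 0.03, 0.2, 0.6, 0.8, 0.9]
is_de = [True, True, False, True, False, False, False, False]

adjusted = benjamini_hochberg(pvalues)
observed = true_fdr(adjusted, is_de, threshold=0.05)
print(f"true FDR at nominal 0.05: {observed:.3f}")
```

In this toy example three genes pass the 0.05 threshold and one of them is not truly differentially expressed, so the true false discovery rate is one third even though the nominal level is 5%. With so few genes this mostly reflects chance, but it shows why the two quantities should not be treated as interchangeable.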