Fast gene set enrichment analysis

February 1, 2021 | Gennady Korotkevich, Vladimir Sukhov, Nikolay Budin, Boris Shpak, Maxim N. Artyomov, and Alexey Sergushichev
Fast Gene Set Enrichment Analysis (FGSEA) is a method for efficiently estimating gene set enrichment analysis (GSEA) P-values. It can accurately estimate very small P-values, as low as 10^-100, and is significantly faster than traditional implementations. FGSEA consists of two main procedures: FGSEA-simple, which gives a fast, approximate estimate of P-values for many gene sets at once, and FGSEA-multilevel, which accurately estimates extremely low P-values for individual gene sets.

FGSEA-simple gains efficiency by reusing the same random gene set samples across different pathways, which reduces the number of samples that must be drawn. For each sampled gene set of size K, it calculates enrichment scores for all prefixes using a square root heuristic. Compared with the naive method, this achieves a speedup of O(K log K / sqrt(K)).

FGSEA-multilevel uses an adaptive multilevel split Monte Carlo scheme. It sequentially determines levels of the enrichment score and estimates a probability for each level, so that P-values remain accurate even when they are very small. This is particularly useful for pathways with low P-values, where traditional methods may not be feasible due to computational constraints.

The method is validated on a collection of 605 datasets from Gene Expression Omnibus (GEO), where it recovers statistically significant pathways with high accuracy. FGSEA is open source and available as an R package in Bioconductor and on GitHub. In comparisons with other implementations, FGSEA estimates P-values as low as 10^-5 with high accuracy and efficiency, and FGSEA-multilevel reaches even lower P-values. A procedure for filtering redundant gene sets makes the results more concise.
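
The sample-reuse idea behind FGSEA-simple can be illustrated with a minimal sketch. This is not the fgsea package API (the package itself is written in R; Python is used here only for illustration), and the function names `enrichment_score` and `simple_pvalues` are hypothetical. The sketch recomputes each prefix score naively rather than using the square root heuristic, and the sign-conditional P-value estimate with an add-one correction is an assumption. What it does show is that one random gene set of the largest size K serves every pathway, because each prefix of the sample is itself a random gene set of smaller size.

```python
import random


def enrichment_score(stats, gene_set):
    """Signed GSEA enrichment score for the gene indices in `gene_set`.

    `stats` must be sorted in decreasing order; hits are weighted by |stat|.
    """
    members = set(gene_set)
    n, k = len(stats), len(members)
    hit_total = sum(abs(stats[i]) for i in members)
    miss_step = 1.0 / (n - k)
    running = best = 0.0
    for i in range(n):
        if i in members:
            running += abs(stats[i]) / hit_total
        else:
            running -= miss_step
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


def simple_pvalues(stats, gene_sets, n_samples=1000, seed=0):
    """Estimate P-values for all gene sets from one shared pool of samples."""
    rng = random.Random(seed)
    sizes = sorted({len(genes) for genes in gene_sets.values()})
    observed = {name: enrichment_score(stats, genes)
                for name, genes in gene_sets.items()}
    extreme = dict.fromkeys(gene_sets, 0)    # null scores at least as extreme
    same_sign = dict.fromkeys(gene_sets, 0)  # null scores with the same sign
    for _ in range(n_samples):
        # One random gene set of the largest size K; each prefix of length k
        # is itself a uniform random gene set of size k.
        sample = rng.sample(range(len(stats)), sizes[-1])
        null = {k: enrichment_score(stats, sample[:k]) for k in sizes}
        for name, genes in gene_sets.items():
            es, null_es = observed[name], null[len(genes)]
            if (null_es >= 0) == (es >= 0):
                same_sign[name] += 1
                extreme[name] += abs(null_es) >= abs(es)
    # Add-one correction keeps the estimate strictly positive (an assumption).
    return {name: (extreme[name] + 1) / (same_sign[name] + 1)
            for name in gene_sets}


# Toy data: 200 genes with linearly decreasing statistics.
stats = [(100 - i) / 50 for i in range(200)]
gene_sets = {"top": list(range(15)), "spread": list(range(0, 200, 8))}
pvals = simple_pvalues(stats, gene_sets)
```

With `n_samples` random sets, an estimate of this kind cannot go below roughly 1/`n_samples`, which is why a separate procedure is needed for extremely low P-values.
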
FGSEA is also compared with the Broad GSEA implementation, showing that it can achieve better sensitivity and detect significant pathways in cases where other methods fail. The algorithm is described in detail, including formal definitions, datasets, and implementation details, making it a powerful tool for gene set enrichment analysis.
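
To see why a level-by-level scheme such as FGSEA-multilevel can reach P-values far beyond direct sampling, the tail probability can be written as a product of conditional probabilities. The notation here is introduced for illustration and is an assumption rather than the paper's own: \gamma is the observed (positive) enrichment score and l_1 < ... < l_t are the intermediate levels.

```latex
P(\mathrm{ES} \ge \gamma)
  = P(\mathrm{ES} \ge l_1)
    \prod_{i=2}^{t} P(\mathrm{ES} \ge l_i \mid \mathrm{ES} \ge l_{i-1})
    \cdot P(\mathrm{ES} \ge \gamma \mid \mathrm{ES} \ge l_t),
  \qquad l_1 < l_2 < \dots < l_t \le \gamma .
```

If, as a further assumption, each level is placed so that its factor is about 1/2, then P(ES >= \gamma) is roughly 2^{-t} times the last factor. A P-value of 10^-100 then corresponds to about t = 100 log2(10), or roughly 332 levels. Each factor is a moderate probability that a modest sample can estimate, whereas direct sampling would need on the order of 10^100 random gene sets.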