2010 | Matthew D Young, Matthew J Wakefield, Gordon K Smyth and Alicia Oshlack
GOseq is a method for performing Gene Ontology (GO) analysis on RNA-seq data. GO analysis is widely used to identify the biological processes represented in genome-wide expression studies, but standard methods are biased when applied to RNA-seq data because differential expression is over-detected for long and highly expressed transcripts. GOseq corrects for this selection bias, leading to more accurate results. The method first identifies differentially expressed (DE) genes, then quantifies how the likelihood of a gene being called DE depends on its transcript length, and finally incorporates this likelihood into the statistical test for category significance. In this way, the test accounts for the fact that longer or more highly expressed genes are more likely to be detected as differentially expressed.
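The length-dependence step can be illustrated with a short Python sketch. The paper fits a monotonic curve, the probability weighting function (PWF), to the DE calls as a function of transcript length. This sketch swaps in a simpler stand-in: genes are sorted into equal-sized length bins, the fraction of DE genes is computed per bin, and isotonic regression (pool-adjacent-violators) forces the fitted probability to be non-decreasing in length. The names `estimate_pwf` and `pava_increasing`, the default bin count, and the binning scheme are hypothetical illustrations, not the GOseq package API.

```python
from typing import List, Sequence


def pava_increasing(values: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Weighted isotonic (non-decreasing) fit via pool-adjacent-violators."""
    blocks: List[List[float]] = []  # each block: [weighted mean, total weight, count]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted: List[float] = []
    for mean, _, count in blocks:
        fitted.extend([mean] * int(count))
    return fitted


def estimate_pwf(lengths: Sequence[int], is_de: Sequence[bool], n_bins: int = 10) -> List[float]:
    """Hypothetical PWF: per-gene probability of a DE call, non-decreasing in length."""
    n = len(lengths)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: lengths[i])
    n_bins = max(1, min(n_bins, n))
    # Equal-sized, non-empty bins of genes, ordered from shortest to longest.
    bins = [order[b * n // n_bins:(b + 1) * n // n_bins] for b in range(n_bins)]
    proportions = [sum(1 for i in b if is_de[i]) / len(b) for b in bins]
    fitted = pava_increasing(proportions, [len(b) for b in bins])
    pwf = [0.0] * n
    for b, p in zip(bins, fitted):
        for i in b:  # every gene in a bin gets that bin's fitted probability
            pwf[i] = p
    return pwf
```

Each value returned by `estimate_pwf` is a gene's estimated chance of being called DE given only its length; a bin whose raw DE fraction dips below a shorter bin's is pooled with it rather than allowed to decrease.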
GOseq was tested on a prostate cancer dataset, where it produced results more consistent with known biology than standard methods. To improve computational efficiency, the method approximates the null distribution of DE genes in each category with the Wallenius noncentral hypergeometric distribution rather than estimating it by random sampling. Comparisons with microarray data also indicated that GOseq produces more reliable results. The software is freely available and can be used with both RNA-seq and microarray data. By addressing selection bias in RNA-seq data, GOseq provides a more accurate analysis of GO categories.
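The Wallenius approximation can be sketched in Python as well. Suppose n DE genes are drawn from m1 genes in a category and m2 genes outside it, with category genes drawn at odds omega relative to the rest. Wallenius' distribution gives P(X = x) = C(m1, x) C(m2, n - x) D ∫_0^∞ (1 - e^{-omega v})^x (1 - e^{-v})^{n-x} e^{-D v} dv, where D = omega (m1 - x) + m2 - n + x. Taking omega as the ratio of the mean PWF inside versus outside the category is our reading of the paper and should be treated as an assumption. The function names are hypothetical, and the quadrature (Simpson's rule on a window around the mode of the concave log-integrand) is our own simple choice, not the GOseq implementation.

```python
import math
from typing import Callable, Sequence


def category_odds(pwf: Sequence[float], in_category: Sequence[bool]) -> float:
    """Assumed weight: mean PWF inside the category over mean PWF outside it."""
    inside = [p for p, c in zip(pwf, in_category) if c]
    outside = [p for p, c in zip(pwf, in_category) if not c]
    return (sum(inside) / len(inside)) / (sum(outside) / len(outside))


def _bisect(fn: Callable[[float], float], lo: float, hi: float, iters: int = 100) -> float:
    """Root of a monotone fn on [lo, hi]; fn(lo) and fn(hi) must differ in sign."""
    lo_positive = fn(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (fn(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wallenius_log_pmf(x: int, m1: int, m2: int, n: int, omega: float) -> float:
    """log P(X = x) for x of n draws from m1 category genes (weight omega)
    versus m2 other genes (weight 1). Assumes more than one possible x."""
    d = omega * (m1 - x) + (m2 - n + x)
    log_binom = (math.lgamma(m1 + 1) - math.lgamma(x + 1) - math.lgamma(m1 - x + 1)
                 + math.lgamma(m2 + 1) - math.lgamma(n - x + 1) - math.lgamma(m2 - n + x + 1))

    def g(v: float) -> float:
        # log of (1 - e^{-omega v})^x (1 - e^{-v})^(n-x) e^{-d v}; concave in v
        return (x * math.log(-math.expm1(-omega * v))
                + (n - x) * math.log(-math.expm1(-v)) - d * v)

    def dg(v: float) -> float:
        # derivative of g, strictly decreasing in v
        return (x * omega * math.exp(-omega * v) / -math.expm1(-omega * v)
                + (n - x) * math.exp(-v) / -math.expm1(-v) - d)

    lo, hi = 1e-8, 1.0
    while dg(lo) <= 0:  # dg -> +inf as v -> 0
        lo /= 2.0
    while dg(hi) > 0:  # dg -> -d < 0 as v -> inf
        hi *= 2.0
    peak = _bisect(dg, lo, hi)
    g_max = g(peak)
    target = g_max - 40.0  # ignore where the integrand is below e^-40 of its peak

    left = peak * 1e-16
    if g(left) < target:
        left = _bisect(lambda v: g(v) - target, left, peak)
    step = 1.0
    while g(peak + step) >= target:
        step *= 2.0
    right = _bisect(lambda v: g(v) - target, peak, peak + step)

    # Composite Simpson's rule on the rescaled integrand exp(g - g_max).
    steps = 2000  # must be even
    h = (right - left) / steps
    f = [math.exp(g(left + i * h) - g_max) for i in range(steps + 1)]
    total = (f[0] + f[-1] + 4 * sum(f[1:-1:2]) + 2 * sum(f[2:-1:2])) * h / 3
    return log_binom + math.log(d) + g_max + math.log(total)


def wallenius_upper_pvalue(k: int, m1: int, m2: int, n: int, omega: float) -> float:
    """P(X >= k): chance of at least k DE genes in the category under the null."""
    lo, hi = max(0, n - m2), min(n, m1)
    if lo == hi:  # only one possible count, e.g. every gene is DE
        return 1.0 if k <= lo else 0.0
    support = range(lo, hi + 1)
    logs = [wallenius_log_pmf(x, m1, m2, n, omega) for x in support]
    top = max(logs)
    probs = [math.exp(v - top) for v in logs]
    # Renormalise over the support to absorb small quadrature error.
    return sum(p for x, p in zip(support, probs) if x >= k) / sum(probs)
```

With `omega = 1` the distribution reduces to the ordinary hypergeometric, i.e. the length-unaware test; an `omega` above 1 raises the null expectation for a category of long genes, so the same observed count yields a larger p-value.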