Understanding DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

DEGseq is an R package for identifying differentially expressed genes from RNA-seq data. It integrates three existing methods and introduces two novel methods based on the MA-plot to detect and visualize gene expression differences. As input, the package takes either uniquely mapped RNA-seq reads together with a gene annotation, or gene expression values computed by other programs (for example, RPKM values). The output is a text file listing expression values, P-values, and Q-values for each gene, plus an XHTML summary page with statistical reports.

RNA-seq is modeled as a random sampling process in which the number of reads mapped to a gene follows a binomial distribution. Under this model, Fisher's exact test and the likelihood ratio test can be used to identify differentially expressed genes. The two novel MA-plot-based methods differ in how they account for noise: the first relies on the random sampling model itself, while the second estimates the noise level empirically by comparing technical replicates. P-values are corrected for multiple testing using two alternative strategies (Benjamini-Hochberg and Storey-Tibshirani Q-values), and users can set either a P-value or an FDR threshold.

DEGseq also supports comparing two groups of samples through the samr package. It can be applied to identify differential expression of exons or transcript pieces: users define their own 'genes' and supply them as UCSC refFlat annotation files. Either raw read counts or normalized expression values such as RPKM can be used, but raw counts are recommended for the methods based on the random sampling model. The package can also export gene expression values in a table format for use with edgeR.

Funding comes from various Chinese research programs, and the authors declare no conflicts of interest. The package is available at http://bioinfo.au.tsinghua.edu.cn/software/degseq.
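
The statistical ideas in the summary above can be illustrated with a small Python sketch. This is not DEGseq's own R code or API. It is a standard-library illustration that assumes the usual MA-plot definitions, M = log2(C1) - log2(C2) and A = (log2(C1) + log2(C2)) / 2, a two-sided Fisher's exact test on a per-gene 2x2 table, and Benjamini-Hochberg adjustment as one FDR procedure. The function names and the toy read counts are hypothetical.

```python
import math


def ma_values(c1, c2):
    """Return (M, A) for one gene from its read counts in two samples.

    Assumes both counts are positive, since log2(0) is undefined.
    """
    m = math.log2(c1) - math.log2(c2)
    a = (math.log2(c1) + math.log2(c2)) / 2
    return m, a


def _log_comb(n, k):
    """Log of the binomial coefficient C(n, k); lgamma keeps large counts stable."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(c1, c2, n1, n2):
    """Two-sided Fisher's exact test for one gene.

    c1, c2: reads mapped to the gene in samples 1 and 2.
    n1, n2: total mapped reads in samples 1 and 2.
    The 2x2 table is [[c1, n1 - c1], [c2, n2 - c2]].
    """
    gene_reads = c1 + c2
    total = n1 + n2

    def log_p(x):
        # Hypergeometric probability that x of the gene's reads fall in sample 1.
        return (_log_comb(n1, x) + _log_comb(n2, gene_reads - x)
                - _log_comb(total, gene_reads))

    lo = max(0, gene_reads - n2)
    hi = min(gene_reads, n1)
    observed = log_p(c1)
    # Sum all tables at least as extreme (no more probable) as the observed one.
    p = sum(math.exp(log_p(x)) for x in range(lo, hi + 1)
            if log_p(x) <= observed + 1e-7)
    return min(1.0, p)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical toy data: gene -> (count in sample 1, count in sample 2).
    counts = {"geneA": (120, 30), "geneB": (50, 55), "geneC": (8, 40)}
    n1, n2 = 10_000, 10_000  # hypothetical library sizes
    genes = list(counts)
    p_vals = [fisher_exact_two_sided(*counts[g], n1, n2) for g in genes]
    q_vals = benjamini_hochberg(p_vals)
    for g, p, q in zip(genes, p_vals, q_vals):
        m, a = ma_values(*counts[g])
        print(f"{g}: M={m:.2f} A={a:.2f} P={p:.3g} Q={q:.3g}")
```

Genes with a large absolute M at a given A and a small adjusted P-value would be the candidates for differential expression. The two MA-plot methods described above additionally model how the spread of M depends on A, which this sketch does not attempt.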

DEGseq: an R package for identifying differentially expressed genes from RNA-seq data

October 24, 2009 | Likun Wang, Zhixing Feng, Xi Wang, Xiaowo Wang, Xuegong Zhang