An empirical Bayes approach to inferring large-scale gene association networks

This paper presents an empirical Bayes approach for inferring large-scale gene association networks from microarray data. The method addresses small-sample inference in graphical models, where the number of genes (G) is much larger than the number of samples (N). It rests on three components: (1) improved small-sample estimates of partial correlations, (2) an exact test for edge inclusion with adaptive estimation of the degrees of freedom, and (3) a heuristic network search based on the false discovery rate (FDR) multiple testing framework. Together, these steps correspond to an empirical Bayes estimate of the network topology.

The method is applied to simulated data and to real breast cancer gene expression data. In simulations, it recovers the true network topology with high accuracy even for small-sample datasets. Applied to the real data, it infers a large-scale gene association network for 3883 genes. The results show that the method performs well in terms of sensitivity and specificity and that it effectively identifies statistically significant edges in the network. The paper also discusses the statistical properties of the framework, including the accuracy and power of network selection, weighs its advantages and potential drawbacks, and suggests that the estimator $\hat{\Pi}^{2}$ is particularly suitable for small-sample gene expression data. Finally, it discusses the biological implications of the inferred network, including the role of certain genes in cancer biology. The method is implemented in the R package 'GeneTS', which is freely available.
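To make the first and third components more concrete, here is a minimal Python sketch. It is not the GeneTS implementation. It assumes a precision (inverse covariance) matrix is already available and converts it to partial correlations with the standard identity $r_{ij} = -\omega_{ij} / \sqrt{\omega_{ii}\,\omega_{jj}}$. It then selects edges from a list of per-edge p-values using the Benjamini-Hochberg step-up rule, which is assumed here as one common FDR procedure; the summary above only states that an FDR framework is used. The function names, the toy matrix, and the p-values are hypothetical, and the exact test of the second component is not reproduced.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix (nested lists) to partial correlations.

    Standard identity: r_ij = -w_ij / sqrt(w_ii * w_jj); the diagonal is set to 1.
    """
    g = len(precision)
    pcor = [[1.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i != j:
                pcor[i][j] = -precision[i][j] / math.sqrt(
                    precision[i][i] * precision[j][j]
                )
    return pcor


def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level q.

    Step-up rule: find the largest rank k with p_(k) <= k * q / m and
    reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    return sorted(order[:cutoff])


# Toy example (hypothetical numbers): three genes in a chain 0 - 1 - 2.
toy_precision = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
toy_pcor = partial_correlations(toy_precision)  # toy_pcor[0][1] is 0.5

# Hypothetical p-values for four candidate edges; indices 0 and 2 are selected.
selected_edges = benjamini_hochberg([0.001, 0.8, 0.02, 0.04], q=0.05)
```

In the setting the summary describes, with G much larger than N, the sample covariance matrix is singular and cannot simply be inverted, which is why the paper focuses on improved small-sample estimators; the sketch sidesteps that by taking the precision matrix as given.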

October 12, 2004 | Juliane Schäfer and Korbinian Strimmer