A new efficient method for exact genome-wide association analysis using linear mixed models (LMMs) has been developed. This method, called GEMMA, is significantly faster than existing exact methods such as EMMA: it is roughly n times faster, where n is the sample size. GEMMA achieves this by replacing the computationally expensive per-SNP eigen-decomposition step in EMMA with a matrix-vector multiplication, which reduces the per-SNP computational complexity from O(n³) to O(n²). This makes exact genome-wide association analysis computationally feasible for large datasets.
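The reason a single eigen-decomposition suffices can be sketched in a few lines. Assuming the usual LMM covariance H = λK + I (λ is a variance ratio and K the relatedness matrix), writing K = U diag(d) Uᵀ makes H⁻¹ diagonal after rotating everything by Uᵀ. The sketch below is our own illustration, not GEMMA's code: the function names, the simulated genotypes, the intercept-only covariates and the fixed value of λ are all hypothetical. It checks that the rotated estimate of the SNP effect matches a dense O(n³) generalized least squares solve, while the only per-SNP O(n²) work in the rotated version is the product Uᵀx.

```python
import numpy as np

def gls_beta_dense(K, lam, W, x, y):
    """Reference: GLS estimate of the SNP effect with a dense H = lam*K + I (O(n^3) per SNP)."""
    n = len(y)
    Hinv = np.linalg.inv(lam * K + np.eye(n))
    X = np.column_stack([W, x])
    coef = np.linalg.solve(X.T @ Hinv @ X, X.T @ Hinv @ y)
    return coef[-1]

def gls_beta_rotated(d, Ut, lam, Wt, yt, x):
    """Same estimate using a precomputed eigendecomposition K = U diag(d) U^T.

    Ut = U.T, Wt = U.T @ W and yt = U.T @ y are computed once for the whole scan;
    the only O(n^2) step per SNP is the matrix-vector product Ut @ x.
    """
    xt = Ut @ x                      # O(n^2) per SNP
    w = 1.0 / (lam * d + 1.0)        # H^{-1} is diagonal in the rotated basis
    X = np.column_stack([Wt, xt])
    coef = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * yt))
    return coef[-1]

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
n = 200
G = rng.binomial(2, 0.3, size=(n, 500)).astype(float)
G -= G.mean(axis=0)                  # center genotypes
K = G @ G.T / G.shape[1]             # assumed form of the relatedness matrix
W = np.ones((n, 1))                  # intercept-only covariates
y = rng.normal(size=n)
x = G[:, 0]                          # the SNP being tested
lam = 0.5                            # hypothetical variance ratio

d, U = np.linalg.eigh(K)             # done once for all SNPs, O(n^3)
Ut = U.T
Wt, yt = Ut @ W, Ut @ y              # also done once

beta_fast = gls_beta_rotated(d, Ut, lam, Wt, yt, x)
beta_ref = gls_beta_dense(K, lam, W, x, y)
```

In a genome-wide scan, `eigh`, `Wt` and `yt` would be computed once, and only `gls_beta_rotated` would run per SNP.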
The method requires complete or imputed genotype data for all SNPs and performs only one eigen-decomposition of the relatedness matrix, at the beginning of the analysis. For each SNP tested, it replaces the expensive additional eigen-decomposition step in EMMA with one matrix-vector multiplication. After this, each iteration of the subsequent optimization step requires only cheap operations (complexity O(n)) to evaluate both the first and second derivatives of the objective functions.
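Once the data are rotated, evaluating the objective and its derivative for a new value of λ only involves sums over n diagonal terms. The sketch below assumes a maximum-likelihood objective with the residual variance profiled out, l(λ) = (n/2) log(n/2π) − n/2 − ½ log|H| − (n/2) log(yᵀPy), where H = λK + I and P = H⁻¹ − H⁻¹W(WᵀH⁻¹W)⁻¹WᵀH⁻¹. Its first derivative is −½ tr(H⁻¹K) + (n/2)·yᵀPKPy / yᵀPy. This is our own illustration. The function names and simulated data are hypothetical, and the second derivative mentioned in the text is omitted for brevity. Each step costs O(n·c²) for c covariates, which is O(n) when c is a small constant. The check compares against a dense O(n³) evaluation and a finite-difference derivative.

```python
import numpy as np

def loglik_and_grad(lam, d, Wt, yt):
    """Profiled ML log-likelihood l(lam) and dl/dlam, with H = lam*K + I.

    Inputs are already rotated by U^T (K = U diag(d) U^T), so every step
    costs O(n * c^2) for c covariates, i.e. O(n) for a fixed small c.
    """
    n = len(yt)
    w = 1.0 / (lam * d + 1.0)                   # diagonal of H^{-1}
    A = Wt.T @ (w[:, None] * Wt)                # W^T H^{-1} W  (c x c)
    b = Wt.T @ (w * yt)                         # W^T H^{-1} y
    r = w * (yt - Wt @ np.linalg.solve(A, b))   # P y in the rotated basis
    yPy = yt @ r
    yPKPy = np.sum(d * r * r)                   # K is diag(d) after rotation
    ll = (0.5 * n * np.log(n / (2 * np.pi)) - 0.5 * n
          - 0.5 * np.sum(np.log(lam * d + 1.0)) - 0.5 * n * np.log(yPy))
    grad = -0.5 * np.sum(d * w) + 0.5 * n * yPKPy / yPy
    return ll, grad

def loglik_dense(lam, K, W, y):
    """Reference O(n^3) evaluation of the same log-likelihood."""
    n = len(y)
    H = lam * K + np.eye(n)
    Hinv = np.linalg.inv(H)
    P = Hinv - Hinv @ W @ np.linalg.solve(W.T @ Hinv @ W, W.T @ Hinv)
    _, logdet = np.linalg.slogdet(H)
    return (0.5 * n * np.log(n / (2 * np.pi)) - 0.5 * n
            - 0.5 * logdet - 0.5 * n * np.log(y @ P @ y))

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(1)
n = 150
G = rng.binomial(2, 0.3, size=(n, 400)).astype(float)
G -= G.mean(axis=0)
K = G @ G.T / G.shape[1]
W = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.normal(size=n)

d, U = np.linalg.eigh(K)                        # one-time eigen-decomposition
Wt, yt = U.T @ W, U.T @ y                       # one-time rotation

lam = 0.7
ll, grad = loglik_and_grad(lam, d, Wt, yt)
ll_ref = loglik_dense(lam, K, W, y)
h = 1e-5
fd = (loglik_and_grad(lam + h, d, Wt, yt)[0]
      - loglik_and_grad(lam - h, d, Wt, yt)[0]) / (2 * h)
```

An optimizer such as Newton-Raphson would call `loglik_and_grad` repeatedly, and each call never touches an n × n matrix.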
GEMMA was tested on two datasets: a mouse GWAS of high-density lipoprotein cholesterol (HDL-C) levels from the Hybrid Mouse Diversity Panel (HMDP), and a human GWAS of Crohn's disease from the Wellcome Trust Case Control Consortium (WTCCC). The results showed that GEMMA is comparable in speed to EMMAX while producing exact test statistics. In the HMDP dataset, GEMMA was 12 times faster than the algorithm of Lippert et al.; in the WTCCC dataset, it was 2 times faster.
The study also compared the accuracy of different approximation methods, including EMMAX and GRAMMAR. In the HMDP dataset, EMMAX led to systematic underestimation of p values, while GRAMMAR led to dramatic underestimation. In contrast, in the WTCCC dataset, the p values from EMMAX were very close to the exact values. The results suggest that EMMAX is more accurate than GRAMMAR, even in cases where the sample structure is subtle.
The study also discusses the computational efficiency of different methods, noting that the use of a lower-rank relatedness matrix can reduce computational time and memory requirements but may affect the accuracy of p values. The choice of relatedness matrix can impact both computational and statistical efficiency. The study concludes that GEMMA provides a more accurate and efficient method for genome-wide association analysis using LMMs.
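One way a lower-rank relatedness matrix saves time and memory can be sketched as follows. Assume that K = ZZᵀ/p is built from an n × p centered genotype matrix Z with p < n, so K has rank at most p. A thin SVD of Z then gives K = V diag(d) Vᵀ in O(np²) time and O(np) memory. Quadratic forms with H⁻¹ = (λK + I)⁻¹ can be computed in O(np), because directions outside the column space of V have eigenvalue 1 under H. This is a generic low-rank identity presented as an illustration, not as GEMMA's implementation. The function names and data are hypothetical, and the sketch addresses only cost, not the effect on p-value accuracy.

```python
import numpy as np

def low_rank_factors(Z):
    """Thin SVD of an n x p matrix Z (p < n) giving K = Z Z^T / p = V diag(d) V^T.

    Costs O(n p^2) time and O(n p) memory instead of O(n^3) and O(n^2).
    """
    V, s, _ = np.linalg.svd(Z, full_matrices=False)
    d = s**2 / Z.shape[1]
    return V, d

def quad_form_Hinv(V, d, lam, a, b):
    """Compute a^T (lam*K + I)^{-1} b in O(n p) using the low-rank factors.

    H^{-1} = V diag(1/(lam*d + 1)) V^T + (I - V V^T): directions outside the
    column space of V have eigenvalue 1 under H.
    """
    Va, Vb = V.T @ a, V.T @ b
    in_span = np.sum(Va * Vb / (lam * d + 1.0))
    outside = a @ b - Va @ Vb
    return in_span + outside

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(2)
n, p = 300, 40
Z = rng.binomial(2, 0.4, size=(n, p)).astype(float)
Z -= Z.mean(axis=0)
lam = 2.0
a = rng.normal(size=n)
b = rng.normal(size=n)

V, d = low_rank_factors(Z)
fast = quad_form_Hinv(V, d, lam, a, b)

# Dense reference, only feasible for small n.
K = Z @ Z.T / p
dense = a @ np.linalg.solve(lam * K + np.eye(n), b)
```

Quantities such as xᵀH⁻¹x, xᵀH⁻¹y and WᵀH⁻¹x, which enter the per-SNP test statistics, can all be assembled from `quad_form_Hinv` without forming an n × n matrix.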