Understanding Mendelian Randomization Analysis With Multiple Genetic Variants Using Summarized Data

This paper presents methods for using summarized genetic data, such as published estimates from genome-wide association studies (GWAS), in Mendelian randomization (MR) analyses to estimate the causal effect of a risk factor on an outcome. It compares MR methods based on summarized data with those based on individual-level data, and investigates how gene-gene interactions, linkage disequilibrium, and weak instruments affect the estimates. Combining the ratio estimates from multiple genetic variants with an inverse-variance weighted (IVW) method or a likelihood-based method gives results similar to two-stage least squares (2SLS) applied to individual-level data, even in the presence of gene-gene interactions. However, these methods overstate precision when variants are in linkage disequilibrium. Weak instrument bias is small if the P-value in a linear regression of the risk factor on each variant is less than 1 × 10⁻⁵.

The methods are applied to estimate the causal effect of low-density lipoprotein cholesterol (LDL-C) on coronary artery disease (CAD) using published data on five genetic variants. A 30% reduction in LDL-C is estimated to reduce CAD risk by 67% (95% CI: 54% to 76%). The paper concludes that MR analyses using summarized data from uncorrelated variants are similarly efficient to those using individual-level data, although the necessary assumptions cannot be assessed as fully. The likelihood-based method is recommended for applied analysis of summarized data, with analyses restricted to uncorrelated genetic variants (not in linkage disequilibrium), so that precision is not overstated, and to variants strongly associated with the risk factor, so that weak instrument bias stays small. The paper also highlights the importance of checking the validity of the instrumental variable assumptions and the potential for bias when using summarized data.
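To make the IVW combination of ratio estimates concrete, the following is a minimal Python sketch, not the authors' code. It assumes uncorrelated variants, uses first-order standard errors for the ratio estimates, and approximates the instrument-strength P-value with a normal distribution. The function names and all input numbers are hypothetical and chosen only for illustration; they are not the LDL-C or CAD data from the paper.

```python
import math


def ratio_estimates(beta_x, beta_y, se_y):
    """Per-variant ratio estimates beta_y / beta_x with first-order standard errors."""
    return [(by / bx, sy / abs(bx)) for bx, by, sy in zip(beta_x, beta_y, se_y)]


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted combination of the ratio estimates.

    Equivalent to a weighted mean of the ratio estimates with weights
    beta_x**2 / se_y**2. Assumes the variants are uncorrelated
    (no linkage disequilibrium); otherwise the standard error is too small.
    """
    num = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)


def instrument_p_value(beta_x, se_x):
    """Two-sided normal-approximation P-value for a variant-risk factor association."""
    return math.erfc(abs(beta_x / se_x) / math.sqrt(2.0))


# Hypothetical summarized data for three uncorrelated variants.
beta_x = [0.10, 0.20, 0.15]    # variant-risk factor associations
se_x = [0.02, 0.03, 0.025]     # their standard errors
beta_y = [0.05, 0.10, 0.075]   # variant-outcome associations (e.g. log odds ratios)
se_y = [0.01, 0.02, 0.015]     # their standard errors

# Screen instruments with the P < 1e-5 rule of thumb quoted in the summary.
strong = [instrument_p_value(bx, sx) < 1e-5 for bx, sx in zip(beta_x, se_x)]

estimate, std_error = ivw_estimate(beta_x, beta_y, se_y)
ci_low, ci_high = estimate - 1.96 * std_error, estimate + 1.96 * std_error
print(f"IVW estimate {estimate:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
print("All variants pass the P < 1e-5 screen:", all(strong))
```

The sketch covers only the IVW method; the likelihood-based method recommended in the paper models the uncertainty in both sets of association estimates and is not shown here.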

Mendelian Randomization Analysis With Multiple Genetic Variants Using Summarized Data

2013 | Stephen Burgess, Adam Butterworth, and Simon G. Thompson