Bias due to participant overlap in two-sample Mendelian randomization

14 September 2016 | Stephen Burgess | Neil M. Davies | Simon G. Thompson
This paper investigates bias in two-sample Mendelian randomization (MR) caused by participant overlap between the dataset used for the risk factor and the dataset used for the outcome. It shows that, in a two-sample MR analysis, the bias is linearly related to the proportion of overlap between the two datasets. When the analysis draws on genetic consortia with partially overlapping sets of participants, the direction and extent of the bias are uncertain.
Simulation studies are used to assess the magnitude of the bias and the inflation of the Type 1 error rate. For a continuous outcome, bias due to sample overlap is a linear function of the proportion of overlap. In a case-control setting, if risk factor measurements are included only for control participants, unbiased estimates are obtained even in a one-sample setting. If risk factor data on both control and case participants are used, however, the bias is similar to that with a continuous outcome. The paper therefore recommends that consortia releasing publicly available data on genetic variants exclude case participants from case-control samples to reduce bias. It also discusses weak instrument bias, the use of summarized data, and the potential for bias in the MR-Egger method. It concludes that bias due to sample overlap in two-sample MR analyses is less severe than the bias in one-sample analyses, and that the use of summarized data can help mitigate this bias. The paper provides analytical formulae for estimating the bias and the Type 1 error rate under the null hypothesis.
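The linear relationship between bias and overlap described above can be illustrated with a small simulation. The sketch below is not the paper's simulation design: the sample size, number of variants, effect sizes, allele frequency, and the function names (`marginal_assoc`, `ivw`, `mean_ivw_estimate`) are all assumptions chosen for illustration. It simulates a continuous outcome under the null (no causal effect) with a shared confounder and deliberately weak instruments, then computes an inverse-variance weighted (IVW) estimate from summarized variant associations at several overlap proportions.

```python
import numpy as np


def marginal_assoc(G, t):
    """Regress trait t on each column of G separately (simple linear regression)."""
    Gc = G - G.mean(axis=0)
    tc = t - t.mean()
    ssg = (Gc ** 2).sum(axis=0)
    beta = Gc.T @ tc / ssg
    resid = tc[:, None] - Gc * beta          # one residual column per variant
    se = np.sqrt((resid ** 2).sum(axis=0) / (len(t) - 2) / ssg)
    return beta, se


def ivw(bx, by, se_y):
    """Inverse-variance weighted estimate from summarized associations."""
    w = 1.0 / se_y ** 2
    return np.sum(w * bx * by) / np.sum(w * bx ** 2)


def mean_ivw_estimate(overlap, n=1000, n_variants=20, reps=200, seed=0):
    """Mean IVW estimate under the null for a given overlap proportion.

    All parameter values are illustrative assumptions, chosen so that the
    instruments are weak and the bias is easy to see.
    """
    rng = np.random.default_rng(seed)
    alpha = np.full(n_variants, 0.05)         # weak variant effects on the risk factor
    n_shared = int(round(overlap * n))
    total = 2 * n - n_shared                  # distinct participants across both samples
    estimates = []
    for _ in range(reps):
        G = rng.binomial(2, 0.3, size=(total, n_variants)).astype(float)
        U = rng.normal(size=total)            # confounder of risk factor and outcome
        X = G @ alpha + U + rng.normal(size=total)
        Y = U + rng.normal(size=total)        # true causal effect of X on Y is zero
        idx_x = slice(0, n)                   # risk factor sample
        idx_y = slice(n - n_shared, total)    # outcome sample, sharing n_shared people
        bx, _ = marginal_assoc(G[idx_x], X[idx_x])
        by, se_y = marginal_assoc(G[idx_y], Y[idx_y])
        estimates.append(ivw(bx, by, se_y))
    return float(np.mean(estimates))


if __name__ == "__main__":
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"overlap={p:.2f}  mean IVW estimate={mean_ivw_estimate(p):.3f}")
```

Because the true causal effect is zero, the mean estimate at each overlap proportion is itself the bias. Under these assumptions it should sit near zero with no overlap and grow roughly in proportion to the overlap, in the direction of the confounded observational association.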