2010 September | Carl A. Anderson, Fredrik H. Pettersson, Geraldine M. Clarke, Lon R. Cardon, Andrew P. Morris, and Krina T. Zondervan
This protocol outlines data quality control (QC) steps for case-control association studies. The key steps are identifying and removing DNA samples and markers that introduce bias, ensuring accurate genotype calling, and assessing relatedness between individuals. PLINK and SMARTPCA are used for per-individual and per-SNP QC, including checks of Hardy-Weinberg equilibrium, missing genotype rates, and population stratification. The protocol stresses that QC must precede statistical testing in order to reduce both false positives and false negatives. For genome-wide association (GWA) studies, QC involves checking for discordant sex information, high missing genotype rates, related individuals, and divergent ancestry. Candidate gene studies follow similar QC steps, applied to fewer SNPs. The protocol also discusses the impact of population stratification and the need for careful QC to avoid bias. The full procedure is estimated to take about 8 hours and includes detailed steps for creating BED files, identifying problematic individuals and markers, and removing them so that downstream results are accurate. The protocol is illustrated with simulated datasets and provides software and data resources for implementation.
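To make the per-SNP checks concrete, the following is a minimal Python sketch of two of the filters named above: the missing genotype rate and a test of Hardy-Weinberg equilibrium. It is an illustration only, not the protocol's PLINK workflow. The function names, the 0/1/2 genotype coding with `None` for a missing call, and the default thresholds (`max_missing=0.05`, `min_hwe_p=1e-4`) are all assumptions chosen for the example, and a 1-degree-of-freedom chi-square goodness-of-fit test is swapped in here for simplicity rather than taken from the protocol.

```python
from math import erfc, sqrt


def missing_rate(genotypes):
    """Fraction of missing calls (None) among genotypes coded 0/1/2."""
    if not genotypes:
        raise ValueError("genotype list is empty")
    return sum(g is None for g in genotypes) / len(genotypes)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Arguments are the observed counts of the three genotype classes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic marker: no deviation can be tested
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2.0))


def flag_snp(genotypes, max_missing=0.05, min_hwe_p=1e-4):
    """Return the reasons (possibly none) for which a SNP fails QC.

    Both thresholds are hypothetical defaults for illustration.
    """
    reasons = []
    if missing_rate(genotypes) > max_missing:
        reasons.append("missingness")
    called = [g for g in genotypes if g is not None]
    if called:
        counts = [called.count(k) for k in (0, 1, 2)]
        if hwe_chi2_pvalue(*counts) < min_hwe_p:
            reasons.append("hwe")
    return reasons
```

For example, `flag_snp([0] * 50 + [2] * 50)` flags a marker that has no heterozygotes at all, a pattern that can indicate a genotype-calling problem. In a case-control setting the equilibrium test is commonly restricted to controls, since a true association can itself distort genotype frequencies in cases; the sketch leaves that subsetting to the caller.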