The supplementary methods section of the article provides a detailed overview of the genome-wide association study (GWAS) conducted to identify genetic loci associated with educational attainment. The study focused on two phenotypes: years of schooling completed (EduYears) and college completion (College). The analysis was performed at the cohort level, with summary statistics from participating cohorts uploaded to a central server for meta-analysis. The primary phenotype, EduYears, was chosen due to its high genetic correlation with College, suggesting that it is a more powerful measure for detecting associations.
The methods section covers several key aspects of the study, including phenotype definition, genotyping and imputation, association analyses, quality control, meta-analysis, and replication analyses. Phenotype definitions were standardized using the 1997 International Standard Classification of Education (ISCED). Genotyping was performed on a variety of arrays, and imputation used the 1000 Genomes Project reference panel. Association analyses were conducted using linear and logistic regression models, with covariates included to adjust for potential confounding factors.
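A per-marker association test of the kind described above can be sketched as an ordinary least squares regression of the phenotype on the SNP dosage plus covariates. This is a minimal illustration in pure Python, not the cohorts' actual pipeline: the function names, the choice of a single age covariate, and the toy data are all assumptions made for the example, and the summary does not list the covariates that were actually used. A linear model is the natural fit for a continuous outcome such as EduYears; a binary outcome such as College would call for a logistic model instead, which is not shown here.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def snp_association(phenotype, dosage, covariates):
    """OLS of phenotype on an intercept, the SNP dosage, and covariate columns.

    `covariates` is a list of columns (hypothetical examples: age, sex).
    Returns (beta, standard_error) for the SNP dosage term.
    """
    n = len(phenotype)
    rows = [[1.0, dosage[i]] + [cov[i] for cov in covariates] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(rows, phenotype)) for a in range(k)]
    coef = solve_linear_system(xtx, xty)
    residuals = [y - sum(c * v for c, v in zip(coef, r))
                 for r, y in zip(rows, phenotype)]
    sigma2 = sum(e * e for e in residuals) / (n - k)
    # The dosage entry on the diagonal of (X'X)^-1, found by solving
    # against a unit vector instead of inverting the whole matrix.
    unit = [0.0] * k
    unit[1] = 1.0
    var_scale = solve_linear_system(xtx, unit)[1]
    return coef[1], math.sqrt(sigma2 * var_scale)


# Toy data (made up): dosage in {0, 1, 2}, one covariate, years of schooling.
toy_dosage = [0, 1, 2, 0, 1, 2, 1, 0]
toy_age = [30, 41, 52, 38, 47, 60, 33, 55]
toy_years = [12, 14, 16, 11, 13, 17, 12, 15]
beta, se = snp_association(toy_years, toy_dosage, [toy_age])
print(f"beta = {beta:.3f}, se = {se:.3f}")
```

The per-SNP estimate and its standard error are the summary statistics a cohort would then pass on for meta-analysis.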
Quality control measures included filtering out markers with incorrect strand alignment, low imputation quality, or low minor allele counts. Diagnostic plots, including allele frequency plots, $P$-value vs. $Z$-score plots, QQ plots, and predicted vs. reported standard error plots, were used to check the reliability of the data. Meta-analysis was performed using METAL software, and genomic control factors were applied to account for inflation in $P$-values.
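The genomic control and meta-analysis steps can be illustrated with a short sketch. It assumes the common definition of the genomic control factor (the median observed chi-square statistic divided by the median of a 1-degree-of-freedom chi-square distribution) and a fixed-effect, inverse-variance weighted meta-analysis. The summary does not say which weighting scheme was used in METAL, so the inverse-variance choice is an assumption, and the function names are hypothetical.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(z_scores):
    """Inflation factor: median observed chi-square over its null median."""
    return median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN


def gc_adjust_se(se, lam):
    """Inflate a standard error by sqrt(lambda); leave it alone if lambda <= 1."""
    return se * math.sqrt(max(lam, 1.0))


def inverse_variance_meta(estimates):
    """Fixed-effect meta-analysis of (beta, se) pairs from several cohorts.

    Returns the pooled (beta, se, z).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# Made-up per-cohort results for one SNP: (beta, se).
cohort_results = [(0.020, 0.010), (0.035, 0.015), (0.015, 0.008)]
pooled_beta, pooled_se, pooled_z = inverse_variance_meta(cohort_results)
print(f"pooled beta = {pooled_beta:.4f}, se = {pooled_se:.4f}, z = {pooled_z:.2f}")
```

In this sketch a cohort's standard errors would be adjusted with `gc_adjust_se` before pooling, so that cohorts with inflated test statistics contribute less weight.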
Within-sample replication analyses were conducted to validate the findings in new cohorts, and out-of-sample replication was performed using data from the UK Biobank. The results showed high concordance between the discovery and replication samples, with 72 out of 74 lead SNPs showing the anticipated sign in the replication sample, and 52 replicating at the 5% significance level. The genetic correlation between the discovery and replication samples was 0.946, further supporting the replicability of the findings.
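To give a sense of how strong the reported sign concordance is, the 72-of-74 figure can be compared with what chance alone would produce. The sketch below assumes a one-sided binomial sign test in which each lead SNP independently matches the discovery sign with probability 0.5 under the null. The summary does not state which test the authors used, so this is an illustration and the function name is hypothetical.

```python
from math import comb


def sign_test_p_value(n_concordant, n_total, p_null=0.5):
    """One-sided binomial P(X >= n_concordant) when signs agree by chance."""
    return sum(
        comb(n_total, k) * p_null ** k * (1 - p_null) ** (n_total - k)
        for k in range(n_concordant, n_total + 1)
    )


# 72 of 74 lead SNPs had the anticipated sign in the replication sample.
p = sign_test_p_value(72, 74)
print(f"P(at least 72 of 74 concordant by chance) = {p:.3g}")
```

Under these assumptions the probability is on the order of 1e-19, which is why a 72-of-74 match is read as strong evidence of replication and not as a coincidence.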