March 2013 | Volume 9 | Issue 3 | e1003348 | Frank Dudbridge
Polygenic scores have been used to summarize genetic effects from a large number of markers that individually do not achieve significance in genome-wide association studies (GWAS). These scores are constructed by selecting markers from an initial training sample and forming a weighted sum of associated alleles in an independent replication sample. The association between a trait and this composite score indicates the presence of a genetic signal among the selected markers, and the score can be used for predicting individual trait values. This approach has been used to establish a common genetic basis for related disorders and to construct risk prediction models. However, in some cases, the desired association or prediction has not been achieved. The paper derives the power and predictive accuracy of a polygenic score from a quantitative genetics model, considering the sizes of the two samples, explained genetic variance, selection thresholds for including markers, and methods for weighting effect sizes. Expressions are derived for both quantitative and discrete traits, allowing for case/control sampling. The study shows that published studies with significant associations of polygenic scores have been well powered, while those with negative results can be explained by low sample sizes. It also demonstrates that useful levels of prediction may only be achieved when predictors are estimated from very large samples, up to an order of magnitude greater than currently available. Therefore, polygenic scores are currently more useful for association testing than for predicting complex traits, but prediction will become more feasible as sample sizes continue to grow.
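The score construction described above (select markers in the training sample, then form a weighted sum of allele counts in the replication sample) can be sketched in a few lines. This is only an illustrative sketch, not the paper's implementation: the function names `select_markers` and `polygenic_scores`, the toy effect sizes, p-values, and genotypes, and the 0.05 threshold are all assumptions made for the example.

```python
from typing import List, Sequence


def select_markers(p_values: Sequence[float], threshold: float) -> List[int]:
    """Return indices of markers whose training-sample p-value is at or below
    the selection threshold (hypothetical helper for illustration)."""
    return [j for j, p in enumerate(p_values) if p <= threshold]


def polygenic_scores(
    genotypes: Sequence[Sequence[int]],
    weights: Sequence[float],
    selected: Sequence[int],
) -> List[float]:
    """Weighted sum of allele counts over the selected markers, computed for
    each individual in the replication sample.

    genotypes[i][j] is the allele count (0, 1, or 2) of individual i at
    marker j; weights[j] is the effect size estimated in the training sample.
    """
    return [sum(weights[j] * row[j] for j in selected) for row in genotypes]


# Toy training-sample estimates (assumed values, not real data).
effects = [0.5, -0.25, 0.125, 0.0]
p_values = [0.001, 0.04, 0.3, 0.9]

# Toy replication-sample genotypes: 3 individuals by 4 markers.
genotypes = [
    [2, 0, 1, 1],
    [1, 2, 0, 2],
    [0, 1, 2, 0],
]

selected = select_markers(p_values, threshold=0.05)
scores = polygenic_scores(genotypes, effects, selected)
print(selected)  # [0, 1]
print(scores)    # [1.0, 0.0, -0.25]
```

Relaxing the threshold admits more markers into the score, which adds both true signal and noise; replacing `effects` with their signs would give an unweighted allele-count score, one of the alternative weighting choices the abstract alludes to.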