November 2011 | Bryan Howie, Jonathan Marchini, and Matthew Stephens
This paper presents a new framework for genotype imputation that improves accuracy and efficiency when using large, diverse reference panels. The authors propose an approximation that uses local sequence similarity to select a custom reference panel for each study haplotype in each region of the genome. Because these custom panels are drawn from the full reference set, investigators can use all available reference haplotypes without a separate panel-selection step. This can improve accuracy at low-frequency variants by capturing unexpected allele sharing among populations. The framework was tested on data from HapMap 3 and the MalariaGEN Project and showed high imputation accuracy across a wide range of human populations.
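The per-haplotype panel selection described above can be sketched as follows. This is a minimal illustration, not IMPUTE2's code. It assumes phased study haplotypes, biallelic sites coded 0/1, and local similarity measured as the number of mismatched alleles (Hamming distance) at the sites typed in the study within one genomic region. The function names and the panel-size parameter `k` are hypothetical. In the actual method, the selected panel feeds into a full HMM-based imputation model rather than standing alone.

```python
from typing import List, Sequence


def local_distance(study_alleles: Sequence[int],
                   ref_hap: Sequence[int],
                   typed_sites: Sequence[int]) -> int:
    """Hamming distance between a study haplotype and one reference
    haplotype, counted only at the sites typed in the study.

    study_alleles[j] is the study allele at site typed_sites[j];
    ref_hap covers every site in the region (typed and untyped).
    """
    return sum(a != ref_hap[s] for a, s in zip(study_alleles, typed_sites))


def select_custom_panel(study_alleles: Sequence[int],
                        reference_haps: List[Sequence[int]],
                        typed_sites: Sequence[int],
                        k: int) -> List[int]:
    """Hypothetical helper: return the indices of the k reference
    haplotypes most similar to the study haplotype in this region.
    Ties in distance are broken by reference index so the result is
    deterministic."""
    ranked = sorted(
        range(len(reference_haps)),
        key=lambda i: (local_distance(study_alleles, reference_haps[i], typed_sites), i),
    )
    return ranked[:k]


# Toy region with 6 sites; only sites 0, 2 and 4 are typed in the study.
reference = [
    [0, 0, 0, 0, 0, 0],  # ref 0
    [1, 0, 1, 1, 1, 0],  # ref 1
    [1, 1, 1, 0, 0, 1],  # ref 2
    [0, 1, 1, 0, 1, 1],  # ref 3
]
typed = [0, 2, 4]
study = [1, 1, 1]  # study alleles observed at sites 0, 2 and 4

panel = select_custom_panel(study, reference, typed, k=2)
print(panel)  # [1, 2]
```

Here reference haplotype 1 matches the study haplotype at all three typed sites and haplotype 2 differs at one, so these two form the custom panel. The untyped sites (1, 3 and 5) would then be imputed using only this panel. Repeating the selection for every study haplotype and every region is what lets each panel adapt to local ancestry, even when the full reference set spans many populations.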
The framework is implemented in the IMPUTE2 software package. The authors also show that their approximation reduces the computational cost of adding haplotypes to a reference set, and that their method is faster and more accurate than another leading method, Beagle, when imputing from large, sequence-based reference panels. The paper further discusses computational strategies for modern reference datasets and the importance of diverse reference panels for imputation accuracy. The authors conclude that their framework gives investigators a practical way to use the rich information available in thousands of reference genomes.