A One-Penny Imputed Genome from Next-Generation Reference Panels

September 6, 2018 | Brian L. Browning, Ying Zhou, and Sharon R. Browning
Beagle 5.0 is a new genotype imputation method that substantially reduces the computational cost of imputation from large reference panels. It runs faster and scales better with reference panel size than Beagle 4.1, Impute4, Minimac3, and Minimac4. The method is based on the Li and Stephens haplotype frequency model and uses a parsimonious state space. Its efficiency comes from several methodological improvements, including composite reference haplotypes, an improved reference file format (bref3), and delaying imputation until the output file is constructed, which reduces memory usage. Beagle 5.0 can impute genotypes from 10M reference samples into 1,000 target samples at a cost of less than one US cent per sample.

The method was tested using data from the 1000 Genomes Project, the Haplotype Reference Consortium, and simulated data for reference panels of varying sizes. The simulated data were generated using recent advances in simulation methods and in inference of historical effective population size, so that they are realistic. In these tests, Beagle 5.0 had the lowest computation time and the best scaling with increasing reference panel size.

Beagle 5.0 is freely available as open-source software. It is particularly effective for imputing single-nucleotide variants (SNVs) in large batches of target samples. There is still potential for further gains in computational efficiency when imputing from large reference panels into small batches of target samples, and for the development of imputation methods for non-SNV variants.
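
The summary above says the method is based on the Li and Stephens haplotype frequency model. As a rough illustration of that model only, and not of Beagle 5.0 itself, the sketch below runs a plain forward-backward pass in which each hidden state is one reference haplotype, and reports the posterior dosage of allele 1 at every marker. The function name `li_stephens_impute`, the parameters `switch` and `err`, their default values, and the toy data are all assumptions made for this example. It uses every reference haplotype as a state, so it does not show composite reference haplotypes, bref3, or delayed imputation.

```python
import numpy as np


def li_stephens_impute(ref, target, genotyped, switch=0.05, err=0.001):
    """Posterior dosage of allele 1 at each marker for one target haplotype.

    ref:       (H, M) array of 0/1 alleles, one row per reference haplotype
    target:    length-M sequence; entries at ungenotyped markers are ignored
    genotyped: length-M booleans, True where the target allele is observed
    switch:    assumed per-interval probability of jumping to a random state
    err:       assumed allele mismatch (error) probability
    """
    ref = np.asarray(ref)
    n_hap, n_mark = ref.shape

    # Emission probabilities; ungenotyped markers carry no information (1.0).
    emit = np.ones((n_mark, n_hap))
    for m in range(n_mark):
        if genotyped[m]:
            emit[m] = np.where(ref[:, m] == target[m], 1.0 - err, err)

    # Forward pass, normalised at each marker so values sum to 1.
    fwd = np.zeros((n_mark, n_hap))
    fwd[0] = emit[0] / n_hap
    fwd[0] /= fwd[0].sum()
    for m in range(1, n_mark):
        fwd[m] = emit[m] * ((1.0 - switch) * fwd[m - 1] + switch / n_hap)
        fwd[m] /= fwd[m].sum()

    # Backward pass with the same transition model.
    bwd = np.ones((n_mark, n_hap))
    for m in range(n_mark - 2, -1, -1):
        w = emit[m + 1] * bwd[m + 1]
        bwd[m] = (1.0 - switch) * w + switch * w.sum() / n_hap
        bwd[m] /= bwd[m].sum()

    # Posterior state probabilities, then sum over states carrying allele 1.
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return (post * ref.T).sum(axis=1)


# Toy panel: 3 reference haplotypes, 5 markers (made-up data).
ref = np.array([
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 1, 0],
])
target = [1, 1, 0, 1, 1]                      # marker 2 value is a placeholder
genotyped = [True, True, False, True, True]   # marker 2 is to be imputed
dosage = li_stephens_impute(ref, target, genotyped)
print(np.round(dosage, 3))
```

In this toy case the observed alleles match the second reference haplotype at all four genotyped markers, so the imputed dosage at the ungenotyped marker follows that haplotype's allele there.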