[slides] A one penny imputed genome from next generation reference panels

The paper presents Beagle 5.0, a new genotype imputation method that substantially reduces the computational cost of imputing genotypes from large reference panels. The method achieves this efficiency through composite reference haplotypes, delayed imputation, and an improved reference file format (bref3). Beagle 5.0 is compared with four other imputation methods (Beagle 4.1, Impute4, Minimac3, and Minimac4) using 1000 Genomes Project, Haplotype Reference Consortium, and simulated data. Beagle 5.0 has the lowest computation time and the best scaling of computation time with increasing reference panel size. With 1,000 phased target samples, Beagle 5.0 is 3×, 12×, 43×, and 533× faster than the fastest alternative method for 10k, 100k, 1M, and 10M reference samples, respectively. Beagle 5.0 can also perform genome-wide imputation from 10M reference samples into 1,000 phased target samples at a cost of less than one US cent per sample on the Amazon Elastic Compute Cloud.

A One-Penny Imputed Genome from Next-Generation Reference Panels

September 6, 2018 | Brian L. Browning, Ying Zhou, and Sharon R. Browning