HiPhase: jointly phasing small, structural, and tandem repeat variants from HiFi sequencing

25 January 2024 | James M. Holt, Christopher T. Saunders, William J. Rowell, Zev Kronenberg, Aaron M. Wenger, Michael Eberle
**Motivation:** In diploid organisms, phasing assigns the alleles at heterozygous variants to one of two haplotypes. PacBio HiFi sequencing produces long, accurate reads that are well suited to variant calling and phasing, including for larger variants such as structural variants and tandem repeats. However, current phasing tools typically phase only small variants, leaving larger variants unphased.

**Results:** HiPhase is a new tool that jointly phases SNVs, indels, structural variants, and tandem repeat variants. Its main features are dual-mode allele assignment for large variants, a novel application of the A* algorithm to phasing, and logic to span alignment errors around reference gaps and homozygous deletions. In assessments, HiPhase achieved an average phase block NG50 of 480 kb with 929 switchflip errors and fully phased 93.8% of genes, outperforming current state-of-the-art tools. HiPhase also supports multi-threading, statistics gathering, and concurrent generation of phased alignment output.

**Availability and Implementation:** HiPhase is available as source code and as a pre-compiled Linux binary, with a user guide, at <https://github.com/PacificBiosciences/HiPhase>.

**Introduction:** Phasing is essential in clinical diagnostics, HLA typing, and the analysis of autosomal recessive conditions. HiPhase addresses the limitations of existing tools such as WhatsHap, which down-samples data, does not support multi-allelic variants, and is limited to small variants. HiPhase uses dual-mode allele assignment and an A* search algorithm to phase a wide range of variants efficiently.

**Materials and Methods:** HiPhase breaks the phasing problem into three steps: phase block generation, allele assignment, and diplotype solving. Phase block generation identifies runs of consecutive heterozygous variant calls that are connected by reads. Allele assignment converts read mappings into condensed allelic observations and supports multi-allelic variation.
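
As a rough illustration of the phase block generation step, the sketch below groups consecutive heterozygous variants into blocks whenever enough reads bridge each adjacent pair. It is a simplified model, not HiPhase's implementation: each read is reduced to the first and last variant index it overlaps, and the names `build_phase_blocks`, `read_spans`, and `min_spanning_reads` are hypothetical, chosen only for this example.

```python
from typing import List, Tuple


def build_phase_blocks(
    num_variants: int,
    read_spans: List[Tuple[int, int]],
    min_spanning_reads: int = 1,
) -> List[Tuple[int, int]]:
    """Group consecutive heterozygous variants into phase blocks.

    Variants are indexed 0..num_variants-1 in coordinate order. Each read is
    given as the inclusive (first, last) variant index it overlaps.
    Returns inclusive (start, end) variant-index ranges, one per block.
    """
    if num_variants == 0:
        return []

    # links[i] counts the reads that cover both variant i and variant i + 1.
    links = [0] * (num_variants - 1)
    for first, last in read_spans:
        for i in range(first, last):
            links[i] += 1

    blocks = []
    start = 0
    for i, support in enumerate(links):
        if support < min_spanning_reads:
            # Too few reads bridge variants i and i + 1: close the block here.
            blocks.append((start, i))
            start = i + 1
    blocks.append((start, num_variants - 1))
    return blocks
```

For six variants and reads spanning `(0, 2)`, `(1, 2)`, and `(3, 4)`, the function returns `[(0, 2), (3, 4), (5, 5)]`: no read connects variant 2 to 3 or variant 4 to 5, so the blocks break there.
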
Diplotype solving uses the A* algorithm to find optimal haplotypes.

**Results:** HiPhase produced a phase block NG50 of 480 kb, fully phased 93.8% of genes, and phased 3.1 million variants with 929 switchflip errors. It also phased more than 11,000 structural variants and 68,000 tandem repeat variants, which WhatsHap does not phase.

**Conclusion:** HiPhase is the first tool to jointly phase SNVs, indels, structural variants, and tandem repeat variants. It outperforms WhatsHap in phase block length, error rate, and total phased variants. HiPhase also includes usability features and will be extended to support other types of structural variants.
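
To make the diplotype solving step from Materials and Methods more concrete, here is a minimal A* sketch on a toy version of the problem. It rests on assumptions that are not taken from HiPhase: variants are biallelic, the two haplotypes are complements of each other, the cost is the number of read alleles that disagree with the haplotype each read fits best, and the heuristic is a simple bound built from disjoint adjacent variant pairs. The names `solve_diplotype`, `prefix_cost`, and `suffix_bounds` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Read = Dict[int, int]  # variant index -> observed allele (0 or 1)


def prefix_cost(reads: List[Read], hap: Tuple[int, ...]) -> int:
    """Cost of a partial haplotype: each read is charged its mismatches
    against the better of hap and its complement, over placed variants only."""
    total = 0
    for read in reads:
        placed = [(v, a) for v, a in read.items() if v < len(hap)]
        mismatches = sum(1 for v, a in placed if hap[v] != a)
        total += min(mismatches, len(placed) - mismatches)
    return total


def suffix_bounds(reads: List[Read], n: int) -> List[int]:
    """bound[k] is a lower bound on the cost added by variants k..n-1.

    For each disjoint pair (2p, 2p+1), reads either show the two alleles in
    cis or in trans; whichever phase is chosen, the minority reads each
    carry at least one mismatch.
    """
    bound = [0] * (n + 1)
    for k in range(n - 1, -1, -1):
        bound[k] = bound[k + 1]
        if k % 2 == 0 and k + 1 < n:
            both = [r for r in reads if k in r and k + 1 in r]
            cis = sum(1 for r in both if r[k] == r[k + 1])
            bound[k] += min(cis, len(both) - cis)
    return bound


def solve_diplotype(reads: List[Read], n: int) -> Tuple[Tuple[int, ...], int]:
    """Return (haplotype, cost) minimizing total read mismatches via A*."""
    if n == 0:
        return (), 0
    bound = suffix_bounds(reads, n)
    start = (0,)  # fix the first allele: swapping haplotypes is symmetric
    heap = [(prefix_cost(reads, start) + bound[1], start)]
    while heap:
        f, hap = heapq.heappop(heap)
        if len(hap) == n:
            return hap, f  # bound[n] == 0, so f is the exact cost
        for allele in (0, 1):
            nxt = hap + (allele,)
            heapq.heappush(heap, (prefix_cost(reads, nxt) + bound[len(nxt)], nxt))
    raise RuntimeError("search space exhausted without a solution")
```

Because every partial haplotype is reached by exactly one path, the search is a tree search, and an admissible heuristic is enough for the first complete haplotype popped from the queue to be optimal. The second haplotype is the complement of the returned one.
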