YaHS: yet another Hi-C scaffolding tool

June 30, 2022 | Chenxi Zhou, Shane A. McCarthy, and Richard Durbin
YaHS is a user-friendly command-line tool for constructing chromosome-scale scaffolds from Hi-C data. It requires minimal input (an assembly file and an alignment file) and writes its results in multiple formats, enabling rapid, robust, and scalable construction of high-quality genome assemblies. YaHS is implemented in C and licensed under the MIT License. The source code, documentation, and tutorial are available at https://github.com/c-zhou/yahs.

The introduction discusses the difficulty of assembling high-quality, chromosome-scale genomes from long-read sequencing data alone. Hi-C data, which provides contact information between loci, is often used to construct chromosome-scale scaffolds. Several scaffolding tools have been developed for this purpose, including LACHESIS, HiRise, 3D-DNA, SALSA2, and pin_hic, but each has limitations. YaHS introduces a novel method for building the contact matrix, which improves the accuracy of contig joins. It is more robust to assembly errors and produces more accurate and more contiguous assemblies than previous tools.

In the results section, YaHS was tested on simulated human genome assemblies and on Darwin Tree of Life assemblies. On the simulated data, YaHS assembled over 92% of sequences into 25 major scaffolds, with higher N50 and N90 values than SALSA2 and pin_hic. It also corrected more assembly errors than the other tools. On the Darwin Tree of Life assemblies, YaHS consistently generated assemblies with higher contiguity, particularly as measured by the L90 statistic.

The conclusion states that YaHS is a fast, reliable, and accurate tool for constructing chromosome-scale scaffolds with Hi-C data. It outperforms other state-of-the-art Hi-C scaffolding tools in genome assembly accuracy and contiguity across a wide range of species and genome sizes. It is open source, easy to use, and well documented.

The authors thank individuals who provided feedback on the tool. The work was supported by Wellcome.
Conflict of interest: R.D. is a consultant for Dovetail Inc.
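
The contiguity statistics cited in the results (N50, N90, L90) follow the standard definitions: Nx is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least x% of the assembly, and Lx is the number of scaffolds in that set. A higher Nx and a lower Lx indicate a more contiguous assembly. The sketch below is illustrative only: the function name `nx_lx` and the scaffold lengths are hypothetical and are not taken from YaHS or from the paper's data.

```python
def nx_lx(lengths, x):
    """Return (Nx, Lx) for a list of scaffold lengths and a percentage x.

    Hypothetical helper for illustration; not part of YaHS.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")

    # Walk scaffolds from longest to shortest, accumulating total length.
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count


# Toy assembly with made-up scaffold lengths (total = 300).
toy_lengths = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy_lengths, 50)  # two longest scaffolds reach 150
n90, l90 = nx_lx(toy_lengths, 90)  # four longest scaffolds reach 270
print(f"N50={n50} L50={l50} N90={n90} L90={l90}")
```

For the toy lengths, the two longest scaffolds (100 + 80) already cover half of the 300 total, so N50 is 80 and L50 is 2; covering 90% takes four scaffolds, so N90 is 40 and L90 is 4.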