YaHS: yet another Hi-C scaffolding tool

June 30, 2022 | Chenxi Zhou, Shane A. McCarthy, Richard Durbin
Chenxi Zhou, Shane A. McCarthy, and Richard Durbin present YaHS, a command-line tool for constructing chromosome-scale scaffolds from Hi-C data. YaHS is designed to run as a single-line command that needs only two inputs, an assembly file and an alignment file, and it writes results in multiple formats. It is implemented in C and licensed under the MIT License, with source code, documentation, and tutorials available on GitHub.

The introduction highlights the importance of long-read, single-molecule DNA sequencing technologies in *de novo* genome assembly, particularly for projects such as the Earth BioGenome Project, the Vertebrate Genomes Project, and the Darwin Tree of Life project. Despite these advances, high-quality, chromosome-scale genomes remain difficult to assemble from long-read sequencing data alone. Hi-C, a sequencing-based proximity ligation assay, provides contact information that can be used to construct chromosome-scale scaffolds. Several Hi-C scaffolding tools have been developed, but each has limitations, for example in handling complex genomes and high repeat content.

YaHS follows a standard Hi-C scaffolding pipeline: mapping Hi-C reads to the input contigs, breaking contigs to correct assembly errors, building a contact matrix, constructing and pruning a scaffolding graph, and outputting scaffolds. What distinguishes YaHS from other tools is a novel method for building the contact matrix, which enables more accurate inference of contig joins.

Comparisons with other tools show that YaHS generates higher-quality genome assemblies, with greater accuracy and contiguity, and is more robust to errors in the input assembly. The results section details the performance of YaHS on simulated human genome assemblies and on real-world assemblies from the Darwin Tree of Life project. YaHS consistently outperforms other tools in contiguity, especially on the L90 statistic.
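
Contiguity statistics such as the L90 mentioned above can be computed from scaffold lengths alone. The following is a minimal sketch, assuming the usual definitions: Nx is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least x% of the assembly, and Lx is the number of scaffolds in that set. The function name `nx_lx` is hypothetical and is not part of YaHS.

```python
def nx_lx(lengths, x=90):
    """Return (Nx, Lx) for a list of scaffold lengths.

    Assumes the conventional definitions of Nx and Lx; this is an
    illustration, not code taken from YaHS.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")

    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count


# Toy assembly of five scaffolds, 300 bases in total.
toy_lengths = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy_lengths, x=50)
n90, l90 = nx_lx(toy_lengths, x=90)
print(f"N50={n50} L50={l50} N90={n90} L90={l90}")
```

A lower L90 means that fewer scaffolds are needed to cover 90% of the assembly, which is why it is a natural measure for chromosome-scale scaffolding.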
The tool is also evaluated on assemblies containing errors, where it demonstrates the ability to correct them effectively. The conclusion emphasizes that YaHS is a fast, reliable, and accurate tool for constructing chromosome-scale scaffolds from Hi-C data, and that it is widely used in the Darwin Tree of Life (DToL) project and other applications. It is open source, easy to use, and well documented, making it a valuable resource for researchers in genomics and genome assembly.
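
To make the contact-matrix step of the pipeline summarized above more concrete, here is a minimal sketch of a naive binned contact count. It is not the novel method used by YaHS, which this summary does not describe; it only illustrates the general idea of counting Hi-C read pairs between fixed-size bins of contigs. The function name `bin_contacts` and the tuple layout of the read pairs are assumptions made for this example.

```python
from collections import Counter


def bin_contacts(pairs, contig_lengths, bin_size):
    """Count Hi-C read pairs between fixed-size contig bins.

    pairs: iterable of (contig_a, pos_a, contig_b, pos_b), 0-based positions
           (a hypothetical layout chosen for this sketch).
    contig_lengths: dict mapping contig name to length, used to validate
           positions.
    Returns a Counter keyed by an unordered pair of (contig, bin_index).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")

    counts = Counter()
    for contig_a, pos_a, contig_b, pos_b in pairs:
        for name, pos in ((contig_a, pos_a), (contig_b, pos_b)):
            if not 0 <= pos < contig_lengths[name]:
                raise ValueError(f"position {pos} is outside contig {name}")
        bin_a = (contig_a, pos_a // bin_size)
        bin_b = (contig_b, pos_b // bin_size)
        # Sort the two bins so that (a, b) and (b, a) share one key.
        counts[tuple(sorted((bin_a, bin_b)))] += 1
    return counts


# Toy data: two contigs and four read pairs.
toy_lengths = {"ctg1": 1000, "ctg2": 500}
toy_pairs = [
    ("ctg1", 10, "ctg1", 950),   # within ctg1, bins 0 and 1
    ("ctg1", 990, "ctg2", 5),    # end of ctg1 to start of ctg2
    ("ctg2", 20, "ctg1", 980),   # same bins, reversed order
    ("ctg2", 100, "ctg2", 120),  # within ctg2, bin 0
]
contacts = bin_contacts(toy_pairs, toy_lengths, bin_size=500)
for (bin_a, bin_b), n in sorted(contacts.items()):
    print(bin_a, bin_b, n)
```

In this toy data, the two read pairs linking the end of `ctg1` to the start of `ctg2` are the kind of signal a scaffolder would weigh when deciding whether to join the two contigs.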