De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds

2017 April 07 | Olga Dudchenko, Sanjit S. Batra, Arina D. Omer, Sarah K. Nyquist, Marie Hoeger, Neva C. Durand, Muhammad S. Shamim, Ido Machol, Eric S. Lander, Aviva Presser Aiden, and Erez Lieberman Aiden
The authors present a method that uses Hi-C data to improve existing draft assemblies, producing chromosome-length scaffolds for the genome of the Aedes aegypti mosquito. They first validated the approach by assembling a human genome from short reads alone, then applied it to mosquito genomes. The resulting assemblies of Aedes aegypti and Culex quinquefasciatus each consist of three scaffolds, corresponding to the three chromosomes in each species. These assemblies indicate that most genomic rearrangements occur within, rather than between, chromosome arms. The authors describe the method as fast, inexpensive, and accurate, and applicable to many species.

Hi-C data provide links across many length scales, up to whole chromosomes, so they can be used to assign draft scaffolds to chromosomes and to order them within each chromosome. Such predictions can nonetheless contain errors, such as chromosome-scale inversions and misjoins. To address this, the researchers developed a procedure that uses Hi-C data to identify and correct errors in the initial assembly, then anchors, orders, and orients the resulting sequences based on contact frequency. The procedure also merges contigs and scaffolds that correspond to overlapping regions of the genome by identifying pairs with strong sequence homology and similar long-range contact patterns.

For validation, the de novo human assembly built from short Illumina reads was compared with the human genome reference. The Aedes aegypti assembly showed high accuracy, with 99.7% of scaffolds assigned to the correct chromosome. The Culex quinquefasciatus assembly showed 99% agreement in the order of scaffolds assigned to the same chromosome-length scaffold. The assemblies were also compared with genetic and physical maps and corresponded closely.

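
The idea of anchoring and ordering scaffolds by contact frequency can be illustrated with a toy sketch. This is not the authors' pipeline: the function names (`assign_to_chromosomes`, `order_within_chromosome`), the greedy merging strategy, and the contact counts below are all assumptions made up for illustration, and the sketch omits error correction, orientation, and the merging of overlapping sequences. It only shows how scaffolds that contact each other often can be grouped into chromosomes and then arranged so that strongly linked scaffolds sit next to each other.

```python
from itertools import combinations


def link(contacts, x, y):
    """Hi-C contact count between scaffolds x and y (0 if none observed)."""
    return contacts.get(frozenset((x, y)), 0)


def assign_to_chromosomes(contacts, lengths, n_chromosomes):
    """Greedy grouping (illustrative): repeatedly merge the two groups with the
    highest length-normalized contact count until n_chromosomes groups remain."""
    groups = [frozenset([name]) for name in lengths]

    def density(a, b):
        total = sum(link(contacts, x, y) for x in a for y in b)
        size = sum(lengths[x] for x in a) * sum(lengths[y] for y in b)
        return total / size

    while len(groups) > n_chromosomes:
        a, b = max(combinations(groups, 2), key=lambda pair: density(*pair))
        groups = [g for g in groups if g not in (a, b)] + [a | b]
    return groups


def order_within_chromosome(members, contacts):
    """Greedy ordering (illustrative): seed a path with the most strongly linked
    pair, then extend whichever end has the strongest link to a free scaffold."""
    members = sorted(members)
    if len(members) < 2:
        return members
    path = list(max(combinations(members, 2),
                    key=lambda p: link(contacts, *p)))
    remaining = set(members) - set(path)
    while remaining:
        candidates = sorted(remaining)
        left = max(candidates, key=lambda s: link(contacts, s, path[0]))
        right = max(candidates, key=lambda s: link(contacts, s, path[-1]))
        if link(contacts, left, path[0]) > link(contacts, right, path[-1]):
            path.insert(0, left)
            remaining.remove(left)
        else:
            path.append(right)
            remaining.remove(right)
    return path


# Made-up toy data: six equal-length scaffolds from two hypothetical chromosomes.
lengths = {name: 100 for name in ("s1", "s2", "s3", "s4", "s5", "s6")}
pair_counts = {
    ("s1", "s2"): 90, ("s2", "s3"): 80, ("s1", "s3"): 40,  # same chromosome
    ("s4", "s5"): 85, ("s5", "s6"): 70, ("s4", "s6"): 35,  # same chromosome
    ("s1", "s4"): 5, ("s3", "s6"): 4, ("s2", "s5"): 3,     # weak cross-links
}
contacts = {frozenset(pair): n for pair, n in pair_counts.items()}

chromosomes = assign_to_chromosomes(contacts, lengths, n_chromosomes=2)
orders = [order_within_chromosome(group, contacts) for group in chromosomes]
```

On this toy input the two groups recover the intended chromosomes, and adjacent scaffolds end up next to each other within each group. The direction of each path is arbitrary, which is one reason a real procedure needs a separate orientation step.
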
The results suggest that each chromosome arm in the Aedes, Culex, and Anopheles species descends from a single arm present in their common ancestor about 150 to 200 million years ago. The method also allows for the study of genome evolution, showing that sequence content among chromosome arms is conserved across species. The findings highlight the importance of Hi-C data in generating accurate genome assemblies and suggest that this approach can accelerate genomic analysis of many organisms.