28 April 2021 | A list of authors and their affiliations appears at the end of the paper.
The article discusses the development and application of high-quality, complete reference genomes for all vertebrate species. The authors highlight the importance of long-read sequencing technologies in maximizing genome quality and in addressing unresolved complex repeats and haplotype heterozygosity, which are major sources of assembly errors. They present lessons learned from generating assemblies for 16 species representing six major vertebrate lineages, including the identification of false gene duplications, increases in gene sizes, chromosome rearrangements, and a GC-rich pattern in protein-coding genes and their regulatory regions. The Vertebrate Genomes Project (VGP) is introduced as an international effort to generate high-quality, complete reference genomes for all of the approximately 70,000 extant vertebrate species, with the aim of enabling new discoveries in biology, disease, and biodiversity conservation. The article also details the assembly pipeline, the impact of repeats and heterozygosity on assembly quality, the detection and removal of false duplications, the importance of curation, and the effects of polishing on accuracy. Additionally, it explores the GC-rich regulatory regions of coding genes and chromosomal evolution among vertebrates. The authors propose assembly quality metrics and outline the VGP's future phases, emphasizing the need for continued improvements in haplotype phasing, base-call accuracy, and resolution of long repetitive regions.