2015 | Felipe A. Simão, Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva and Evgeny M. Zdobnov
The article introduces BUSCO, a tool that assesses the completeness of genome assemblies, annotated gene sets, and transcriptomes using benchmarking universal single-copy orthologs (BUSCOs). It addresses a gap in quality evaluation, which is often limited to technical metrics such as N50 that describe contiguity but not gene content. BUSCO instead applies evolutionary expectations of gene content: BUSCO sets are defined for six major phylogenetic clades and contain genes expected to be present in a single copy in most species. The tool is implemented in Python and available online.

For each expected gene, BUSCO reports one of four classes (complete, duplicated, fragmented, or missing) based on whether the gene is found and on its sequence characteristics. The resulting proportions give intuitive completeness metrics for genomes, gene sets, and transcriptomes, and allow different assemblies and annotations to be compared. The tool is described as efficient, with run-times varying with the size of the BUSCO set and of the genome being assessed. According to the study, these assessments measure completeness more accurately than traditional technical metrics and can detect issues in genome assemblies and annotations.

The study highlights the importance of completeness in genome assemblies and also discusses the limitations of BUSCO assessments, including potential errors in gene prediction and the impact of evolutionary history on the result. The tool is applicable to a wide range of organisms and is recommended for quality assessment of genome assemblies and annotations.
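To make the four-way classification concrete, the following is a minimal sketch of how per-gene classes could be turned into completeness percentages. It is not the tool's own code: the function names `summarize` and `format_summary`, the input format (one class label per expected gene), and the one-line output layout are all assumptions made for illustration. The sketch also treats the four classes as mutually exclusive, which may differ from how BUSCO itself reports them.

```python
from collections import Counter

# The four classes named in the summary above.
CLASSES = ("complete", "duplicated", "fragmented", "missing")


def summarize(statuses):
    """Return the percentage of expected genes in each class.

    `statuses` is a hypothetical input: one class label per gene in the
    BUSCO set, treated here as mutually exclusive categories.
    """
    if not statuses:
        raise ValueError("at least one gene status is required")
    counts = Counter(statuses)
    unknown = set(counts) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    total = len(statuses)
    return {name: 100.0 * counts[name] / total for name in CLASSES}


def format_summary(statuses):
    """Render the percentages as a compact one-line report (illustrative layout)."""
    pct = summarize(statuses)
    return "C:{:.1f}% D:{:.1f}% F:{:.1f}% M:{:.1f}% n:{}".format(
        pct["complete"],
        pct["duplicated"],
        pct["fragmented"],
        pct["missing"],
        len(statuses),
    )


# Invented example data: 10 expected genes, not results from the study.
example = ["complete"] * 6 + ["duplicated"] + ["fragmented"] * 2 + ["missing"]
print(format_summary(example))  # C:60.0% D:10.0% F:20.0% M:10.0% n:10
```

Reporting proportions rather than raw counts is what lets assessments made with BUSCO sets of different sizes be read on the same scale.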