May 2021 | Stephen Nayfach, Antonio Pedro Camargo, Frederik Schulz, Emiley Eloe-Fadrosh, Simon Roux, Nikos C. Kyrpides
CheckV is an automated pipeline for assessing the quality and completeness of metagenome-assembled viral genomes. It identifies closed viral genomes, estimates the completeness of genome fragments, and removes flanking host regions from integrated proviruses. Completeness is estimated by comparison with a large database of complete viral genomes, including 76,262 identified in publicly available metagenomes, metatranscriptomes, and metaviromes. After validation, CheckV was applied to large viral sequence collections and identified 44,652 high-quality viral genomes (over 90% complete), although most sequences were small fragments. Because host contamination significantly affects downstream analyses, CheckV removes it to improve the identification of auxiliary metabolic genes and viral-encoded functions.
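The "over 90% complete" criterion for high-quality genomes can be illustrated with a minimal sketch. This is not CheckV's own code: the `ViralContig` record, the contig names, and the completeness values are hypothetical, and only the 90% threshold comes from the text above.

```python
from dataclasses import dataclass

# Threshold taken from the text: "high-quality" means over 90% complete.
HIGH_QUALITY_MIN_COMPLETENESS = 90.0


@dataclass
class ViralContig:
    contig_id: str       # hypothetical identifier, for illustration only
    completeness: float  # estimated completeness, in percent


def high_quality(contigs):
    """Return the contigs whose estimated completeness exceeds 90%."""
    return [c for c in contigs if c.completeness > HIGH_QUALITY_MIN_COMPLETENESS]


# Made-up example values, not results from the study.
contigs = [
    ViralContig("contig_a", 97.5),
    ViralContig("contig_b", 42.0),
    ViralContig("contig_c", 90.0),
]

selected = high_quality(contigs)
print([c.contig_id for c in selected])
```

The comparison is strict, so a contig estimated at exactly 90% is not counted as high-quality in this sketch.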
Viruses are the most abundant biological entities on Earth and play key roles in microbial communities. However, only a limited fraction of viral diversity can be studied in the lab, so metagenomic sequencing is used to study uncultivated viruses. Viral contigs are identified in metagenomes by computational tools that rely on virus-specific features. CheckV improves the accuracy of genome completeness and host contamination estimates for these contigs, which is crucial for viral metagenomics. It is computationally efficient and more accurate than existing methods. It relies on a database of complete viral genomes to estimate completeness and identify host contamination, and the database can be updated as new viral genomes become available. CheckV is suitable for single-contig viral genomes but not for multi-contig ones. It is useful for evaluating the completeness of metagenome-assembled viral genomes, for identifying host regions on proviruses, and for distinguishing viral-encoded functions from host contamination.