Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies

2020 | Arang Rhie, Brian P. Walenz, Sergey Koren and Adam M. Phillippy
Merqury is a reference-free tool for evaluating the quality, completeness, and phasing of genome assemblies. It uses efficient k-mer set operations to compare the k-mers in a de novo assembly with those in high-accuracy, unassembled reads, and it estimates base-level accuracy and completeness from the resulting k-mer frequencies. For trios, it also evaluates haplotype-specific accuracy, completeness, phase block continuity, and switch errors. Merqury generates visualizations such as k-mer spectrum plots, and it has been demonstrated to be fast and robust on human and plant genomes.

Merqury addresses the challenge of validating de novo assemblies when no known truth is available. Existing methods rely on short-read mapping, which can be biased in repetitive regions. BUSCO, a widely used method for evaluating gene content, can be inaccurate when the genome contains true copy number or sequence variants not considered in the initial BUSCO gene set. K-mers, by contrast, can assess assembly quality in a reference-free manner. Merqury builds on the k-mer-based analyses introduced by KAT and adds new functionality for evaluating phased diploid genome assemblies.

Its metrics include consensus quality (QV) and k-mer completeness. When parental genomic sequences are available, it can also output haplotype completeness, phase block statistics, switch error rates, and visual representations of phase consistency. Copy number spectrum plots help identify artificial duplications and missing sequences, and haplotype-specific k-mers (hap-mers) are used to assess phase blocks and switch errors.
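
As a rough illustration of how QV and k-mer completeness can be derived from k-mer sets, the Python sketch below assumes the formulation commonly given for Merqury: assembly k-mers that never occur in the reads are treated as errors, the per-base probability of being correct is taken as (1 - K_asm_only / K_total)^(1/k), and QV is the Phred-scaled error rate. Completeness is taken as the fraction of reliable ("solid") read k-mers that are found in the assembly. The function names, the in-memory counting, the choice to count k-mers with multiplicity, and the `min_read_count` threshold are all illustrative assumptions; they are not Merqury's actual interface, which works on precomputed k-mer databases rather than Python objects.

```python
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_ACGT = set("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(seqs, k: int) -> Counter:
    """Count canonical k-mers over an iterable of sequences, skipping non-ACGT k-mers."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= _ACGT:
                counts[canonical(kmer)] += 1
    return counts


def consensus_qv(asm_counts: Counter, read_counts: Counter, k: int) -> float:
    """Phred-scaled consensus quality under the assumed k-mer survival model."""
    total = sum(asm_counts.values())
    if total == 0:
        raise ValueError("assembly contains no k-mers")
    # Assumption: assembly k-mers absent from the reads are errors.
    asm_only = sum(c for km, c in asm_counts.items() if km not in read_counts)
    if asm_only == 0:
        return math.inf
    p_base_correct = (1 - asm_only / total) ** (1 / k)
    return -10 * math.log10(1 - p_base_correct)


def kmer_completeness(asm_counts: Counter, read_counts: Counter,
                      min_read_count: int = 2) -> float:
    """Fraction of solid read k-mers (count >= min_read_count) seen in the assembly."""
    solid = [km for km, c in read_counts.items() if c >= min_read_count]
    if not solid:
        raise ValueError("no solid k-mers in the read set")
    found = sum(1 for km in solid if km in asm_counts)
    return found / len(solid)


if __name__ == "__main__":
    k = 5
    truth = "ACGTTGCAAGGCTA"          # toy "genome" (hypothetical data)
    reads = count_kmers([truth] * 3, k)
    asm = count_kmers([truth[:8]], k)  # truncated assembly
    print(consensus_qv(asm, reads, k), kmer_completeness(asm, reads))
```

In this sketch, an assembly whose k-mers all occur in the reads gets an infinite QV, while a truncated assembly keeps that QV but loses completeness, which is why the two metrics are reported together.
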
Merqury is efficient and accurate, and it provides a broader evaluation of assembly quality than traditional metrics such as N50 contig size. It is compatible with any high-accuracy sequencing technology and can be used for polyploid genomes. It offers a reference-free approach for measuring assembly phase blocks using parental k-mers, and because it is designed to accept any pre-computed hap-mer set as input, it supports alternative k-mer classification methods. Because it does not rely on a reference genome, its k-mer-based method provides better haplotype completeness estimates. Reporting these metrics with Merqury is recommended for any new genome assembly.
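
To make the hap-mer idea concrete, the following Python sketch derives haplotype-specific k-mers from parental and child k-mer sets, labels them along a contig, and groups them into phase blocks. It is a deliberately simplified toy: it uses forward-strand k-mers only, breaks a block at every change of haplotype, and counts each change as one switch. Merqury's own rules for filtering hap-mers and for tolerating short-range switches within a block are not reproduced here, and all function names and the example data are hypothetical.

```python
from itertools import groupby


def hapmers(maternal: set, paternal: set, child: set):
    """k-mers seen in exactly one parent and inherited by the child."""
    mat_only = (maternal - paternal) & child
    pat_only = (paternal - maternal) & child
    return mat_only, pat_only


def label_markers(contig: str, k: int, mat_hapmers: set, pat_hapmers: set):
    """Return (position, haplotype) for every hap-mer found along the contig."""
    markers = []
    for i in range(len(contig) - k + 1):
        kmer = contig[i:i + k]
        if kmer in mat_hapmers:
            markers.append((i, "mat"))
        elif kmer in pat_hapmers:
            markers.append((i, "pat"))
    return markers


def phase_blocks(markers):
    """Group consecutive same-haplotype markers into (hap, start, end, n_markers)."""
    blocks = []
    for hap, group in groupby(markers, key=lambda m: m[1]):
        positions = [pos for pos, _ in group]
        blocks.append((hap, positions[0], positions[-1], len(positions)))
    return blocks


def switch_count(markers) -> int:
    """Number of adjacent marker pairs whose haplotype labels differ."""
    return sum(1 for a, b in zip(markers, markers[1:]) if a[1] != b[1])


if __name__ == "__main__":
    k = 3
    # Hypothetical pre-computed k-mer sets for a trio.
    maternal = {"AAC", "ACG", "GGG"}
    paternal = {"TTG", "TGC", "GGG"}
    child = {"AAC", "ACG", "TTG", "GGG"}

    mat_h, pat_h = hapmers(maternal, paternal, child)
    markers = label_markers("AACGTTG", k, mat_h, pat_h)
    print(phase_blocks(markers), switch_count(markers))
```

Because the labelling step takes plain hap-mer sets as input, any other way of classifying k-mers by haplotype could be substituted for the `hapmers` function without changing the block and switch logic.
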