2020 | T. Rhyker Ranallo-Benavidez, Kamil S. Jaron & Michael C. Schatz
GenomeScope 2.0 and Smudgeplot are tools for reference-free profiling of polyploid genomes. GenomeScope 2.0 uses combinatorial theory to model k-mer frequencies in heterozygous and polyploid genomes, which lets it infer genome characteristics such as genome size, heterozygosity, and repeat content. Smudgeplot visualizes and estimates ploidy and genome structure by analyzing heterozygous k-mer pairs. Both tools were tested on simulated and real datasets, including polyploid species such as Meloidogyne root-knot nematodes and the strawberry Fragaria × ananassa. GenomeScope 2.0 is more accurate and robust than the original GenomeScope, especially for low-coverage diploid data. Smudgeplot estimates ploidy accurately in many cases, though it may under- or overestimate ploidy when heterozygosity or repetitiveness is extreme.
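The k-mer pair idea can be illustrated with a small sketch. For a pair of heterozygous k-mers with coverages A and B, the pair is placed by its relative minor coverage, min(A, B) / (A + B), and its total coverage, A + B. The function names below are hypothetical, and the haploid coverage is assumed to be known in advance, whereas the real tool estimates it from the data. This is a simplified illustration, not Smudgeplot's implementation.

```python
def smudge_coordinates(cov_a, cov_b):
    """Return (relative minor coverage, total coverage) for one k-mer pair."""
    minor, major = sorted((cov_a, cov_b))
    total = minor + major
    return minor / total, total


def guess_structure(cov_a, cov_b, haploid_cov):
    """Guess a genome structure label such as 'AB' or 'AAB'.

    Assumption: haploid_cov (coverage of a single genome copy) is given.
    Each k-mer's copy number is its coverage divided by haploid_cov,
    rounded to the nearest integer and forced to be at least 1.
    """
    minor, major = sorted((cov_a, cov_b))
    n_b = max(1, round(minor / haploid_cov))
    n_a = max(n_b, round(major / haploid_cov))
    return "A" * n_a + "B" * n_b


if __name__ == "__main__":
    # A pair with equal coverage looks diploid (AB); a 2:1 pair looks triploid (AAB).
    for a, b in [(31, 29), (62, 28)]:
        print(smudge_coordinates(a, b), guess_structure(a, b, haploid_cov=30))
```

Under these assumptions, the number of letters in the label is the implied ploidy: a pair near (1/2, 2n) suggests a diploid AB structure, while a pair near (1/3, 3n) suggests a triploid AAB structure.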
These tools support the analysis of complex genomes, including polyploid species, which are often underrepresented in genomics studies. They can be used to improve genome assembly and interpretation and to study the evolutionary impact of polyploidy. Both methods rely on k-mer spectra: GenomeScope 2.0 fits its model by nonlinear optimization, while Smudgeplot uses k-mer pair analysis to infer genome structure. The tools are available for use and further development.
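As a minimal sketch of what a k-mer spectrum is, the following Python code counts canonical k-mers in a set of reads and then tabulates how many distinct k-mers occur at each coverage. The function names are hypothetical and the code is for illustration only. In practice a dedicated k-mer counter would produce this histogram, and the model fitting step is not shown.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers across reads, skipping k-mers with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID:
                counts[canonical(kmer)] += 1
    return counts


def kmer_spectrum(counts):
    """Map each coverage value to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC"]
    spectrum = kmer_spectrum(count_kmers(reads, k=3))
    for coverage in sorted(spectrum):
        print(coverage, spectrum[coverage])
```

In a heterozygous diploid genome, such a spectrum typically shows one peak from k-mers present on a single haplotype and a second peak at about twice that coverage from k-mers shared by both haplotypes. The relative sizes and positions of these peaks are what the model is fitted to.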