2013 | Jan P Meier-Kolthoff, Alexander F Auch, Hans-Peter Klenk and Markus Göker
The study aims to improve genome sequence-based species delimitation methods, focusing on the Genome BLAST Distance Phylogeny (GBDP) approach. GBDP infers genome-to-genome distances between pairs of sequenced genomes, and these distances can be used to estimate values from DNA-DNA hybridization (DDH), a traditional wet-lab method for species delineation in prokaryotes. The main challenge is to produce digital DDH values that mimic the wet-lab results as closely as possible.
The study uses correlation and regression analyses to determine the best-performing methods and parameters. GBDP is extended with two new features: confidence intervals for intergenomic distances, obtained either via resampling or from the statistical models used for DDH prediction, and an additional family of distance functions. GBDP obtained the highest agreement with wet-lab DDH among all tested methods, and improved models increased the accuracy of DDH prediction further. Confidence intervals were stable when inferred from statistical models, whereas those obtained via resampling differed markedly between the underlying distance functions.
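The resampling idea can be illustrated with a small sketch. It is not the study's procedure: it assumes a percentile bootstrap over high-scoring segment pairs (HSPs) and a simple illustrative distance (one minus the share of identical positions across all HSPs). The function names `distance` and `bootstrap_ci` and all input numbers are hypothetical.

```python
import random


def distance(hsps):
    """Illustrative distance: 1 minus the share of identical positions over all HSPs."""
    identities = sum(ident for ident, _ in hsps)
    length = sum(n for _, n in hsps)
    return 1.0 - identities / length


def bootstrap_ci(hsps, replicates=1000, level=0.95, seed=0):
    """Percentile bootstrap interval: resample HSPs with replacement, recompute the distance."""
    rng = random.Random(seed)
    values = sorted(
        distance(rng.choices(hsps, k=len(hsps))) for _ in range(replicates)
    )
    tail = (1.0 - level) / 2.0
    lower = values[int(tail * replicates)]
    upper = values[min(replicates - 1, int((1.0 - tail) * replicates))]
    return lower, upper


# Hypothetical HSPs as (identical positions, alignment length); made-up numbers.
hsps = [(950, 1000), (430, 500), (1890, 2000), (270, 300), (760, 800)]
point = distance(hsps)
lower, upper = bootstrap_ci(hsps)
print(f"distance = {point:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

Because the interval depends on how the distance function weights individual HSPs, different distance functions can give intervals of quite different width on the same data, which fits the differences the study reports for resampling-based intervals.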
Although GBDP-based DDH prediction is highly accurate, the study stresses that confidence intervals are needed to evaluate the outcomes statistically. The methodological advances, which are accessible through a web service, are crucial steps towards a consistent, genome sequence-based classification of microorganisms. The recommended GBDP method combines BLAST+ with distance formula $d_{4}$ and optimized settings for word length and e-value filtering, along with a log-generalized linear model (GLM) for predicting DDH including confidence intervals.