Received: 1 December 2016 / Accepted: 7 February 2017 / Published online: 15 February 2017 | Seok-Hwan Yoon · Sung-min Ha · Jeongmin Lim · Soonjae Kwon · Jongsik Chun
This study evaluates four algorithms for calculating average nucleotide identity (ANI), a key measure used to define species boundaries in Archaea and Bacteria. The algorithms compared are ANIb (ANI using BLAST), ANIm (ANI using MUMmer), OrthoANIb (OrthoANI using BLAST), and OrthoANIu (OrthoANI using USEARCH). The evaluation used more than 100,000 pairs of genomes of various sizes. OrthoANIb and OrthoANIu correlate well with ANIb across the entire range of ANI values, whereas ANIm correlates poorly for ANI values below 90%. ANIm and OrthoANIu run significantly faster than ANIb: for genomes larger than 7 Mbp, their run-times are 53 and 22 times shorter, respectively. A web service and a standalone Java program are available for calculating OrthoANIu, offering a fast and accurate method for ANI calculation in large-scale bioinformatics pipelines.
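The comparison against ANIb can be illustrated with a minimal sketch, assuming that agreement between methods is summarised by a Pearson correlation computed separately below and at or above an ANI cutoff of 90%. The function names `pearson` and `correlation_by_range` and the `cutoff` parameter are hypothetical and are not taken from the study's software; the exact statistic used by the authors is not stated here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def correlation_by_range(reference, other, cutoff=90.0):
    """Correlate `other` with `reference` below and at/above `cutoff`.

    `reference` holds ANI values (in percent) from the reference method,
    `other` holds values for the same genome pairs from another method.
    The cutoff of 90.0 is an assumption based on the 90% boundary in the text.
    A bin with fewer than two genome pairs yields None.
    """
    bins = {"below": [], "at_or_above": []}
    for ref_value, other_value in zip(reference, other):
        key = "below" if ref_value < cutoff else "at_or_above"
        bins[key].append((ref_value, other_value))

    result = {}
    for label, pairs in bins.items():
        if len(pairs) >= 2:
            ref_values, other_values = zip(*pairs)
            result[label] = pearson(ref_values, other_values)
        else:
            result[label] = None
    return result
```

Calling `correlation_by_range(anib_values, other_values)` on two lists of per-pair ANI values returns one coefficient per range, which makes a method that agrees with the reference only at high ANI easy to spot.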