July 27, 2004 | Koichiro Tamura, Masatoshi Nei, and Sudhir Kumar
The study evaluates the accuracy of the neighbor-joining (NJ) method for inferring large phylogenies. NJ is computationally efficient and accurate for small data sets, but as data sets grow it examines an ever smaller fraction of the possible tree space, which raises the question of whether its accuracy holds up for thousands of sequences. The authors developed a maximum-likelihood method for the simultaneous estimation (SE) of all pairwise distances under biologically realistic models of nucleotide substitution. Compared with estimating each pairwise distance independently (IE), the SE method yields distance estimates with smaller standard errors and corrects up to 60% of NJ tree errors.

Simulations show that the accuracy of NJ trees declines by only about 5% when the number of sequences increases from 32 to 4,096, even with high evolutionary rate variation among lineages and nucleotide composition biases. NJ-SE trees are more accurate than both NJ-IE trees and p-distance trees. The results suggest that NJ, combined with the SE approach and sophisticated substitution models, remains a viable and efficient option for inferring phylogenies with thousands of sequences.
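To make the contrast between p-distance trees and model-based distances concrete, the following is a minimal sketch of how an uncorrected p-distance differs from a model-corrected distance. It is not the SE method, and it does not use the realistic substitution models the study relies on: the one-parameter Jukes-Cantor correction is swapped in here only because it is the simplest model-based distance. The function names and the toy sequences are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences carry an unambiguous nucleotide.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor correction d = -(3/4) * ln(1 - 4p/3).

    Stand-in for a model-based distance; the study itself uses more
    realistic models and estimates all pairwise distances simultaneously.
    """
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy aligned sequences (hypothetical data, two differing sites out of ten).
seq_x = "ACGTACGTAC"
seq_y = "ACGTTCGTAA"

p = p_distance(seq_x, seq_y)
d = jukes_cantor_distance(p)
print(f"p-distance = {p:.4f}, Jukes-Cantor distance = {d:.4f}")
```

The corrected distance is always at least as large as the p-distance because it accounts for multiple substitutions at the same site. A matrix of such pairwise distances is the input that NJ clusters into a tree.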