Accuracy of Estimated Phylogenetic Trees from Molecular Data

1983 | Masatoshi Nei, Fumio Tajima, and Yoshio Tateno
The accuracy and efficiency of three methods for constructing phylogenetic trees from gene frequency data were evaluated by computer simulation. The methods tested were UPGMA, Farris' method, and the modified Farris method. The simulations assumed eight species evolving according to a model tree, with allele frequency changes followed under the infinite-allele model. Genetic distances were calculated for all pairs of species, phylogenetic trees were reconstructed from them, and the reconstructed trees were compared with the model tree. The accuracy of both tree topology and branch lengths improved as more loci were used. When the expected number of gene substitutions per locus (M) was 0.1 or more and at least 30 loci were used, topological errors were small, yet the probability of recovering the fully correct topology remained below 0.5 even with 60 loci. When M was as low as 0.004, accuracy was substantially lower. UPGMA and the modified Farris method generally performed better than Farris' method, especially in estimating branch lengths. Farris' method often overestimated branch lengths, even with Rogers' distance, which obeys the triangle inequality. Nei's standard distance was better for estimating branch lengths because it is linearly related to the number of gene substitutions. Rogers' and Cavalli-Sforza's distances produced trees in which the area near the root was condensed and the other parts were elongated. The study recommends using more than 30 loci, including both polymorphic and monomorphic loci, when constructing phylogenetic trees. The conclusions also apply to nucleotide difference data obtained with restriction enzyme techniques. Key terms include UPGMA, Farris' method, modified Farris method, genetic distance, topological errors, branch length errors, and triangle inequality.
The study highlights the importance of using sufficient loci and appropriate distance measures for accurate phylogenetic reconstruction.
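To make the two distance measures named in the summary concrete, here is a minimal Python sketch of Nei's standard distance and Rogers' distance computed from allele frequencies. It uses the textbook definitions of both measures, not code or data from the study. The function names, the input layout (one list of allele frequencies per locus, with alleles in the same order in both populations), and the toy frequencies are assumptions made for illustration.

```python
import math

def nei_standard_distance(x, y):
    """Nei's standard distance D = -ln(J_xy / sqrt(J_x * J_y)).

    x and y are lists of loci; each locus is a list of allele
    frequencies (hypothetical input layout). The J terms are averaged
    over all loci, so monomorphic loci must be included in the input.
    Raises ValueError if the populations share no alleles (J_xy = 0).
    """
    n_loci = len(x)
    j_x = sum(sum(p * p for p in locus) for locus in x) / n_loci
    j_y = sum(sum(q * q for q in locus) for locus in y) / n_loci
    j_xy = sum(
        sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(x, y)
    ) / n_loci
    return -math.log(j_xy / math.sqrt(j_x * j_y))

def rogers_distance(x, y):
    """Rogers' distance: per-locus sqrt(0.5 * sum (p - q)^2), averaged over loci."""
    n_loci = len(x)
    per_locus = (
        math.sqrt(0.5 * sum((p - q) ** 2 for p, q in zip(lx, ly)))
        for lx, ly in zip(x, y)
    )
    return sum(per_locus) / n_loci

# Toy data (made up): two loci, two alleles each.
pop_a = [[1.0, 0.0], [0.5, 0.5]]   # locus 1 monomorphic, locus 2 polymorphic
pop_b = [[1.0, 0.0], [0.0, 1.0]]   # fixed for a different allele at locus 2

d_nei = nei_standard_distance(pop_a, pop_b)
d_rogers = rogers_distance(pop_a, pop_b)
```

Averaging the J terms over every locus, rather than only the variable ones, is one way to see why the summary recommends including monomorphic loci: dropping them would change the averages and inflate the distance.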