1983 | Masatoshi Nei, Fumio Tajima, and Yoshio Tateno*
This study uses computer simulations to examine the accuracy and efficiency of three methods for constructing phylogenetic trees from gene frequency data: UPGMA, Farris' method, and Tateno et al.'s modified Farris method. The simulations assume eight species evolving according to a given model tree, with allele frequency changes following the infinite-allele model. Five genetic distances (Nei's standard and minimum distances, Rogers' distance, Cavalli-Sforza's $f_{ij}$, and the modified Cavalli-Sforza distance) are calculated for all pairs of species, and the resulting distance matrices are used to reconstruct phylogenetic trees. Each reconstructed tree is then compared with the model tree to assess its accuracy.
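As an illustration of the distance step, the sketch below computes Nei's standard genetic distance using its commonly cited definition, $D = -\ln\left(J_{XY} / \sqrt{J_X J_Y}\right)$, where $J_X$, $J_Y$, and $J_{XY}$ are averages over loci of $\sum x_i^2$, $\sum y_i^2$, and $\sum x_i y_i$. The function name, the data layout (one list of allele frequencies per locus), and the toy frequencies are assumptions made for this example; they are not taken from the study.

```python
import math

def nei_standard_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: lists of loci; each locus is a list of allele
    frequencies, with alleles in the same order in both populations.
    (This layout is an assumption of the sketch.)
    """
    n_loci = len(pop_x)
    # Average homozygosity within each population.
    j_x = sum(sum(p * p for p in locus) for locus in pop_x) / n_loci
    j_y = sum(sum(q * q for q in locus) for locus in pop_y) / n_loci
    # Average probability of allele identity between populations.
    j_xy = sum(
        sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(pop_x, pop_y)
    ) / n_loci
    identity = j_xy / math.sqrt(j_x * j_y)
    return -math.log(identity)

# Toy data: two loci with two alleles each (hypothetical frequencies).
pop_a = [[1.0, 0.0], [0.5, 0.5]]
pop_c = [[0.0, 1.0], [0.5, 0.5]]
print(round(nei_standard_distance(pop_a, pop_c), 4))  # 1.0986 (= ln 3)
```

Averaging over all loci before taking the ratio means that monomorphic loci also contribute to the estimate.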
Key findings include:
- The accuracy of both the topology and branch lengths of the reconstructed trees is low when the number of loci is less than 20 but improves with more loci.
- When the expected number of gene substitutions (M) for the shortest branch is 0.1 or more per locus and 30 or more loci are used, the topological error (measured by the distortion index $d_{tr}$) is not significant, but the probability of obtaining the correct topology (P) is less than 0.5 even with 60 loci.
- UPGMA and the modified Farris method generally perform better than Farris' method, which often overestimates branch lengths.
- Nei's standard distance performs well for estimating expected branch lengths due to its linear relationship with the number of gene substitutions.
- Rogers' and Cavalli-Sforza's distances result in trees where the parts near the root are condensed and the other parts are elongated.
- It is recommended to use more than 30 loci, including both polymorphic and monomorphic loci, for constructing phylogenetic trees.
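To make the tree-building step concrete, here is a minimal sketch of UPGMA clustering on a pairwise distance matrix. It is an illustration under stated assumptions, not the study's implementation: the function name `upgma`, the nested-tuple tree format, and the three-species matrix are all hypothetical.

```python
def upgma(names, dist):
    """Build a UPGMA tree from a symmetric distance matrix.

    names: list of species labels.
    dist:  list of lists, dist[i][j] = distance between species i and j.
    Returns a nested tuple (left_subtree, right_subtree, node_height);
    leaves are plain labels. (Tree format is an assumption of this sketch.)
    """
    n = len(names)
    # Each cluster id maps to (subtree, number of species in the cluster).
    clusters = {i: (names[i], 1) for i in range(n)}
    # Pairwise distances keyed by (smaller_id, larger_id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining cluster is
        # the size-weighted mean of the two old distances.
        new_d = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        # Drop every distance that involved the two merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        # The new node sits at half the joining distance.
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2.0), size_a + size_b)
        next_id += 1
    return next(iter(clusters.values()))[0]

# Hypothetical three-species distance matrix.
tree = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])
print(tree)  # ('C', ('A', 'B', 1.0), 3.0)
```

Because every node is placed at half the joining distance, UPGMA implicitly assumes a roughly constant rate of evolution across lineages.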
The study also discusses the limitations of previous methods and the applicability of the infinite-allele model to real-world allele frequency data.