April 17, 2009 | Morgan N. Price, Paramvir S. Dehal, and Adam P. Arkin
FastTree is a method for constructing large phylogenies and estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. It uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different character, FastTree requires O(NLa + N√N) memory and O(N√N log(N) La) time, which is significantly less than the O(N²) space and O(N²L) time required for a distance matrix. FastTree also uses local bootstrapping to estimate tree reliability, providing a 100-fold speedup over distance matrix methods. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 GB of memory. FastTree is more accurate than Neighbor-Joining, BIONJ, or FastME in simulations and has higher likelihoods on genuine alignments. It is available at http://microbesonline.org/fasttree. FastTree uses four key ideas to reduce space and time complexity: storing profiles instead of a distance matrix, using heuristics to reduce joins, refining the initial topology with NNIs, and computing local bootstrap values. FastTree is efficient for large alignments and provides accurate phylogenies with support values. It is faster and more memory-efficient than distance matrix methods and is suitable for large gene families. FastTree is available for use and is recommended for phylogenetic analysis of large datasets.FastTree is a method for constructing large phylogenies and estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. It uses these profiles to implement Neighbor-Joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different character, FastTree requires O(NLa + N√N) memory and O(N√N log(N) La) time, which is significantly less than the O(N²) space and O(N²L) time required for a distance matrix. FastTree also uses local bootstrapping to estimate tree reliability, providing a 100-fold speedup over distance matrix methods. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 GB of memory. FastTree is more accurate than Neighbor-Joining, BIONJ, or FastME in simulations and has higher likelihoods on genuine alignments. It is available at http://microbesonline.org/fasttree. FastTree uses four key ideas to reduce space and time complexity: storing profiles instead of a distance matrix, using heuristics to reduce joins, refining the initial topology with NNIs, and computing local bootstrap values. FastTree is efficient for large alignments and provides accurate phylogenies with support values. It is faster and more memory-efficient than distance matrix methods and is suitable for large gene families. FastTree is available for use and is recommended for phylogenetic analysis of large datasets.