March 10, 2006 | Tijl De Bie¹, Nello Cristianini², Jeffery P. Demuth³ and Matthew W. Hahn³,*
CAFE is a computational tool for analyzing the evolution of gene family sizes in a phylogenetic context. It uses a stochastic birth and death process to model gene family size changes over a phylogeny. Given a phylogenetic tree and gene family sizes in extant species, CAFE estimates birth and death rates, infers ancestral gene family sizes, identifies gene families with accelerated gain or loss, and determines which branches are responsible for significant changes. The tool can be run via a graphical interface or command-line version, with a user manual available online.
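The birth and death model above can be made concrete with a short sketch. The summary does not state the formula, so the code below assumes the standard transition probability for a linear birth and death process in which gain and loss share one rate λ; the function name `transition_prob` is hypothetical and is not taken from CAFE itself.

```python
from math import comb

def transition_prob(s, c, lam, t):
    """P(family has c genes after time t | it started with s genes).

    Assumes a linear birth-death process with equal per-gene
    birth and death rate `lam` (an assumption of this sketch).
    """
    if s == 0:
        # A family with no genes cannot regain any under this model.
        return 1.0 if c == 0 else 0.0
    alpha = lam * t / (1.0 + lam * t)
    total = 0.0
    for j in range(min(s, c) + 1):
        total += (comb(s, j) * comb(s + c - j - 1, s - 1)
                  * alpha ** (s + c - 2 * j) * (1.0 - 2.0 * alpha) ** j)
    return total

if __name__ == "__main__":
    # Distribution of family sizes after 100 time units, starting from 3 genes.
    for size in range(7):
        print(size, transition_prob(3, size, lam=0.002, t=100.0))
```

For a single starting gene the formula reduces to the familiar extinction probability λt/(1+λt), which is a convenient sanity check.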
CAFE's main inputs are a Newick-formatted phylogenetic tree with branch lengths and a data file containing gene family sizes for the extant species. The data file may describe a single family or thousands of families. A third input is λ, a single parameter giving the probability of both gene gain and gene loss per gene per unit time. Users can specify λ or let CAFE estimate it: the tool calculates the likelihood of the data for multiple values of λ and uses the maximum-likelihood value in the subsequent analysis.
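The search over λ can be illustrated as a grid search over a tree likelihood. This is a minimal sketch, not CAFE's implementation: the tuple-based tree `(name, branch_length, children)`, the uniform prior on root sizes 1..`max_size`, and all function names are assumptions made for illustration, and the transition probability is the standard equal-rate birth and death formula.

```python
from math import comb, log

def transition_prob(s, c, lam, t):
    """Equal-rate birth-death transition probability (assumed model)."""
    if s == 0:
        return 1.0 if c == 0 else 0.0
    a = lam * t / (1.0 + lam * t)
    return sum(comb(s, j) * comb(s + c - j - 1, s - 1)
               * a ** (s + c - 2 * j) * (1.0 - 2.0 * a) ** j
               for j in range(min(s, c) + 1))

def partial_likelihoods(node, sizes, lam, max_size):
    """L[s] = P(observed tip sizes below `node` | node has s genes)."""
    name, _, children = node
    states = range(max_size + 1)
    if not children:  # leaf: only the observed size is possible
        return [1.0 if s == sizes[name] else 0.0 for s in states]
    L = [1.0] * (max_size + 1)
    for child in children:
        child_L = partial_likelihoods(child, sizes, lam, max_size)
        t = child[1]
        for s in states:
            L[s] *= sum(transition_prob(s, c, lam, t) * child_L[c]
                        for c in states)
    return L

def log_likelihood(tree, families, lam, max_size):
    """Sum of log-likelihoods over families, uniform root prior (assumption)."""
    total = 0.0
    for sizes in families:
        L = partial_likelihoods(tree, sizes, lam, max_size)
        total += log(sum(L[1:]) / max_size)
    return total

def estimate_lambda(tree, families, grid, max_size):
    """Return the grid value of lambda with the highest likelihood."""
    return max(grid, key=lambda lam: log_likelihood(tree, families, lam, max_size))

if __name__ == "__main__":
    # Hypothetical three-species tree: ((A:5,B:5)AB:5,C:10)root
    tree = ("root", 0.0, [("AB", 5.0, [("A", 5.0, []), ("B", 5.0, [])]),
                          ("C", 10.0, [])])
    families = [{"A": 4, "B": 5, "C": 4}, {"A": 2, "B": 2, "C": 3}]
    grid = [0.001, 0.002, 0.005, 0.01, 0.02, 0.05]
    print(estimate_lambda(tree, families, grid, max_size=15))
```

A grid keeps the sketch short; a numerical optimizer over λ would serve the same purpose.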
Given a phylogenetic tree, gene family sizes, and λ, CAFE calculates the most likely ancestral gene family sizes using a graphical model. From these it infers the direction and size of the change in gene family size along each branch. The most likely ancestral sizes (the Viterbi assignments) are reported in the main output file, along with the average expansion or contraction on each branch and the number of families that change on each branch.
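The ancestral reconstruction can be sketched as a max-product (Viterbi-style) pass over the tree: an upward pass records the best child size for every possible parent size, and a downward pass reads off the jointly most likely assignment. The tree encoding `(name, branch_length, children)`, the function names, and the restriction of the root to sizes of at least 1 are assumptions of this sketch rather than details confirmed by the source.

```python
from math import comb

def transition_prob(s, c, lam, t):
    """Equal-rate birth-death transition probability (assumed model)."""
    if s == 0:
        return 1.0 if c == 0 else 0.0
    a = lam * t / (1.0 + lam * t)
    return sum(comb(s, j) * comb(s + c - j - 1, s - 1)
               * a ** (s + c - 2 * j) * (1.0 - 2.0 * a) ** j
               for j in range(min(s, c) + 1))

def upward(node, sizes, lam, max_size, back):
    """score[s] = best probability of the subtree given node has s genes."""
    name, _, children = node
    states = range(max_size + 1)
    if not children:
        return [1.0 if s == sizes[name] else 0.0 for s in states]
    score = [1.0] * (max_size + 1)
    for child in children:
        child_score = upward(child, sizes, lam, max_size, back)
        t = child[1]
        best = []
        for s in states:
            probs = [transition_prob(s, c, lam, t) * child_score[c] for c in states]
            c_best = max(states, key=lambda c: probs[c])
            best.append(c_best)
            score[s] *= probs[c_best]
        back[child[0]] = best  # best child size for each parent size
    return score

def reconstruct(tree, sizes, lam, max_size):
    """Return {node name: most likely family size} for every node."""
    back = {}
    score = upward(tree, sizes, lam, max_size, back)
    root_size = max(range(1, max_size + 1), key=lambda s: score[s])
    assigned = {}

    def downward(node, size):
        assigned[node[0]] = size
        for child in node[2]:
            downward(child, back[child[0]][size])

    downward(tree, root_size)
    return assigned

if __name__ == "__main__":
    tree = ("root", 0.0, [("AB", 5.0, [("A", 5.0, []), ("B", 5.0, [])]),
                          ("C", 10.0, [])])
    sizes = reconstruct(tree, {"A": 8, "B": 7, "C": 4}, lam=0.002, max_size=12)
    # Change along each branch = child size minus parent size.
    for parent in (tree, tree[2][0]):
        for child in parent[2]:
            print(parent[0], "->", child[0], sizes[child[0]] - sizes[parent[0]])
```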
For each gene family, CAFE computes a p-value based on the model. Families with large variance in size, especially among closely related species, are likely to have low p-values. A low p-value indicates that the observed size changes are unlikely under the random birth and death model, possibly because of natural selection or large-scale duplications or deletions.
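One way to picture such a p-value is by simulation: generate families under the model, then ask how often a simulated family is at least as unlikely as the observed one. The sketch below makes strong simplifying assumptions that are not from the source: a star phylogeny (every species attached directly to the root), a known root size, and hypothetical function names throughout.

```python
import random
from math import comb

def transition_prob(s, c, lam, t):
    """Equal-rate birth-death transition probability (assumed model)."""
    if s == 0:
        return 1.0 if c == 0 else 0.0
    a = lam * t / (1.0 + lam * t)
    return sum(comb(s, j) * comb(s + c - j - 1, s - 1)
               * a ** (s + c - 2 * j) * (1.0 - 2.0 * a) ** j
               for j in range(min(s, c) + 1))

def simulate_branch(n, lam, t, rng):
    """Simulate the family size after time t, starting from n genes."""
    while n > 0:
        wait = rng.expovariate(2.0 * lam * n)  # time to next gain or loss
        if wait > t:
            break
        t -= wait
        n += 1 if rng.random() < 0.5 else -1   # gain and loss equally likely
    return n

def family_likelihood(tip_sizes, root_size, lam, branch_lengths):
    """Likelihood of tip sizes on a star tree, given the root size."""
    p = 1.0
    for c, t in zip(tip_sizes, branch_lengths):
        p *= transition_prob(root_size, c, lam, t)
    return p

def monte_carlo_p_value(tip_sizes, root_size, lam, branch_lengths,
                        n_samples=1000, seed=0):
    """Fraction of simulated families no more likely than the observed one."""
    rng = random.Random(seed)
    observed = family_likelihood(tip_sizes, root_size, lam, branch_lengths)
    hits = 0
    for _ in range(n_samples):
        sim = [simulate_branch(root_size, lam, t, rng) for t in branch_lengths]
        if family_likelihood(sim, root_size, lam, branch_lengths) <= observed:
            hits += 1
    return hits / n_samples

if __name__ == "__main__":
    lengths = [10.0, 10.0, 10.0]
    print(monte_carlo_p_value([5, 6, 5], 5, 0.002, lengths))
    print(monte_carlo_p_value([5, 5, 30], 5, 0.002, lengths))
```

A family in which one species has far more copies than the others receives a much smaller p-value than a family whose sizes barely differ.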
For gene families with small p-values, CAFE identifies branches where the largest changes occurred. Three methods are used: Viterbi, branch cutting, and likelihood ratio test. These methods help identify branches where the model is violated.
CAFE is implemented in Java and runs on Mac OS X, Windows, or Linux systems with Java 1.5. It generates output files including settings, logs, and main results. Monte Carlo sampling is used to calculate p-values, with 1000 samples generally sufficient. Caching is used to speed up calculations, with an upper bound on gene family sizes chosen based on the largest family in the data.
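The caching idea can be sketched as follows, in Python rather than CAFE's Java. The transition probabilities depend only on λ, the branch length, and the upper bound on family size, so a table can be computed once per combination and reused for every family. The function name, the cache key, and the use of `functools.lru_cache` are illustrative assumptions, as is the equal-rate birth and death formula.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def transition_matrix(lam, t, max_size):
    """Table m[s][c] = P(c genes after time t | s genes), for 0..max_size.

    Cached on (lam, t, max_size), so repeated requests for the same
    branch length and rate do no further work.
    """
    a = lam * t / (1.0 + lam * t)
    rows = []
    for s in range(max_size + 1):
        row = []
        for c in range(max_size + 1):
            if s == 0:
                p = 1.0 if c == 0 else 0.0
            else:
                p = sum(comb(s, j) * comb(s + c - j - 1, s - 1)
                        * a ** (s + c - 2 * j) * (1.0 - 2.0 * a) ** j
                        for j in range(min(s, c) + 1))
            row.append(p)
        rows.append(tuple(row))
    return tuple(rows)

if __name__ == "__main__":
    first = transition_matrix(0.002, 10.0, 30)   # computed
    second = transition_matrix(0.002, 10.0, 30)  # served from the cache
    print(first is second, transition_matrix.cache_info())
```

Because the table is truncated at `max_size`, the bound must sit comfortably above the largest observed family, otherwise probability mass for larger sizes is lost.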
The likelihood ratio test is computationally intensive, requiring additional caching for different λ values. Users can specify which methods to use and a minimum p-value for further analysis. CAFE is supported by various grants and acknowledges contributions from other researchers.