May 11, 2017 | Vincent Lefort, Jean-Emmanuel Longueville, and Olivier Gascuel
The paper introduces "Smart Model Selection" (SMS), a software tool for phylogenetic analysis that improves model selection efficiency. SMS is integrated into the PhyML environment and offers two interfaces: a command-line version for integration into pipelines and a web server (http://www.atgc-montpellier.fr/phyml-sms/). SMS uses heuristic strategies to avoid testing all possible models, reducing computation time by about 50% compared to exhaustive methods. It performs well compared to ProtTest and jModelTest2. SMS provides a simple, user-friendly interface and is freely available.
For proteins, SMS includes 17 substitution matrices and allows users to add their own. It models rate variation across sites with two options: +Γ (gamma-distributed rates) and +Γ+I (gamma-distributed rates plus a proportion of invariant sites). SMS evaluates only about 30 models on average, compared to 120 in exhaustive methods. With either the AIC or the BIC criterion, SMS first estimates branch lengths and model parameters on a BioNJ tree topology, then selects the best substitution matrix and +F/−F option based on matrix similarity to data frequencies.
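As a rough illustration of how the AIC and BIC criteria rank candidate models, the sketch below applies the standard definitions (AIC = 2k − 2 ln L, BIC = k ln n − 2 ln L). It is not SMS code: the model names, log-likelihoods, parameter counts, and the choice of alignment length as the sample size n are all assumptions made up for the example.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better).
    Using the number of alignment sites as n is an assumption of this sketch."""
    return n_params * math.log(n_sites) - 2 * log_likelihood

def select_best(candidates, score):
    """Return the candidate tuple with the lowest score."""
    return min(candidates, key=score)

# Hypothetical (name, log-likelihood, free-parameter count) triples.
# All numbers are invented for illustration only.
candidates = [
    ("LG+G", -12345.6, 1),
    ("LG+G+I", -12340.1, 2),
    ("LG+G+F", -12330.0, 20),
]
n_sites = 500  # hypothetical alignment length

best_aic = select_best(candidates, lambda c: aic(c[1], c[2]))
best_bic = select_best(candidates, lambda c: bic(c[1], c[2], n_sites))
print("best by AIC:", best_aic[0])
print("best by BIC:", best_bic[0])
```

Because BIC penalizes each extra parameter by ln n rather than 2, it tends to favor simpler models than AIC once the alignment has more than about 7 sites.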
For DNA, SMS uses four substitution matrices (GTR, TN93, HKY85, K80) combined with four rates-across-sites (RAS) options, resulting in 16 models. SMS evaluates about 6 models on average with AIC and 7.5 with BIC, reducing computation time by about 50% compared to exhaustive methods. It selects the best model by comparing GTR with TN93, HKY85, and K80 in a stepwise manner.
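The count of 16 DNA models is simply the product of the two option sets, as the short sketch below shows. The four matrix names come from the text; the four RAS labels (no rate variation, +I, +G, +G+I) are an assumption, since the summary does not list them.

```python
from itertools import product

# Substitution matrices named in the text.
matrices = ["GTR", "TN93", "HKY85", "K80"]

# Assumed RAS options; the summary only says there are four of them.
ras_options = ["", "+I", "+G", "+G+I"]

# Every matrix paired with every RAS option.
models = [matrix + ras for matrix, ras in product(matrices, ras_options)]
print(len(models))

# Share of the model space evaluated on average, using the figures in the text.
fraction_aic = 6 / len(models)
fraction_bic = 7.5 / len(models)
print(fraction_aic, fraction_bic)
```

The evaluated share of models need not match the reported time saving of about 50%, since fitting cost differs from one model to another.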
SMS performs well compared to exhaustive methods and to other tools such as jModelTest2 and ProtTest. It uses specific substitution matrices (e.g., MtZoa for proteins and TN93 for DNA) not available in other tools. For proteins, SMS and ProtTest often select the same model, and SMS selects a better model in some cases. For DNA, SMS outperforms jModelTest2 in most cases. Thanks to its tailored heuristics, SMS is also significantly faster than ProtTest: the largest multiple sequence alignment (MSA) was processed in about 20 hours, compared to over 100 hours for ProtTest. The results are consistent across different data sets, confirming the effectiveness of SMS in phylogenetic analysis.