Received on April 1, 2004; revised on April 28, 2004; accepted on May 3, 2004; Advance Access publication May 14, 2004 | W. H. Majoros*, M. Pertea and S. L. Salzberg
The article introduces two new open-source gene-finding programs, TigrScan and GlimmerHMM, which are based on Generalized Hidden Markov Models (GHMMs). These programs are designed to be highly reusable and retrainable by end users, unlike most existing gene-finders. Both programs have been used at TIGR for annotating the genomes of *Aspergillus fumigatus* and *Toxoplasma gondii*. The authors highlight the modular and extensible nature of the programs, which allows for the independent combination of various probabilistic submodels such as Maximal Dependence Decomposition trees and interpolated Markov models. The methods section explains the mathematical framework of GHMMs and how they are used to predict gene models.
The results section compares the performance of TigrScan and GlimmerHMM with Genscan+ on sets of *Arabidopsis thaliana* cDNAs and *Aspergillus fumigatus* CDSs, showing that both programs outperform Genscan+ in terms of exon sensitivity and specificity. The article also discusses the memory and time requirements of the programs and their ability to handle long sequences. The authors aim to facilitate further research by making the source code and documentation available under the open-source Artistic License.
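
The summary does not define exon sensitivity and specificity, so the following is only a minimal sketch under the convention commonly used in gene-finder evaluation: an exon counts as correct only when both of its boundaries match the annotation exactly, sensitivity is the fraction of annotated exons that were predicted, and specificity is the fraction of predicted exons that are correct. The function name `exon_accuracy`, the `(start, end)` tuple representation, and the coordinates are hypothetical and are not taken from TigrScan, GlimmerHMM, or the article's evaluation scripts.

```python
from typing import Iterable, Tuple

# Hypothetical representation: an exon is a (start, end) coordinate pair.
Exon = Tuple[int, int]


def exon_accuracy(predicted: Iterable[Exon],
                  annotated: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumed convention: a predicted exon is correct only if both
    boundaries match an annotated exon exactly.
    """
    pred = set(predicted)
    true = set(annotated)
    correct = len(pred & true)  # exons with both boundaries exact
    sensitivity = correct / len(true) if true else 0.0
    specificity = correct / len(pred) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    annotated = [(100, 200), (300, 450), (600, 720), (900, 1000)]
    predicted = [(100, 200), (300, 440), (600, 720)]
    sn, sp = exon_accuracy(predicted, annotated)
    print(f"exon sensitivity = {sn:.2f}, exon specificity = {sp:.2f}")
```

In this toy case two of the four annotated exons are recovered exactly, and two of the three predictions are correct; the exon at `(300, 440)` misses one boundary and therefore counts against both measures.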