2005, Vol. 33, Web Server issue | John Besemer and Mark Borodovsky
The GeneMark web software, available at http://opal.biology.gatech.edu/GeneMark/, provides gene prediction for prokaryotic, eukaryotic, and viral genomes. It includes two main programs, GeneMark and GeneMark.hmm, both of which use Markov chain models to identify protein-coding regions. The website provides species-specific models, together with pre-computed gene models, for nearly 200 prokaryotic and more than 10 eukaryotic genomes. For novel prokaryotic sequences, for which no species-specific model is available, GeneMark offers either a heuristic approach or, for longer sequences, the self-training program GeneMarkS.
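To make Markov chain scoring of coding potential concrete, the Python sketch below compares a 3-periodic (codon-position-dependent) Markov chain, standing in for coding DNA, with an ordinary homogeneous chain, standing in for non-coding DNA, and reports their log-likelihood ratio for a single reading frame. It is a simplified illustration, not GeneMark's code or parameters. The second-order chains, the pseudocount of 1, the function names (`train_markov`, `log_likelihood`, `coding_log_odds`), and the repetitive toy training sequences are all assumptions chosen for brevity. The sketch also scores only the frame that starts at position 0, whereas a full gene finder would also weigh the other reading frames on both strands.

```python
import math
from itertools import product

ALPHABET = "ACGT"


def train_markov(seqs, order=2, period=1, pseudo=1.0):
    """Estimate P(base | previous `order` bases, phase) with additive pseudocounts.

    period=3 gives a 3-periodic chain (one table per codon position) for coding DNA;
    period=1 gives an ordinary homogeneous chain for non-coding DNA.
    Assumes each training sequence starts at codon position 0 (in frame).
    """
    contexts = ["".join(c) for c in product(ALPHABET, repeat=order)]
    counts = [{ctx: {b: pseudo for b in ALPHABET} for ctx in contexts}
              for _ in range(period)]
    for s in seqs:
        for i in range(order, len(s)):
            ctx, base = s[i - order:i], s[i]
            # Skip windows containing non-ACGT symbols (e.g. N).
            if ctx in counts[0] and base in ALPHABET:
                counts[i % period][ctx][base] += 1
    probs = []
    for phase_counts in counts:
        table = {}
        for ctx, row in phase_counts.items():
            total = sum(row.values())
            table[ctx] = {b: n / total for b, n in row.items()}
        probs.append(table)
    return probs


def log_likelihood(seq, model, order):
    """Log-probability of seq (conditioned on its first `order` bases); ACGT only."""
    period = len(model)
    ll = 0.0
    for i in range(order, len(seq)):
        ll += math.log(model[i % period][seq[i - order:i]][seq[i]])
    return ll


def coding_log_odds(seq, coding, noncoding, order):
    """Log-likelihood ratio for frame 0; positive values favor the coding model."""
    return log_likelihood(seq, coding, order) - log_likelihood(seq, noncoding, order)


if __name__ == "__main__":
    # Toy training data (hypothetical): a codon-periodic "gene" and an AT-rich spacer.
    coding = train_markov(["GCT" * 30], order=2, period=3)
    noncoding = train_markov(["AATT" * 25], order=2, period=1)
    print(coding_log_odds("GCT" * 10, coding, noncoding, order=2))
    print(coding_log_odds("AATT" * 5, coding, noncoding, order=2))
```

With these toy models the first score is positive and the second negative. In practice the chains would be trained on real annotated genes and intergenic regions, and the model order would be a tuning choice rather than the fixed value of 2 used here.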
The eukaryotic version of GeneMark.hmm uses an extended HMM architecture that models splice sites, translation initiation, and interrupted genes. The programs and gene models on the site are updated regularly to their latest versions. The GeneMark website also hosts a database of reannotations for more than 1000 viral genomes. The accuracy of *ab initio* gene finders depends heavily on selecting appropriate training data and on sound methods for building the models. The output lists the location, length, and class of each predicted gene, along with graphical representations that help identify regions of interest. Future development aims to improve the detection of rRNA and tRNA genes and the accuracy of gene start prediction.
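How an HMM turns a sequence into predicted gene locations and lengths can be illustrated with a deliberately tiny model. The Python sketch below runs the Viterbi algorithm over a two-state HMM (non-coding `N`, coding `C`) and converts the decoded state path into 1-based (start, end, length) coordinates. It is not GeneMark.hmm's architecture, which is far richer. The states, the function names (`viterbi`, `segments`), and every probability are hypothetical, and the GC-rich emissions for the "coding" state are an arbitrary choice made so the example is easy to read, not a property of real genes.

```python
import math

STATES = ("N", "C")  # hypothetical labels: N = non-coding, C = coding

# Toy parameters (assumptions for illustration only).
START = {"N": 0.9, "C": 0.1}
TRANS = {"N": {"N": 0.99, "C": 0.01}, "C": {"N": 0.01, "C": 0.99}}
EMIT = {
    "N": {"A": 0.35, "T": 0.35, "C": 0.15, "G": 0.15},
    "C": {"A": 0.15, "T": 0.15, "C": 0.35, "G": 0.35},
}


def viterbi(seq, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state path for a non-empty ACGT string (log space)."""
    scores = [{s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in STATES}]
    back = [{}]
    for i in range(1, len(seq)):
        scores.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state for being in state s at position i.
            prev = max(STATES, key=lambda p: scores[i - 1][p] + math.log(trans[p][s]))
            scores[i][s] = (scores[i - 1][prev] + math.log(trans[prev][s])
                            + math.log(emit[s][seq[i]]))
            back[i][s] = prev
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for i in range(len(seq) - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    return path[::-1]


def segments(path, label="C"):
    """Convert a state path into 1-based inclusive (start, end, length) runs of `label`."""
    out, begin = [], None
    for i, s in enumerate(list(path) + [None]):  # None sentinel closes a final run
        if s == label and begin is None:
            begin = i
        elif s != label and begin is not None:
            out.append((begin + 1, i, i - begin))
            begin = None
    return out


if __name__ == "__main__":
    toy = "AT" * 30 + "GC" * 30 + "AT" * 30  # 180 bases, GC block at 61-120
    print(segments(viterbi(toy)))  # expected: [(61, 120, 60)]
```

The sticky transitions (0.99 to stay in a state) make the decoder prefer long runs, so an isolated G or C inside an AT-rich stretch is not reported as a separate segment.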