GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses

2005 | John Besemer and Mark Borodovsky
GeneMark is a web-based software suite for gene prediction in prokaryotic, eukaryotic, and viral genomes. It provides interfaces to the GeneMark family of programs: ab initio gene finders that predict genes from the genomic sequence alone, using Markov chain models, without relying on external sequence information. The GeneMark website allows analysis of nearly 200 prokaryotic and more than 10 eukaryotic genomes using species-specific models and pre-computed gene models. For novel genomes, models can be generated on the fly using heuristic or self-training methods. A database of reannotations of over 1000 viral genomes is also available.
The GeneMark family includes GeneMark and GeneMark.hmm. GeneMark uses Bayesian methods to calculate the probability that the genetic code is present in a DNA sequence, that is, that the sequence is protein-coding. GeneMark.hmm uses a hidden Markov model (HMM) and the generalized Viterbi algorithm to determine the most likely sequence of hidden states. The HMM architecture can be adapted to the structure of different genomes: the prokaryotic version of GeneMark.hmm includes hidden states for ribosomal binding sites (RBS), uninterrupted genes, and gene overlaps, while the eukaryotic version includes states for splice sites, Kozak sites, and interrupted genes.
For prokaryotic DNA sequences, the GeneMark website offers 175 pre-computed species-specific models, and longer sequences can be analyzed with the self-training program GeneMarkS. The eukaryotic version of GeneMark.hmm is available for 11 eukaryotic genomes and has shown high accuracy in predicting genes in plant genomes such as rice and Arabidopsis. The eukaryotic version also includes a special version, GeneMark.SPL, for analyzing cDNA and EST sequences without introns. The GeneMark program provides outputs such as lists of predicted genes, regions of interest, and information on start and stop codons. It also includes options for using different genetic codes and models for RBS and Kozak sites.
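The Bayesian calculation attributed to GeneMark above can be illustrated with a toy example. The following is a minimal sketch, not the program's actual method: it assumes two first-order Markov chains (one "coding", one "noncoding") trained on made-up sequences, ignores reading frames and strands, and uses a hypothetical prior of 0.5. The function names and training data are invented for illustration.

```python
import math

BASES = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev)."""
    counts = {a: {b: pseudocount for b in BASES} for a in BASES}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev][nxt] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in BASES}
        for a in BASES
    }

def log_likelihood(seq, model):
    """Log P(seq | model), assuming a uniform distribution for the first base."""
    ll = math.log(0.25)
    for prev, nxt in zip(seq, seq[1:]):
        ll += math.log(model[prev][nxt])
    return ll

def posterior_coding(seq, coding, noncoding, prior_coding=0.5):
    """Bayes' rule: P(coding | seq) from the two model likelihoods and a prior."""
    log_c = log_likelihood(seq, coding) + math.log(prior_coding)
    log_n = log_likelihood(seq, noncoding) + math.log(1.0 - prior_coding)
    top = max(log_c, log_n)  # subtract the max to avoid underflow
    p_c = math.exp(log_c - top)
    p_n = math.exp(log_n - top)
    return p_c / (p_c + p_n)

# Toy training data (hypothetical): GC-rich "coding", AT-rich "noncoding".
CODING = train_markov(["GCGCGCGCGC", "GGCCGGCCGC"])
NONCODING = train_markov(["ATATATATAT", "AATTAATTAT"])

if __name__ == "__main__":
    for fragment in ("GCGCGCGC", "ATATATAT"):
        print(fragment, round(posterior_coding(fragment, CODING, NONCODING), 3))
```

In practice the posterior would be computed for a window sliding along the sequence, so that regions with a high probability of being coding stand out; the window size and models used by the real program are not specified in this summary.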
The GeneMark program is complemented by GeneMark.hmm, which uses the generalized Viterbi algorithm to predict gene structures. The two programs are designed to work together, with GeneMark.hmm being particularly useful for identifying exon-intron structures in eukaryotic DNA. The GeneMark web software is frequently updated and has been used extensively: over 21,000 nucleotide sequences and 329,000 protein sequences in GenBank reference the programs. Future developments aim to improve the detection of genomic elements such as rRNA and tRNA genes and to enhance the identification of gene 5' ends. The software is supported by research grants from the US National Institutes of Health and is available on the GeneMark website.
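To make the idea of finding the most likely sequence of hidden states concrete, here is a small sketch of the plain Viterbi algorithm on a two-state toy HMM. It is an illustration under stated assumptions, not GeneMark.hmm itself: the generalized algorithm used by the program additionally models how long each state lasts, and real models include the site and gene states listed above. The states (`N` for noncoding, `C` for coding) and all probabilities are hypothetical.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a non-empty sequence."""
    # Log-probability of the best path ending in each state at position 0.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
               for s in states}]
    backpointers = []
    for symbol in obs[1:]:
        prev = scores[-1]
        current, pointers = {}, {}
        for s in states:
            # Best predecessor state for arriving in state s.
            best = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            current[s] = (prev[best] + math.log(trans_p[best][s])
                          + math.log(emit_p[s][symbol]))
            pointers[s] = best
        scores.append(current)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical toy parameters: the coding state favors G/C, noncoding favors A/T.
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "C": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}

if __name__ == "__main__":
    dna = "ATATGCGCGCATAT"
    print(dna)
    print("".join(viterbi(dna, STATES, START, TRANS, EMIT)))
```

With these toy parameters the GC-rich stretch in the middle of the example sequence is labeled `C` and the flanking AT-rich stretches `N`, because paying for two state switches costs less than emitting six G/C bases from the noncoding state.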