Received February 14, 2004; Revised and Accepted March 15, 2004 | Mario Stanke*, Rasmus Steinkamp, Stephan Waack and Burkhard Morgenstern
The article introduces AUGUSTUS, a web server for gene prediction in eukaryotic genomic sequences. AUGUSTUS is a software program based on a generalized hidden Markov model (GHMM) with an improved method for modeling the intron length distribution, which approximates the true distribution more accurately than existing programs do. The article reports that AUGUSTUS is more accurate than existing gene-finding approaches, especially on larger input sequences containing multiple genes. The server is available at http://augustus.gobics.de and accepts uploaded DNA sequences in FASTA or multiple-FASTA format. It offers two pre-trained parameter sets, for human and Drosophila, with plans to add more species.
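To illustrate the multiple-FASTA input mentioned above, the following minimal Python sketch splits such a file into individual named sequences before submission. It is not part of AUGUSTUS; the function name `read_multi_fasta` and the example records are hypothetical, and the parsing rules (first word of the header as the record name, sequence lines concatenated) are common conventions assumed here rather than requirements stated by the server.

```python
from typing import Dict, Iterable


def read_multi_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Split (multi-)FASTA text into a {record name: sequence} mapping.

    Assumption: the record name is the first whitespace-delimited word
    after '>' on the header line.
    """
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            words = line[1:].split()
            # Fall back to a generated name if the header is empty.
            name = words[0] if words else f"seq{len(records) + 1}"
            if name in records:
                raise ValueError(f"duplicate record name: {name}")
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence lines of one record are concatenated, upper-cased.
            records[name] += line.upper()
    return records


# Made-up example input with two records.
example = """>geneA test sequence
ACGTAC
GTTA
>geneB
ttgacc
"""

records = read_multi_fasta(example.splitlines())
for rec_name, seq in records.items():
    print(rec_name, len(seq))
```

A file with a single `>` header yields a one-entry mapping, so the same routine covers both the plain FASTA and the multiple-FASTA case.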
AUGUSTUS provides two 'expert options' for predicting genes, including the ability to ignore conflicts between gene structures on different strands. The output includes both graphical and text formats, with results in the General Feature Format (GFF). Future work includes integrating external information and utilizing homology information from alignment programs like DIALIGN.
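Because the text results are delivered in GFF, they can be post-processed with a few lines of code. The sketch below assumes the standard tab-separated GFF layout (sequence name, source, feature, start, end, score, strand, frame, optional attributes) with 1-based inclusive coordinates. The sample lines are invented for illustration and are not actual AUGUSTUS output; `GffFeature`, `parse_gff` and `total_cds_length` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GffFeature:
    seqname: str
    source: str
    feature: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: str


def parse_gff(text: str) -> List[GffFeature]:
    """Parse GFF text into feature records, skipping comments and blanks."""
    features: List[GffFeature] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # At most 9 columns; the attributes column is optional.
        cols = line.split("\t", 8)
        if len(cols) < 8:
            raise ValueError(f"expected at least 8 tab-separated columns: {line!r}")
        if len(cols) == 8:
            cols.append("")
        features.append(
            GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                       cols[5], cols[6], cols[7], cols[8])
        )
    return features


def total_cds_length(features: List[GffFeature]) -> int:
    """Sum the lengths of all CDS features (end - start + 1 each)."""
    return sum(f.end - f.start + 1 for f in features if f.feature == "CDS")


# Invented sample data, for illustration only.
sample = (
    "# hypothetical prediction\n"
    "seq1\tAUGUSTUS\tCDS\t101\t250\t.\t+\t0\tgene1\n"
    "seq1\tAUGUSTUS\tintron\t251\t399\t.\t+\t.\tgene1\n"
    "seq1\tAUGUSTUS\tCDS\t400\t520\t.\t+\t0\tgene1\n"
)

features = parse_gff(sample)
print(len(features), total_cds_length(features))
```

Keeping coordinates as 1-based inclusive integers, as in the file, avoids off-by-one errors when feature lengths are computed.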