2008 | Mario Stanke*, Mark Diekhans, Robert Baertsch and David Haussler
This paper presents an improved gene prediction method called AUGUSTUS, which integrates multiple types of evidence to improve the accuracy of de novo gene finding. The method incorporates several kinds of evidence: gene and transcript annotations from related species that are syntenically mapped to the target genome, evolutionary conservation of DNA, mRNA and EST alignments from the target species itself, and retroposed genes. Where the evidence supports it, the predictions include alternative splice variants. Using only ESTs, AUGUSTUS predicts at least one splice form exactly correctly in 57% of human genes. When evidence from other species and human mRNAs is added, this number rises to 77%. Syntenic mapping is well-suited for annotating genomes that are closely related to genomes that are already annotated or that have extensive transcript evidence. Native cDNA evidence is most helpful when it is used as compound information rather than as independent, position-wise information.
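The gene-level criterion used above (a gene counts as correct when at least one of its splice forms is predicted exactly) can be made concrete with a small Python sketch. It is a simplified illustration, not the paper's evaluation code: transcripts are assumed to be tuples of coding-exon coordinates on a single strand, and the function name `gene_level_sensitivity` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A transcript is modelled as its ordered tuple of coding exons,
# each exon a (start, end) pair in 1-based inclusive coordinates.
Exon = Tuple[int, int]
Transcript = Tuple[Exon, ...]


def gene_level_sensitivity(
    annotated: Dict[str, List[Transcript]],
    predicted: List[Transcript],
) -> float:
    """Fraction of annotated genes with at least one exactly predicted splice form."""
    if not annotated:
        return 0.0
    predicted_set: Set[Transcript] = set(predicted)
    # A gene counts as correct if any of its annotated splice forms matches
    # some predicted transcript exon-for-exon (all boundaries identical).
    hits = sum(
        1 for transcripts in annotated.values()
        if any(tx in predicted_set for tx in transcripts)
    )
    return hits / len(annotated)


if __name__ == "__main__":
    annotated = {
        "geneA": [((100, 200), (300, 400)), ((100, 200), (350, 400))],
        "geneB": [((1000, 1100),)],
    }
    predicted = [((100, 200), (350, 400)), ((1000, 1090),)]
    print(gene_level_sensitivity(annotated, predicted))  # prints 0.5
```

Here geneA counts as correct because its second splice form is matched exactly, while geneB does not, since a single shifted exon boundary is enough to make a prediction wrong under this criterion.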
AUGUSTUS is open source and available at http://augustus.gobics.de. The gene predictions for human can be browsed and downloaded at the UCSC Genome Browser (http://genome.ucsc.edu). Using all available extrinsic information (conservation, native EST and mRNA alignments, and alien transcript alignments from other species), the method predicts up to 77% of human genes correctly. The method is general: users can supply evidence about gene structure from other extrinsic sources as input.
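As an illustration of supplying one's own evidence, the Python sketch below converts a single spliced cDNA alignment, given as a list of aligned genomic blocks, into tab-separated GFF hint lines: one `exonpart` hint per block and one `intron` hint per gap. It is a hypothetical helper, not part of AUGUSTUS. The feature names and the `src=`/`grp=` attribute keys reflect our understanding of the AUGUSTUS hint format and should be checked against the AUGUSTUS documentation; the source label `cdna2hints` and the `min_intron_len` threshold are assumptions.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # aligned block, 1-based inclusive genome coordinates


def alignment_to_hints(
    seqname: str,
    strand: str,
    blocks: List[Block],
    aln_id: str,
    src: str = "E",
    min_intron_len: int = 30,  # hypothetical: shorter gaps are treated as indels
) -> List[str]:
    """Turn one spliced cDNA alignment into tab-separated GFF hint lines."""
    blocks = sorted(blocks)
    # All hints from this alignment share one grp= value (assumed attribute key),
    # marking them as a single compound piece of evidence.
    attrs = f"src={src};grp={aln_id}"
    lines: List[str] = []

    def gff(feature: str, start: int, end: int) -> str:
        # Nine GFF columns: seqname, source, feature, start, end,
        # score, strand, frame, attributes.
        cols = [seqname, "cdna2hints", feature, str(start), str(end),
                ".", strand, ".", attrs]
        return "\t".join(cols)

    # Each aligned block supports (part of) an exon.
    for start, end in blocks:
        lines.append(gff("exonpart", start, end))
    # Each sufficiently long gap between consecutive blocks supports an intron.
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        intron_start, intron_end = prev_end + 1, next_start - 1
        if intron_end - intron_start + 1 >= min_intron_len:
            lines.append(gff("intron", intron_start, intron_end))
    return lines


if __name__ == "__main__":
    for line in alignment_to_hints("chr21", "+", [(1000, 1120), (1500, 1620)], "est42"):
        print(line)
```

Giving every hint from one alignment the same group identifier keeps the alignment's structure available to the gene finder, which is in the spirit of using cDNA evidence as compound rather than position-wise information.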
Extrinsic evidence is computed or collected beforehand and passed to AUGUSTUS as 'hints' in a GFF-format file. Hints are uncertain, local pieces of information about the gene structure of the input sequence. The method also models alternative splicing, so it can predict multiple splice forms per gene. Before prediction, the hints are pre-processed: hints that are unsatisfiable, or that are incompatible with a suspiciously high number of other hints, are discarded. Hints can be generated from various sources, including TransMap, cDNA alignments, and conservation.
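The pre-processing step can be sketched in Python for intron hints only. The criteria are assumptions chosen for illustration, not the rules AUGUSTUS actually applies: a hint is treated as unsatisfiable if it lacks the canonical GT...AG splice sites (non-canonical introns such as GC-AG are ignored for simplicity), two distinct intron hints on the same strand are treated as incompatible if they overlap, and the `max_conflicts` threshold is hypothetical.

```python
from typing import List, NamedTuple


class IntronHint(NamedTuple):
    start: int   # first intron base, 1-based inclusive
    end: int     # last intron base, 1-based inclusive
    strand: str  # "+" or "-"


def is_satisfiable(h: IntronHint, genome: str) -> bool:
    """Assumed criterion: the intron must carry canonical GT...AG splice sites."""
    if h.start < 1 or h.end > len(genome) or h.end - h.start + 1 < 4:
        return False
    seq = genome[h.start - 1 : h.end].upper()
    if h.strand == "+":
        return seq.startswith("GT") and seq.endswith("AG")
    # On the reverse strand, GT...AG appears as CT...AC on the forward sequence.
    return seq.startswith("CT") and seq.endswith("AC")


def incompatible(a: IntronHint, b: IntronHint) -> bool:
    """Distinct overlapping introns on one strand cannot occur in the same transcript."""
    return a.strand == b.strand and a != b and a.start <= b.end and b.start <= a.end


def filter_hints(
    hints: List[IntronHint], genome: str, max_conflicts: int = 3
) -> List[IntronHint]:
    """Drop unsatisfiable hints, then hints that conflict with too many others."""
    ok = [h for h in hints if is_satisfiable(h, genome)]
    kept: List[IntronHint] = []
    for h in ok:
        conflicts = sum(incompatible(h, other) for other in ok)
        if conflicts <= max_conflicts:
            kept.append(h)
    return kept
```

A threshold is used instead of rejecting every conflicting hint because alternative splice forms legitimately produce overlapping introns; only a hint that conflicts with many others is suspicious.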
The method was tested on the human ENCODE regions and compared with several other programs, including N-SCAN/EST and JIGSAW. When only ESTs and genomic conservation are used, AUGUSTUS is significantly more accurate than N-SCAN/EST. The method was also tested for the case of genomes that have no transcript data of their own but for which the gene annotation of related genomes is available; in this setting, AUGUSTUS +T is not far behind the methods that use human mRNAs in addition to other evidence. The method was also applied to annotate the genes of Galdieria sulphuraria, where it works well even with the shorter ESTs produced by 454 sequencing. The method is particularly strong in two settings: when ESTs are the major source of gene evidence for the genome, and when one or more well-annotated informant genomes are available that are closely enough related to show synteny. The method is very well-suited to annotate mamm