Gene finding in novel genomes

14 May 2004 | Ian Korf
This research article discusses the challenges of gene prediction in novel genomes, where experimental data are limited. The author introduces SNAP, a new ab initio gene finder that is easily adapted to different organisms and whose source code is freely available. The paper demonstrates that applying a foreign gene finder to a novel genome can produce inaccurate results, and that the most compatible parameters do not necessarily come from the nearest phylogenetic relative. Foreign gene finders are more useful for bootstrapping parameter estimation, which can yield highly accurate parameters.

SNAP was trained and evaluated on four genomes: Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, and Oryza sativa. SNAP is more accurate than Genscan in all four genomes. In C. elegans, it outperforms HMMGene and performs almost as well as Genefinder; in D. melanogaster, its accuracy is similar to that of Augustus. SNAP's HMMs are trained separately for each genome, which contributes to its accuracy.

The study also shows that predicting genes in a novel genome with a foreign gene finder can be highly inaccurate, because genomes differ substantially in composition, including codon frequencies, splice-site features, and translation start sites. As a result, foreign parameters may perform poorly in a novel genome, and choosing the best foreign gene finder is not simply a matter of taking parameters from the closest relative.

The author also examines bootstrapping: using the predictions of a foreign gene finder to train a gene finder for the novel genome. This approach can be effective, and bootstrapped parameters perform well in many cases, although in some genomes, such as Oryza sativa, they are only somewhat helpful. The study concludes that gene prediction is sensitive to species-specific parameters, and that every genome needs a dedicated gene finder.
The author emphasizes the importance of developing gene finders that are specifically adapted to each genome to ensure accurate gene prediction.
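
The bootstrapping idea summarized above can be illustrated with a deliberately tiny toy model. This is not SNAP's algorithm: instead of an HMM with splice-site and start-site models, it assumes a single codon-frequency table scored against a uniform background, and every function name (`train`, `predict`, `bootstrap`) and all sequences are hypothetical. It sketches how parameters from a foreign genome with different codon usage can miss genes, and how retraining on the foreign predictions can recover them.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons; toy sequences are assumed to contain only A, C, G, T.
ALL_CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codons(seq):
    """Split a sequence into non-overlapping codons in frame 0 (a trailing partial codon is ignored)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def train(coding_seqs, pseudocount=1.0):
    """Estimate codon frequencies from sequences labelled as coding, with add-one smoothing."""
    counts = Counter(c for s in coding_seqs for c in codons(s))
    total = sum(counts.values()) + pseudocount * len(ALL_CODONS)
    return {c: (counts[c] + pseudocount) / total for c in ALL_CODONS}


def score(seq, model):
    """Log-odds of the coding model against a uniform background (1/64 per codon)."""
    return sum(math.log(model[c] * len(ALL_CODONS)) for c in codons(seq))


def predict(candidates, model):
    """Call a candidate 'coding' when its log-odds score is positive."""
    return [s for s in candidates if score(s, model) > 0]


def bootstrap(candidates, foreign_model, rounds=3):
    """Start from foreign parameters, then repeatedly retrain on the current predictions."""
    model = foreign_model
    for _ in range(rounds):
        predicted = predict(candidates, model)
        if not predicted:  # nothing to learn from; keep the current parameters
            break
        model = train(predicted)
    return model


# Hypothetical toy data: the foreign genome shares only some preferred codons with the novel one.
foreign_genes = ["GCC" * 4 + "AAA" * 4, "CTG" * 4 + "GAT" * 4]
g1, g2, g3 = "GCC" * 4 + "AAG" * 4, "CTG" * 4 + "GAA" * 4, "AAG" * 4 + "GAA" * 4
noncoding = ["TAT" * 8, "ATA" * 8, "TTT" * 8]
candidates = [g1, g2, g3] + noncoding

foreign_model = train(foreign_genes)
print([candidates.index(s) for s in predict(candidates, foreign_model)])  # [0, 1]: g3 is missed
native_model = bootstrap(candidates, foreign_model)
print([candidates.index(s) for s in predict(candidates, native_model)])   # [0, 1, 2]: g3 recovered
```

In this toy, the foreign parameters find only the genes that share preferred codons with the foreign genome; retraining on those predictions shifts the codon table toward the novel genome's usage, which lets a later round pick up the remaining gene. A real gene finder would also have to model gene structure, splice sites, and translation start sites.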