Identifying bacterial genes and endosymbiont DNA with Glimmer

January 19, 2007 | Arthur L. Delcher, Kirsten A. Bratke, Edwin C. Powers, Steven L. Salzberg
The paper presents Glimmer 3.0, an improved version of the Glimmer gene-finding software designed to identify bacterial and endosymbiont genes more accurately. The new version produces fewer false-positive predictions and finds the correct start site for more genes. It also includes a new module that distinguishes host DNA from endosymbiont DNA, which is important for genome sequencing projects that inadvertently capture endosymbiont DNA along with the host genome.

Glimmer predicts genes with an interpolated Markov model (IMM). Glimmer 3.0 changes how the IMM is applied: it scores each open reading frame (ORF) in reverse, from the stop codon back toward the start codon, which makes start-site prediction more accurate. It also integrates ribosome binding site (RBS) evidence directly into the gene-finding algorithm, further improving its predictions.
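To make the reverse-scoring idea concrete, here is a minimal Python sketch under stated assumptions: a single fixed-order Markov chain with add-one smoothing stands in for Glimmer's interpolated Markov model, and the names `ReverseMarkovModel` and `best_start` and the toy training data are hypothetical illustrations, not Glimmer's code. Each base contributes a log-likelihood ratio (coding model versus background model), and the running total is accumulated from the stop codon backward. Because the total grows toward the 5' end, a single pass yields the score of every in-frame candidate start codon.

```python
import math
from collections import defaultdict

# Candidate start codons for this sketch (an assumption: bacterial genes most
# often start with ATG, less often with GTG or TTG).
START_CODONS = {"ATG", "GTG", "TTG"}


class ReverseMarkovModel:
    """Fixed-order Markov model trained on reversed sequences, so the context
    of each base is the `order` bases that follow it on the forward strand.
    Hypothetical simplification: Glimmer's IMM interpolates across several
    orders, while this model uses one order with add-one smoothing."""

    def __init__(self, order):
        self.order = order
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, seqs):
        for seq in seqs:
            rev = seq[::-1]
            for i in range(self.order, len(rev)):
                self.counts[rev[i - self.order:i]][rev[i]] += 1

    def log_prob(self, context, base):
        ctx_counts = self.counts.get(context, {})
        total = sum(ctx_counts.values())
        # Add-one smoothing over the four nucleotides.
        return math.log((ctx_counts.get(base, 0) + 1) / (total + 4))


def best_start(orf, coding, background):
    """Score an in-frame ORF (ending with its stop codon) from the stop codon
    backward and return (position, score) of the best in-frame start codon,
    or None if there is none. Both models must use the same order."""
    order = coding.order
    rev = orf[::-1]
    n = len(orf)
    score, best = 0.0, None
    for i in range(order, n):
        ctx, base = rev[i - order:i], rev[i]
        # Log-likelihood ratio of this base: coding versus background.
        score += coding.log_prob(ctx, base) - background.log_prob(ctx, base)
        pos = n - 1 - i  # forward coordinate of the base just scored
        # `score` now sums the ratios for bases pos..n-1-order, i.e. the gene
        # body that would begin at pos (the last `order` bases lack a full
        # context and are left out for every candidate alike).
        if pos % 3 == 0 and orf[pos:pos + 3] in START_CODONS:
            if best is None or score > best[1]:
                best = (pos, score)
    return best


if __name__ == "__main__":
    coding = ReverseMarkovModel(order=2)
    coding.train(["GCT" * 20])          # toy "coding" training data
    background = ReverseMarkovModel(order=2)
    background.train(["AT" * 30])       # toy "noncoding" training data
    orf = "ATGTATATATAT" + "ATG" + "GCT" * 10 + "TAA"
    print(best_start(orf, coding, background))  # picks the ATG at position 12
```

Glimmer 3.0 also folds RBS evidence into the choice of start site; this sketch omits it and only illustrates how backward accumulation scores all candidate starts in one sweep.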
Glimmer 3.0 also produces fewer overlapping gene predictions, which were a problem in previous versions. A dynamic programming algorithm selects the set of ORFs and start sites with the highest total score, subject to the constraint that no overlap between selected genes exceeds a specified maximum. The result is a more accurate and more specific set of gene predictions. Training is improved as well: a new routine filters ORFs by amino-acid composition, eliminating those unlikely to be protein-coding genes, so that the IMM is trained on a more accurate set of sequences.
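The overlap-constrained selection can be sketched as a weighted-interval dynamic program. This is a minimal Python illustration under simplifying assumptions that do not come from the paper: each ORF already has one score and fixed coordinates on a single axis (strand is ignored), and the selected ORFs must be ordered so that both their starts and their ends increase along the sequence. Under that ordering, limiting the overlap between consecutive ORFs to `max_overlap` also bounds the overlap between every pair. The `Orf` type and `select_orfs` function are hypothetical names; Glimmer 3.0's actual scoring and overlap rules are more involved.

```python
from typing import List, NamedTuple


class Orf(NamedTuple):
    start: int    # first base, 0-based inclusive
    end: int      # one past the last base (exclusive)
    score: float  # e.g. an IMM-based gene score; higher is better


def select_orfs(orfs: List[Orf], max_overlap: int) -> List[Orf]:
    """Return the highest-scoring chain of ORFs whose starts and ends both
    increase along the sequence and in which consecutive ORFs overlap by at
    most max_overlap bases. With that ordering, overlap(a, c) <= overlap(b, c)
    for a before b before c, so the bound holds for every pair."""
    # Dropping an ORF from a valid chain keeps the chain valid, so ORFs with
    # non-positive scores can never help and are filtered out.
    items = sorted((o for o in orfs if o.score > 0), key=lambda o: (o.end, o.start))
    if not items:
        return []
    best = [0.0] * len(items)  # best total of a chain ending at items[i]
    prev = [-1] * len(items)   # back-pointer used to rebuild that chain
    for i, cur in enumerate(items):
        best[i] = cur.score
        for j in range(i):
            p = items[j]
            compatible = p.start <= cur.start and p.end - cur.start <= max_overlap
            if compatible and best[j] + cur.score > best[i]:
                best[i] = best[j] + cur.score
                prev[i] = j
    # Trace back from the chain end with the highest total score.
    i = max(range(len(items)), key=lambda k: best[k])
    chain = []
    while i != -1:
        chain.append(items[i])
        i = prev[i]
    return chain[::-1]


if __name__ == "__main__":
    candidates = [Orf(0, 300, 5.0), Orf(280, 600, 4.0),
                  Orf(250, 500, 6.0), Orf(590, 900, 3.0)]
    # Keeps the ORFs at 0-300, 280-600 and 590-900 (total score 12.0).
    for orf in select_orfs(candidates, max_overlap=30):
        print(orf)
```

The double loop is quadratic in the number of ORFs, which keeps the sketch short; a faster version could sort the ORFs once and keep a running maximum over predecessors that end well before the current ORF's start.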
The paper compares Glimmer 3.0 with previous versions and with other gene-finding systems, showing that Glimmer 3.0 achieves higher accuracy and specificity. It also demonstrates how effectively the new module separates host and endosymbiont DNA in a recent genome project.

Overall, Glimmer 3.0 is a significant improvement over previous versions, offering better accuracy and specificity along with the ability to distinguish host from endosymbiont DNA. It is freely available as open-source software.