Understanding KAAS: an automatic genome annotation and pathway reconstruction server

KAAS is a web-based server for automatic genome annotation and pathway reconstruction. It assigns K numbers (KEGG Orthology identifiers) to genes in a genome, enabling the reconstruction of KEGG pathways and BRITE functional hierarchies. The method is based on sequence similarity, bidirectional best hit information, and heuristics, and it achieves high accuracy when compared with the manually curated KEGG GENES database.

The server uses BLAST to identify homologs of each query sequence and selects ortholog candidates based on BLAST scores and the bi-directional hit rate (BHR). The BHR is the product of the forward and reverse hit rates, where each rate is the BLAST bit score of the hit divided by the bit score of the best hit in that direction. Ortholog candidates are grouped into KO groups according to their K numbers, and the K number of the group with the highest assignment score is assigned to the query sequence. The assignment score is calculated from a likelihood and heuristics. The server supports both the bidirectional best hit (BBH) and single-directional best hit (SBH) methods. The BBH method is preferred for complete genome queries, while the SBH method is suitable for a limited set of ORFs or ESTs.

Users submit ORFs or ESTs in FASTA format. The server provides three views of the analyzed data: KO list, KO hierarchy, and pathway map. The KO list is a flat list of query genes with K numbers. The KO hierarchy is a hierarchical list of annotated genes. The pathway map is a list of pathways with links to graphical pathway maps. The server uses the latest KEGG GENES entries as reference data and offers a representative set of species to reduce computation time without drastically lowering accuracy. Annotating a prokaryotic genome of about 4000 amino acid sequences takes approximately 1 hour.

The accuracy of KAAS was tested by reassigning K numbers to selected organisms in the manually curated KEGG GENES database. The positive predictive value (PPV) for human gene reassignment was more than 90%, and the accuracy for E. coli was higher than for human.
The accuracy for Arabidopsis was lower due to the lack of plant species in the KEGG GENES database. KAAS is a rapid and high-performance tool for genome annotation, as many closely related organisms are now included in the KEGG GENES database. The accuracy of assignment for plants is expected to improve as more plant genome projects are processed.
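
The candidate selection and KO grouping described in the summary can be illustrated with a minimal Python sketch. It is not the KAAS implementation. It assumes that each direction's hit rate is the bit score divided by the best bit score in that direction, and that the BHR is the product of the two rates. The `Hit` fields, the cutoffs `min_score=60.0` and `min_bhr=0.95`, and the group score (a plain sum of forward bit scores) are illustrative assumptions; the summary says only that the real assignment score combines a likelihood with heuristics.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hit:
    """One query-vs-reference homolog pair (all field names are hypothetical)."""
    ref_gene: str
    k_number: Optional[str]  # K number of the reference gene, if it has one
    fwd_score: float         # bit score: query -> reference gene
    fwd_best: float          # best bit score: query -> that reference genome
    rev_score: float         # bit score: reference gene -> query
    rev_best: float          # best bit score: reference gene -> query genome


def bhr(hit: Hit) -> float:
    """Bi-directional hit rate: forward rate times reverse rate (assumed form)."""
    return (hit.fwd_score / hit.fwd_best) * (hit.rev_score / hit.rev_best)


def assign_k_number(hits: Iterable[Hit],
                    min_score: float = 60.0,   # assumed bit-score cutoff
                    min_bhr: float = 0.95      # assumed BHR cutoff
                    ) -> Optional[str]:
    """Group ortholog candidates by K number and return the top-scoring group.

    The group score here is the sum of forward bit scores, a stand-in for the
    likelihood-and-heuristics score mentioned in the summary.
    """
    group_scores = defaultdict(float)
    for hit in hits:
        if hit.k_number is None or hit.fwd_score < min_score:
            continue  # unannotated reference gene or weak homolog
        if bhr(hit) < min_bhr:
            continue  # not close enough to a bidirectional best hit
        group_scores[hit.k_number] += hit.fwd_score
    if not group_scores:
        return None
    return max(group_scores, key=group_scores.get)


# Made-up scores for one query gene against three reference genes.
example_hits = [
    Hit("ref_a", "K00001", 200.0, 200.0, 190.0, 190.0),  # mutual best hit
    Hit("ref_b", "K00001", 180.0, 180.0, 175.0, 180.0),  # near-best in reverse
    Hit("ref_c", "K00002", 250.0, 300.0, 100.0, 200.0),  # low BHR, filtered out
]
print(assign_k_number(example_hits))
```

In this sketch a mutual best hit has a BHR of exactly 1, so the BHR cutoff controls how far a pair may fall short of a strict bidirectional best hit and still count as an ortholog candidate. An SBH-style variant would skip the reverse rate and rely on the forward scores alone.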

KAAS: an automatic genome annotation and pathway reconstruction server

2007 | Yuki Moriya, Masumi Itoh, Shujiro Okuda, Akiyasu C. Yoshizawa and Minoru Kanehisa